mrtd_fileread_write_image_to_file not robust enough

justuswilhelm commented 7 years ago

mrtdreader fails with the "Couldn't find start of image" error, at least for my German passport. After hotwiring fileread.c to always output the image, I found that a JPEG2000 image is stored starting at byte 89, so that running

> tail -c +89 image.jpg > image_.jpg

will yield a correct file:

> file image_.jpg
image_.jpg: JPEG 2000 Part 1 (JP2)

I have no idea how other passports store their data, but perhaps running a simple linear search for 0000 000C 6A50 2020 0D0A 870A and the other JPEG headers will already suffice.
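
A minimal C sketch of such a linear search, assuming the file contents are already in memory. The names JP2_SIGNATURE and find_bytes are made up for illustration and are not taken from fileread.c:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* JP2 file-format signature: the 12 bytes quoted above. */
const unsigned char JP2_SIGNATURE[12] = {
    0x00, 0x00, 0x00, 0x0c, 0x6a, 0x50, 0x20, 0x20, 0x0d, 0x0a, 0x87, 0x0a
};

/*
 * Naive linear search (hypothetical helper, not part of mrtdreader).
 * Returns the zero-based offset of the first occurrence of needle in buf,
 * or -1 if the sequence does not occur.
 */
long find_bytes(const unsigned char *buf, size_t buf_len,
                const unsigned char *needle, size_t needle_len)
{
    assert(buf != NULL && needle != NULL);
    if (needle_len == 0 || needle_len > buf_len)
        return -1;
    for (size_t i = 0; i + needle_len <= buf_len; i++) {
        /* Compare the full sequence at every candidate position. */
        if (memcmp(buf + i, needle, needle_len) == 0)
            return (long)i;
    }
    return -1;
}
```

For a dump whose signature begins at the 89th byte (which is what tail -c +89 implies), find_bytes returns 88, because the returned offset is zero-based.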

rubund commented 7 years ago

Hi,

Are you using the latest mrtdreader version from Git?

There is already a linear search for 0x00,0x00,0x00,0x0c,0x6a,0x50,0x20,0x20,0x0d,0x0a. The question is why it does not work for you. Does it never reach that sequence in your passport, or is there such a sequence earlier in the file? Do you think you can find out?

Best regards, Ruben

justuswilhelm commented 7 years ago

I will let you know once I get the chance to fiddle with this again -- in January I will prepare a small presentation on ePassports.

justuswilhelm commented 7 years ago

Hi, this is the output of the tool using my German passport:

NFC device: NXP / PN533 opened
Target found!
======================
Challenge successful!
======================

Getting EF.COM... done
File content: XXX
File size: 24
Found: EF_DG1
Found: EF_DG3
Found: EF_DG14
Found: EF_DG2

Getting EF.SOD... done
File content: XXX
File size: 1571

Getting EF.DG1... done
XXX

File size: 93

Getting EF.DG2 which contains the image... done
Couldn't find start of image

zhouer commented 4 years ago

@rubund Hi, same issue here. After digging into it, I found that

filetype = file_content[73];  // 0x00: JPG, 0x01: JPEG2000

is 0x01 (JPEG2000), but the image content is JPEG in my case. So I suggest ignoring file_content[73] and checking for both the JPEG and the JPEG2000 start sequences instead.
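
A rough sketch of that suggestion in C: scan for both start sequences and let whichever one matches first decide the type, without consulting file_content[73]. The names sniff_image, SEQ_JPEG, SEQ_JP2 and the enum are hypothetical and not mrtdreader's actual code. The JPEG sequence assumes a JFIF header, so a JPEG that starts with a different APP segment would not match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Values mirror the type byte: 0x00 = JPG, 0x01 = JPEG2000. */
enum image_kind { IMAGE_UNKNOWN = -1, IMAGE_JPEG = 0, IMAGE_JPEG2000 = 1 };

/* JPEG with a JFIF APP0 segment, and the start of the JP2 signature. */
const unsigned char SEQ_JPEG[10] = {0xff,0xd8,0xff,0xe0,0x00,0x10,0x4a,0x46,0x49,0x46};
const unsigned char SEQ_JP2[10]  = {0x00,0x00,0x00,0x0c,0x6a,0x50,0x20,0x20,0x0d,0x0a};

/*
 * Scan buf from `start` and report the first position where either start
 * sequence occurs. The declared type byte is deliberately not read.
 * On success *offset holds the zero-based position of the match.
 */
enum image_kind sniff_image(const unsigned char *buf, size_t len,
                            size_t start, size_t *offset)
{
    assert(buf != NULL && offset != NULL);
    /* Both sequences are 10 bytes long, so one loop bound covers both. */
    for (size_t i = start; i + sizeof SEQ_JPEG <= len; i++) {
        if (memcmp(buf + i, SEQ_JPEG, sizeof SEQ_JPEG) == 0) {
            *offset = i;
            return IMAGE_JPEG;
        }
        if (memcmp(buf + i, SEQ_JP2, sizeof SEQ_JP2) == 0) {
            *offset = i;
            return IMAGE_JPEG2000;
        }
    }
    return IMAGE_UNKNOWN;
}
```

A caller could still compare the result with file_content[73] and print a warning when the two disagree, which would make mismatches like the one described here visible.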

evaxige commented 1 year ago

Hi, here are some details on the JPEG2000 start sequence, taken from an OpenCV issue (https://github.com/opencv/opencv/issues/19083).

The JPEG2000 standard defines both a "codestream" format and a file format for JP2K data. A codestream should ideally always be wrapped in the file format container, but in practice raw codestreams are sometimes found in the wild.

The OpenCV integration of OpenJPEG can read the JP2K file format, but not raw codestreams (also checked today's master). OpenJPEG itself has no problem reading codestreams (tested with the opj_decompress tool).

A codestream can be identified by its 4-byte header FF 4F FF 51.

So, add the start sequence 0xFF, 0x4F, 0xFF, 0x51 for the raw codestream format (*.j2c).

unsigned char start_sequence_jpeg[10] = {0xff,0xd8,0xff,0xe0,0x00,0x10,0x4a,0x46,0x49,0x46}; // *.jpeg/jpg
unsigned char start_sequence_jpeg2000[10] = {0x00,0x00,0x00,0x0c,0x6a,0x50,0x20,0x20,0x0d,0x0a}; // *.jp2
unsigned char start_sequence_jpeg2000_codestream[4] = {0xff,0x4f,0xff,0x51}; // *.j2c

When parsing a Chinese passport, the jpeg2000_codestream sequence can be checked first for performance reasons.
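
As a sketch of how the three sequences could be combined (hypothetical names, not the project's code), the table below lists them in the order they should be tried. One caveat: a *.jp2 file itself contains a codestream that begins with FF 4F FF 51, so this version returns the earliest match in the buffer and uses the table order only to decide which sequence is compared first at each position.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct signature {
    const char *extension;       /* suggested output file extension */
    const unsigned char *bytes;  /* start sequence to look for */
    size_t len;
};

const unsigned char SIG_J2C[]  = {0xff,0x4f,0xff,0x51};
const unsigned char SIG_JP2[]  = {0x00,0x00,0x00,0x0c,0x6a,0x50,0x20,0x20,0x0d,0x0a};
const unsigned char SIG_JPEG[] = {0xff,0xd8,0xff,0xe0,0x00,0x10,0x4a,0x46,0x49,0x46};

/* Order = comparison priority at each position; codestream first here. */
const struct signature SIGNATURES[] = {
    {"j2c", SIG_J2C,  sizeof SIG_J2C},
    {"jp2", SIG_JP2,  sizeof SIG_JP2},
    {"jpg", SIG_JPEG, sizeof SIG_JPEG},
};

/*
 * Return the signature that matches at the lowest offset in buf, or NULL
 * if none of them occurs. *offset receives the zero-based match position.
 */
const struct signature *find_image_start(const unsigned char *buf, size_t len,
                                         size_t *offset)
{
    assert(buf != NULL && offset != NULL);
    for (size_t i = 0; i < len; i++) {
        for (size_t s = 0; s < sizeof SIGNATURES / sizeof SIGNATURES[0]; s++) {
            const struct signature *sig = &SIGNATURES[s];
            /* Skip sequences that would run past the end of the buffer. */
            if (sig->len <= len - i &&
                memcmp(buf + i, sig->bytes, sig->len) == 0) {
                *offset = i;
                return sig;
            }
        }
    }
    return NULL;
}
```

The returned extension field could then be used to name the output file, so a raw codestream ends up as *.j2c instead of being written under a misleading *.jpg name.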

rubund / mrtdreader, issue #2