jokiazhang / metadata-extractor

Automatically exported from code.google.com/p/metadata-extractor

Bug with markers preceded by fill bytes #89

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,

I'm using Tika to extract metadata from different file formats.
Tika uses metadata-extractor (v 2.6.2) to handle JPEG files.

I have encountered an issue with some files in which the byte following a marker's 0xFF prefix is itself 0xFF. JpegSegmentReader throws an exception in this case, but according to the ITU-T T.81 specification (http://www.w3.org/Graphics/JPEG/itu-t81.pdf) this is valid: the extra 0xFF bytes are fill bytes. The relevant paragraph of the specification is quoted below.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>
B.1.1.2 Markers

Markers serve to identify the various structural parts of the compressed data formats. Most markers start marker segments containing a related group of parameters; some markers stand alone. All markers are assigned two-byte codes: an X’FF’ byte followed by a byte which is not equal to 0 or X’FF’ (see Table B.1). Any marker may optionally be preceded by any number of fill bytes, which are bytes assigned code X’FF’.

NOTE – Because of this special code-assignment structure, markers make it possible for a decoder to parse the compressed data and locate its various parts without having to decode other segments of image data.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
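The rule quoted above can be sketched in Java roughly as follows. This is a minimal illustration only: it is not the attached patch and not metadata-extractor's actual JpegSegmentReader code, and the class `MarkerReader` and method `readMarker` are hypothetical names. It assumes the stream is positioned at the start of a marker, outside entropy-coded scan data.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical helper (not part of metadata-extractor's API) that reads a JPEG
 * marker code per ITU-T T.81 B.1.1.2, skipping any 0xFF fill bytes before it.
 */
public final class MarkerReader {

    private MarkerReader() {}

    /** Reads one unsigned byte, throwing EOFException at end of stream. */
    private static int readByte(InputStream in) throws IOException {
        int b = in.read();
        if (b < 0) {
            throw new EOFException("Unexpected end of stream while reading marker");
        }
        return b;
    }

    /**
     * Consumes the 0xFF prefix plus any number of 0xFF fill bytes and returns
     * the marker code byte that follows (which must be neither 0x00 nor 0xFF).
     */
    public static int readMarker(InputStream in) throws IOException {
        int b = readByte(in);
        if (b != 0xFF) {
            throw new IOException(String.format("Expected 0xFF marker prefix, got 0x%02X", b));
        }
        // Fill bytes: keep consuming while the next byte is also 0xFF,
        // instead of rejecting 0xFF as an invalid marker code.
        do {
            b = readByte(in);
        } while (b == 0xFF);
        if (b == 0x00) {
            // 0xFF 0x00 is a stuffed byte inside entropy-coded data, not a marker.
            throw new IOException("0xFF00 is not a valid marker");
        }
        return b;
    }

    public static void main(String[] args) throws IOException {
        // SOI (0xFFD8), then an APP1 marker (0xFFE1) preceded by two fill bytes.
        byte[] data = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xE1};
        InputStream in = new ByteArrayInputStream(data);
        System.out.printf("0x%02X%n", readMarker(in)); // prints 0xD8
        System.out.printf("0x%02X%n", readMarker(in)); // prints 0xE1
    }
}
```

The loop is the essential point: a reader that takes the first byte after 0xFF as the marker code and rejects 0xFF fails on exactly the files described in this report, whereas skipping repeated 0xFF bytes accepts any number of fill bytes as the spec allows.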

Attached is a modified class that fixes this behaviour.

Best Regards,
Eric

Original issue reported on code.google.com by eric.gc....@gmail.com on 19 Dec 2013 at 8:41

Attachments:

GoogleCodeExporter commented 9 years ago
This has been fixed on GitHub and will be included in version 2.7.0, which will be released in three days.

I'm surprised this issue didn't surface more often, as it's quite a significant deviation from the spec!

Original comment by drewnoakes on 4 Dec 2014 at 11:43