openpreserve / jpylyzer

JP2 (JPEG 2000 Part 1) validator and properties extractor. Jpylyzer was specifically created to check that a JP2 file really conforms to the format's specifications. Additionally jpylyzer is able to extract technical characteristics.
http://jpylyzer.openpreservation.org/

When bitmashing, jpylyzer sometimes breaks #31

Closed anjackson closed 10 years ago

anjackson commented 11 years ago

Some malformed (deliberately bit-flipped) JP2s appear to make jpylyzer hang. For example, cc-16-kdu.jp2-killed-10918 runs for a long time and then stops, printing only 'Killed'.

anj@anj-VirtualBox:~/bitwiser$ hexdump -C cc-16-kdu-hangs/cc-16-kdu.jp2-killed-10918 > hd1
anj@anj-VirtualBox:~/bitwiser$ hexdump -C src/test/resources/cc-16-kdu.jp2 > hd2
anj@anj-VirtualBox:~/bitwiser$ diff hd*
131c131
< 00000830  00 6a 70 32 63 ff 4f ff  51 00 32 00 00 80 00 00  |.jp2c.O.Q.2.....|
---
> 00000830  00 6a 70 32 63 ff 4f ff  51 00 32 00 00 00 00 00  |.jp2c.O.Q.2.....|
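
For readers who want to reproduce this kind of mutation, here is a minimal Python sketch of a single-bit flip. The function name `flip_bit` and the convention that bit 0 is the most significant bit of a byte are assumptions made for illustration; this is not the code of the tool used to generate the files above. The sample bytes are the codestream header fragment from the hexdump, starting at the `ff 4f` marker.

```python
def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted.

    bit_index counts from the start of the buffer; bit 0 is taken to be
    the most significant bit of the first byte (an arbitrary convention).
    """
    byte_index, bit_in_byte = divmod(bit_index, 8)
    mutated = bytearray(data)
    mutated[byte_index] ^= 0x80 >> bit_in_byte
    return bytes(mutated)


# Header fragment of the unmodified file, taken from the hexdump above.
original = bytes.fromhex("ff 4f ff 51 00 32 00 00 00 00 00")

# Inverting the top bit of the ninth byte reproduces the 00 -> 80 change.
mutated = flip_bit(original, 8 * 8)
print(mutated.hex(" "))
```

Flipping the same bit a second time restores the original bytes, which makes it easy to check that a mutation touched only the intended position.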

More interestingly, cc-16-kdu.jp2-killed-22181 appears to cause an infinite loop:

....
User warning: ignoring unknown box
User warning: ignoring unknown box
User warning: ignoring unknown box
....

There may be further interesting failure modes hidden in the other examples I have. I'll see about attaching them... Seems I cannot, so I uploaded a zip of the failing files here: https://dl.dropbox.com/u/135740/cc-16-kdu-hangs.zip

bitsgalore commented 11 years ago

Just had a first look at some of your files. The ones I checked fall into one of these two categories:

  1. Bit-flipping causes jpylyzer to expect an ICC profile that contains millions of entries. Examples are cc-16-kdu.jp2-killed-1971, cc-16-kdu.jp2-killed-1961 and cc-16-kdu.jp2-killed-1937. Analyzing these files with ExifTool gives you something like this:

    Bad ICC_Profile table (67108875 entries)

    The solution is to impose a sensible upper limit on the number of entries in an ICC profile.

  2. For all other images, the codestream header field with the number of tiles is corrupted so that it has a very large (millions!) value. Internally jpylyzer creates a dictionary which has an entry for each (expected) tile, and then loops over all of them. The result is a seemingly endless loop plus excessive memory usage. Examples are cc-16-kdu.jp2-killed-11218 and cc-16-kdu.jp2-killed-10918.

    Solution: impose a sensible upper limit here as well. I already did a quick test, and this exposed a secondary problem:

    TypeError: cannot serialize 2147483664L (type long)

    Solution: add the long type to the remap function in ETpatch, i.e.: elif textType in [int, long, float, bool]: textOut = str(remappedValue)
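
The two "sensible upper limit" checks described above could look roughly like the following Python sketch. It is not the fix that went into jpylyzer: the function names are hypothetical, the ICC check bounds the tag count by what physically fits in the profile (128-byte header, 4-byte tag count, 12 bytes per tag table entry), and the tile cap of 65535 is an assumption based on the tile index being a 16-bit field.

```python
import struct

ICC_HEADER_SIZE = 128     # fixed-size ICC profile header
ICC_TAG_ENTRY_SIZE = 12   # signature, offset and size: 4 bytes each
MAX_TILES = 65535         # assumed cap, derived from the 16-bit tile index


def checked_icc_tag_count(profile: bytes) -> int:
    """Read the ICC tag count and reject values the profile cannot hold."""
    if len(profile) < ICC_HEADER_SIZE + 4:
        raise ValueError("ICC profile too short to contain a tag count")
    (tag_count,) = struct.unpack_from(">I", profile, ICC_HEADER_SIZE)
    table_end = ICC_HEADER_SIZE + 4 + tag_count * ICC_TAG_ENTRY_SIZE
    if table_end > len(profile):
        raise ValueError("implausible ICC tag count: %d" % tag_count)
    return tag_count


def checked_tile_count(xsiz, ysiz, xtsiz, ytsiz, xtosiz=0, ytosiz=0):
    """Compute the number of tiles from SIZ fields, refusing absurd values."""
    if xtsiz == 0 or ytsiz == 0:
        raise ValueError("tile size must not be zero")
    tiles_x = -(-(xsiz - xtosiz) // xtsiz)   # ceiling division
    tiles_y = -(-(ysiz - ytosiz) // ytsiz)
    total = tiles_x * tiles_y
    if not 1 <= total <= MAX_TILES:
        raise ValueError("implausible number of tiles: %d" % total)
    return total


# A 2048x1024 image with 512x512 tiles has 4 x 2 tiles.
print(checked_tile_count(2048, 1024, 512, 512))
```

Both checks run before any per-entry or per-tile data structure is allocated, so a corrupted count results in an immediate error instead of a long loop and heavy memory use. Note that the ExifTool figure quoted above, 67108875, is 11 with one extra high bit set, which is exactly what a single bit flip in the tag count would produce.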

Then there's the endless loop in cc-16-kdu.jp2-killed-22181; I'll look at that later.
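
The cause of that loop is not stated in this thread. One way a box parser can end up repeating the same warning forever is a corrupted length field that never moves the read position forward; the Python sketch below guards against that. Treat it as an illustration under that assumption, not as a description of jpylyzer's code: `iter_boxes` is a hypothetical helper. It relies only on the JP2 box layout (4-byte length, 4-byte type, an 8-byte extended length when the length field is 1, and a length of 0 meaning "runs to the end of the file").

```python
import struct


def iter_boxes(data: bytes):
    """Yield (box_type, payload_start, box_end) for each top-level box.

    Every iteration advances by at least the header size, so a corrupted
    length field raises an error instead of stalling the loop.
    """
    offset = 0
    while offset < len(data):
        if len(data) - offset < 8:
            raise ValueError("truncated box header at offset %d" % offset)
        length, box_type = struct.unpack_from(">I4s", data, offset)
        header_size = 8
        if length == 1:
            # Extended length stored in the 8 bytes after the box type.
            if len(data) - offset < 16:
                raise ValueError("truncated extended box header")
            (length,) = struct.unpack_from(">Q", data, offset + 8)
            header_size = 16
        elif length == 0:
            # Box extends to the end of the file.
            length = len(data) - offset
        if length < header_size or offset + length > len(data):
            raise ValueError(
                "invalid box length %d at offset %d" % (length, offset))
        yield box_type, offset + header_size, offset + length
        offset += length


# Two boxes: a 12-byte box, then one whose length field of 0 means "to EOF".
sample = b"\x00\x00\x00\x0cjP  \r\n\x87\n" + b"\x00\x00\x00\x00jp2c" + b"abc"
print(list(iter_boxes(sample)))
```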

I'll fix this in the next release, and I'll also run tests on the full set of files.

bitsgalore commented 10 years ago

Fixed in version 1.10.3. These were actually 3 separate issues:

anjackson commented 10 years ago

In case anyone wants to take this kind of tools testing/analysis further, here's a link to my write-up of this technique: Understanding Tools & Formats Via Bitwise Analysis