internetarchive / bookreader

The Internet Archive BookReader
https://openlibrary.org/dev/docs/bookreader
GNU Affero General Public License v3.0

Gracefully handle jp2 images generated without -Clevels #45

Open tcj opened 8 years ago

tcj commented 8 years ago

Please pardon any mistakes in my reporting of this issue; I barely know what I am talking about.

We are getting errors in the syslog and nginx log like the following:

Jan  6 22:42:20 ___.us.a___.org php5-www-priv[4177]: Kakadu Core Error:
Jan  6 22:42:20 .org php5-www-priv[4177]: Attempting to access a non-existent resolution level within some
Jan  6 22:42:20 .org php5-www-priv[4177]: tile-component.  Problem almost certainly caused by trying to discard more
Jan  6 22:42:20 .org php5-www-priv[4177]: resolution levels than the number of DWT levels used to compress a
Jan  6 22:42:20 .org php5-www-priv[4177]: tile-component.
Jan  6 22:42:20 .org php5-www-priv[4177]: pnmtojpeg: EOF / read error reading magic number

and:

2015/12/29 16:29:22 [error] 30935#0: *23910069 FastCGI sent in stderr: "PHP Warning: BookReader Processing Error: unzip -p '/2/items/cu31924051987323/cu31924051987323_jp2.zip' 'cu31924051987323_jp2/cu31924051987323_0005.jp2' | /petabox/sw/bin/kdu_expand -no_seek -quiet -reduce 4 -rotate 0 -region {0.000000,0.000000},{1.000000,1.000000} -i /dev/stdin -o /tmp/stdout.bmp | (bmptopnm 2>/dev/null) | pnmtojpeg -quality 75 -- in /var/cache/petabox/petabox/www/datanode/BookReader/BookReaderImages.inc.php on line 453

Notes from h___ at archive dot org:

"I.e., when making a jp2 from a bilevel source image (all pixels fully white or fully black), a certain option has to be specified while making the jp2 in order to be able to use the "-reduce" option when reading the jp2. That option has been specified for jp2s that we've made from bilevel sources since Feb 2011.

In the case of the error Tim reports, the jp2s were provided to us (created by K..., during a scanning project at C...). Those jp2s were evidently created without using that special option, so the images can't be extracted from them at a reduced resolution, as BookReader attempts to do ("kdu_expand ... -reduce 4 ... ").

I'm not sure what we can do, short of modifying BookReader (for which no one currently at the Archive knows the code) to check first whether an image file can be reduced, and if not, extract at full resolution and reduce as part of the pipeline of conversion commands (i.e., insert a call to "pnmscale" just before the call to "pnmtojpeg")."
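A minimal sketch of the fallback described in the note above, not BookReader's actual code. It assumes the number of DWT levels in the jp2 is already known (how to obtain it is left open here, and `available_levels` is a hypothetical input). The codec is asked to discard only as many resolution levels as exist, and any remaining downscaling is done by a `pnmscale` stage inserted just before `pnmtojpeg`. The function names are illustrative, and the `-rotate` and `-region` arguments from the logged command are omitted for brevity.

```python
import shlex


def plan_reduction(requested_reduce, available_levels):
    """Split a requested power-of-two reduction between the codec and pnmscale.

    Returns (codec_reduce, scale): the value to pass to kdu_expand -reduce,
    and the factor still to be applied to the decoded image.
    """
    requested_reduce = max(0, requested_reduce)
    # Never ask the codec to discard more levels than the file contains.
    codec_reduce = max(0, min(requested_reduce, available_levels))
    remaining = requested_reduce - codec_reduce
    scale = 1.0 / (2 ** remaining)
    return codec_reduce, scale


def build_pipeline(zip_path, member, requested_reduce, available_levels):
    """Build a shell pipeline string modelled on the command in the log above."""
    codec_reduce, scale = plan_reduction(requested_reduce, available_levels)
    stages = [
        "unzip -p {} {}".format(shlex.quote(zip_path), shlex.quote(member)),
        "kdu_expand -no_seek -quiet -reduce {} -i /dev/stdin -o /tmp/stdout.bmp".format(codec_reduce),
        "(bmptopnm 2>/dev/null)",
    ]
    if scale < 1.0:
        # Codec could not do all of the reduction; scale the decoded image instead.
        stages.append("pnmscale {}".format(scale))
    stages.append("pnmtojpeg -quality 75")
    return " | ".join(stages)


if __name__ == "__main__":
    # A jp2 with no DWT levels: all of the reduction falls to pnmscale.
    print(build_pipeline("/path/to/item_jp2.zip", "item_jp2/item_0005.jp2", 4, 0))
```

With this split, a file that does support `-reduce 4` produces the same pipeline as before, so only the problem files pay the cost of a full-resolution decode.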

tfmorris commented 8 years ago

That analysis sounds reasonable. The example above is a Microsoft-scanned image donated by Cornell: https://archive.org/details/cu31924051987323

It doesn't make sense to me to have special-case code for this in an app that isn't about image processing. It shouldn't care whether the scaling is done via a fast path supported by the codec or after the fact on the decompressed image. I'd push the requirement back on opj_decompress to support a -scale qualifier to supplement -reduce: https://github.com/uclouvain/openjpeg/issues

As an aside, I'm not sure why Internet Archive is using a closed-source program (kdu_expand) here in the first place when OpenJPEG has an open-source JPEG 2000 decoder.