usnistgov / pyramidio

Image pyramid reader and writer
Other
33 stars 22 forks source link

[Q] Optimization for processing of huge images #2

Closed peterlitvak closed 8 years ago

peterlitvak commented 8 years ago

I was wondering if you would like to consider some changes for DZI generation for huge images that cannot fit in to memory. One idea I have is to replace the part of reader that loads the entire image in to memory with code that will just read region on demand (using ImageReader/ImageReadParam). This will slow down the conversion (not sure how much) but should enable huge images to be processed with this tool.

avandecreme commented 8 years ago

Yes, would be nice indeed. Maybe the best approach would be to implement a new PartialImageReader (named BigImageReader ?). The CLI would then choose between the existing BufferedImageReader and BigImageReader depending on the size of the input.

What do you think? Would you be willing to submit a patch for this?

peterlitvak commented 8 years ago

ok, will give it a try

peterlitvak commented 8 years ago

Well, it doesn't look like a viable option at least when implemented in a straightforward way, takes an unruly amount of time. For what takes 130s in memory on 8 cores this approach takes about 10-15 times more. Maybe a combination approach that would be acceptable is to process an image in stripes of a tile size high and image length long but that would require some redesign of the tile generation.

avandecreme commented 8 years ago

That is quite a huge slowdown! Could you still share your code? Even if it is slow, it could still be useful. Plus someone could spot a possible optimization.

Maybe a combination approach that would be acceptable is to process an image in stripes of a tile size high and image length long but that would require some redesign of the tile generation.

The current implementation tries to minimize the number of tiles it keeps in memory. It works well with the MistImageReader so I am not too willing to change it. I wouldn't mind having multiple implementations of the pyramid builder though.

Regarding your image. What size is it and how did you get it? If it is the result of some stitching, maybe a better approach would be to generate the pyramid directly from a stitching vector?

peterlitvak commented 8 years ago

The image size is 21600 x 21600. In our application we are mostly going to get already preprocessed images so I cannot count on having anything but those (possibly already stitched) images.

As far as the code goes, it is rather rudimentary, I can just post it here in the comment, not sure if it deserves a pull request but if you feel differently I can to a PR. Please let me know.

peterlitvak commented 8 years ago

Also there is a workaround workflow that seems to be acceptable for the huge files. If the file is converted to an uncompressed format (I tried BMP) and then the direct image reader is used then everything works reasonably fast. The trick is that one must use something like ImageMagick in order to convert the image first without engaging a lot of RAM and then use the PyramidIO with direct image reader to generate the tiles.

avandecreme commented 8 years ago

The trick is that one must use something like ImageMagick in order to convert the image first without engaging a lot of RAM

I just tried that, it looks like the convert command is using more RAM than the BufferedImageReader. Is it the case for you as well?

peterlitvak commented 8 years ago

hmm, that is odd, convert on my system takes much less memory

peterlitvak commented 8 years ago

I spoke to soon :( for some reason tiles produced via the method I describes are no good. For some reason the readRegion while reading direct from BMP produces tiles shifted by about 46 pixels, like so 0_1 Any ideas and/or recommendations are welcome.

avandecreme commented 8 years ago

Interesting. Does switching to the BufferedImageReader fix the problem? What about the DirectImageReader without a pool?

peterlitvak commented 8 years ago

Yes, if the image is read with BufferedImageReader it works with BMPs, also if the DirectImageReader used on JPG it also works correctly. Only the combination of DirectImageReader and BMP produces bad tiles. Something in the way read(index,params) works with BMP reader perhaps, or maybe the conversion with IM, but that is pretty straightforward, e.g.

convert ~/TMP/test_image_landscape_2584x1734.jpg -compress none +repage ~/TMP/test_image_landscape_2584x1734.bmp
avandecreme commented 8 years ago

Supposing your bmp looks good, it looks like there might be a bug with imageIO. What if you force the use of 1 processor with -p 1 (maybe there is some concurrency issue).

peterlitvak commented 8 years ago

This seems to be the BMP version and ImageIO BMP image reader issue, if I convert to BMP3 or BMP2 the shift gets smaller, which tells me that reader somehow doesn't take header size(?) in to the account, pretty confused at this point

avandecreme commented 8 years ago

Maybe you can get help on https://github.com/jai-imageio/jai-imageio-core

peterlitvak commented 8 years ago

It seems to be a known bug in open JDK, https://bugs.openjdk.java.net/browse/JDK-8041465 however I'm using sun's JDK will try to dig in to the source

peterlitvak commented 8 years ago

yes, a bug in JDK as well, fixed in b33 presumably http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8041465

avandecreme commented 8 years ago

It looks like the fix applied there: http://hg.openjdk.java.net/jdk9/jdk9/jdk/rev/a969b4f133cf Should be applied here too: https://github.com/jai-imageio/jai-imageio-core/blob/3f9c4019d0596d1589282a2ca9d5864fe2b8cfd2/src/main/java/com/github/jaiimageio/impl/plugins/bmp/BMPImageReader.java

I am quite surprised to see that code in the JDK actually. I thought ImageIO was not distributed by default. Maybe that changed with Java9?

peterlitvak commented 8 years ago

Yes, the fix was applied to JAI so com.github.jaiimageio.impl.plugins.bmp.BMPImageReader works as expected. Not sure about source distribution for JDK though.

avandecreme commented 8 years ago

Closing this issue since the original optimization has been done and the bmp issue is upstream.