Thanks for all the details; I'll take a look at this today for sure.
I believe I have figured out the problem, but the solution may make things even slower than they already are for ImageMagick decoding.
It appears that the ImageMagick disk cache (which seems to live in `/tmp`) is only meant to be used by a single process per image, and once that cache has been used by an in-memory resource, it cannot be used by another resource. So either I force ImageMagick to create a new cache even when operating on the same image (which could wreck disk space under heavy load), or I clean up the image's cache after every read (which means operations like reading the image size and doing the full decode can't share information).
I'll keep looking at options here.
For my own info if I have to continue this work next week: `PingImage` looks like a much better way to read image metadata from ImageMagick without a full decode. It should solve the double-read efficiency problem described above.
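Roughly what that looks like through the MagickWand C API from Go via cgo (a minimal sketch only, assuming ImageMagick 6 headers and its `MagickWand` pkg-config file are installed; package and function names here are illustrative, not RAIS's actual code):

```go
package magick

/*
#cgo pkg-config: MagickWand
#include <stdlib.h>
#include <wand/MagickWand.h>
*/
import "C"

import (
	"fmt"
	"unsafe"
)

func init() {
	// One-time library initialization.
	C.MagickWandGenesis()
}

// PingDimensions reads width/height with MagickPingImage, which parses image
// headers only, so the full pixel cache (and its /tmp spill file) is never
// created just to answer an info.json request.
func PingDimensions(path string) (width, height int, err error) {
	wand := C.NewMagickWand()
	// A fresh wand per call, destroyed immediately, keeps cache resources
	// from being shared between reads.
	defer C.DestroyMagickWand(wand)

	cPath := C.CString(path)
	defer C.free(unsafe.Pointer(cPath))

	if C.MagickPingImage(wand, cPath) == C.MagickFalse {
		return 0, 0, fmt.Errorf("unable to ping %q", path)
	}
	return int(C.MagickGetImageWidth(wand)), int(C.MagickGetImageHeight(wand)), nil
}
```

Because ping only touches headers, serving `info.json` never has to populate the full pixel cache, and destroying the wand after each call keeps cache resources from carrying over into later reads.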
Using `PingImage` and not reusing the ImageMagick struct dramatically reduces the problems, but does not eliminate them. The cache stored in `/tmp` doesn't appear to be thread-safe in any way. Two requests for large images, even when they're two different large images, will fail.
There must be a way to handle this from the ImageMagick APIs, so I'll keep digging.
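For anyone trying to reproduce this locally, a small sketch along these lines fires two large-image requests concurrently (the host, port, and image identifiers are placeholders; point them at whatever large JPEGs your RAIS instance serves):

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

func main() {
	// Two different large source JPEGs; requested sequentially they both
	// work, requested concurrently they fail as described above.
	urls := []string{
		"http://localhost:12415/iiif/large-one.jpg/full/full/0/default.jpg",
		"http://localhost:12415/iiif/large-two.jpg/full/full/0/default.jpg",
	}

	var wg sync.WaitGroup
	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			resp, err := http.Get(u)
			if err != nil {
				fmt.Println(u, "error:", err)
				return
			}
			defer resp.Body.Close()
			fmt.Println(u, "->", resp.Status)
		}(u)
	}
	wg.Wait()
}
```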
To get a final fix, I'm going to force ImageMagick requests to be sequential instead of allowing them to be concurrent. This is really horrible, but at this point I've come to the conclusion that the problem is concurrency, not limits. I can verify that setting the disk limit to 10 bytes prevents any large image request from working, while setting it to infinity makes large images work just fine... so long as they're requested sequentially.
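A minimal sketch of that serialization approach (names are illustrative only, not necessarily how RAIS implements it): a package-level mutex wrapped around every ImageMagick decode so only one runs at a time.

```go
package magick

import "sync"

// imLock serializes all ImageMagick work. Blunt, but it avoids the
// concurrent /tmp pixel-cache failures described above.
var imLock sync.Mutex

// DecodeRegion is a hypothetical entry point; the real RAIS decode functions
// differ. The lock must cover the entire ImageMagick read/crop/encode.
func DecodeRegion(path string, x, y, w, h int) ([]byte, error) {
	imLock.Lock()
	defer imLock.Unlock()
	return decodeRegionLocked(path, x, y, w, h)
}

// decodeRegionLocked stands in for the actual ImageMagick-backed decode.
func decodeRegionLocked(path string, x, y, w, h int) ([]byte, error) {
	// ... MagickReadImage / crop / encode would happen here ...
	return nil, nil
}
```

A buffered-channel semaphore would make the limit tunable (N concurrent decodes instead of strictly one), but with a cache that isn't safe for any concurrent use, a plain mutex is the simplest thing that can work.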
What kills me is that I'm certain it's got something to do with the internal way it tries to handle the temp files, because when you run two separate instances of the ImageMagick `convert` program, they're fine. It's just a problem when two operations are trying to take place concurrently within the same process.
Unfortunately I won't have time to get a new docker release ready today. You can pull the latest changes from the `develop` branch and build a docker image manually if necessary; otherwise it won't be until the middle of next week....
Thanks for all your work on this issue; I'll keep an eye out for the next release.
FYI this should be fixed now. Take a look and let me know!
I tried the new build and it works, thanks. I'm working on an IIIF benchmark POC, so I quickly compared versions 4.0.0 and 4.1.0 and could indeed see performance degradation as concurrency increases.
Yes, that's not surprising, though it certainly is unfortunate.
For a small set of images, setting up an in-memory tile cache will make a tremendous difference, so long as you've got the RAM (see https://github.com/uoregon-libraries/rais-image-server/wiki/Caching). A filesystem-backed tile cache has been considered on and off, but has never made the cut. It would add a lot of complexity to ensure the filesystem doesn't fill up crazy-fast during traffic spikes, it's something of a niche problem, and there are dedicated external caches that do the job better than something added into RAIS.
If you are looking to benchmark, tiled, multi-resolution JP2 images are really what RAIS is built for. It'll never be amazing for images that have to be fully decoded just to serve a single tile, but it's pretty great for JP2s, thanks to the openjpeg libraries.
Latest docker image fails serving large JPEG files after retrieving `info.json`. Versions prior to 4.0.1 are not affected.

To reproduce I followed these steps: request `info.json` for this image. This works once.

Logs: