lovell / sharp

High performance Node.js image processing, the fastest module to resize JPEG, PNG, WebP, AVIF and TIFF images. Uses the libvips library.
https://sharp.pixelplumbing.com
Apache License 2.0

Trying to understand sharp memory usage #349

Closed: janaz closed this issue 8 years ago

janaz commented 8 years ago

I noticed that my application, which uses sharp, is occasionally crashing. After doing some investigation I found that it allocates a huge amount of memory to process progressive JPEG images, and if there's no memory available for Node, it simply exits with code 137.

I understand that in order to process a progressive JPEG, it needs to be fully decompressed into memory. What I don't understand is why memory usage keeps increasing as more images are processed. I created a proof-of-concept project that demonstrates the problem.

In my proof-of-concept script there's a loop that loads a 100-megapixel progressive JPEG image from file into a buffer. A new sharp object is then created and initialized with the buffer, the image gets resized, and the loop moves to the next iteration. On my c4.large instance the script only gets through 5 iterations before it runs out of memory (3.75GB is available on that EC2 instance).
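
The Node script itself isn't reproduced here, but at the libvips level the loop amounts to roughly the following C sketch (the file name, iteration count and scale factor are placeholders, not values from the actual script):

/* compile with
 *
 *      gcc -g -Wall poc.c `pkg-config vips --cflags --libs`
 */

#include <vips/vips.h>

int
main( int argc, char **argv )
{
        int i;

        if( VIPS_INIT( argv[0] ) )
                vips_error_exit( NULL );

        for( i = 0; i < 100; i++ ) {
                char *buf;
                gsize len;
                VipsImage *in, *out;
                double d;

                /* load the whole file into a memory buffer, as the
                 * Node script does */
                if( !g_file_get_contents( "100mp-progressive.jpg",
                        &buf, &len, NULL ) )
                        vips_error_exit( "unable to read file" );
                if( !(in = vips_image_new_from_buffer( buf, len, "",
                        NULL )) )
                        vips_error_exit( NULL );

                /* resize, then force evaluation by computing the average */
                if( vips_resize( in, &out, 0.1, NULL ) ||
                        vips_avg( out, &d, NULL ) )
                        vips_error_exit( NULL );
                printf( "%d done\n", i );

                g_object_unref( out );
                g_object_unref( in );
                g_free( buf );
        }

        return( 0 );
}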

The 100-megapixel image needs 300MB of memory once decoded, yet for each iteration the memory usage grows by around 600MB. It stabilizes at about 3.5GB and doesn't grow any further from that point; I'm assuming this is where Node/V8 starts garbage collection.

Is this behaviour expected? Is there anything I can do (e.g. tweak the GC settings) so that my app doesn't need to allocate up to 3.5GB of memory?

lovell commented 8 years ago

Hi Tomasz, if you've not already seen them, there are some possible tips about libvips' cache and V8 settings that might help you in https://github.com/lovell/sharp/issues/260#issuecomment-136601122 and https://github.com/lovell/sharp/issues/260#issuecomment-136714113

jcupitt commented 8 years ago

Progressive JPEG images are difficult to handle well. libvips exposes a metadata item, jpeg-multiscan, that tells you whether you have a progressive image. You'd use it something like this:

VipsImage *image;
int jpeg_multiscan;

if (!(image = vips_image_new_from_file ("progressive-jpeg-image.jpg", NULL)))
  vips_error_exit (NULL);

/* vips_image_get_int() returns 0 on success */
if (!vips_image_get_int (image, "jpeg-multiscan", &jpeg_multiscan) &&
  jpeg_multiscan) {
  // progressive jpg ... when we try to process it,
  // we will see a huge memory allocation and everything will lock up
  // ... turn on some special handling, like reading via a temporary disc file
}

g_object_unref (image);
lovell commented 8 years ago

@janaz These files might be held open in libvips' cache, preventing memory being returned to the OS.

Are you able to try the improved control over libvips' cache available on the master branch? If so, use sharp.cache(false) and see if memory usage improves.
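
For reference, at the libvips level disabling the cache corresponds roughly to zeroing the operation-cache limits; this sketch shows those calls (an approximation of what the sharp setting does, not sharp's exact internals):

#include <vips/vips.h>

int
main( int argc, char **argv )
{
        if( VIPS_INIT( argv[0] ) )
                vips_error_exit( NULL );

        /* roughly what sharp.cache(false) amounts to */
        vips_cache_set_max( 0 );        /* no cached operations */
        vips_cache_set_max_mem( 0 );    /* no memory held by the cache */
        vips_cache_set_max_files( 0 );  /* no files held open */

        return( 0 );
}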

janaz commented 8 years ago

Thanks @lovell, I'll check the latest commit and post the results.

Yesterday I started logging the rss and heap usage. The heap stayed at the same level, but the rss grew by 600MB with every iteration of my test script. It must be memory allocated and managed by vips.

janaz commented 8 years ago

I tested with sharp.cache(0, 0) on version 0.12.2, and with sharp.cache({memory: 0, files: 0, items: 0}) on the master branch.

In both cases the memory used by my script dropped by an order of magnitude, and I didn't notice any performance degradation. This raises another question: in what scenarios does the cache help significantly with performance?

lovell commented 8 years ago

Thanks for confirming @janaz. The cache is useful when processing the same image a few times in close succession, e.g. converting one image into a selection of other fixed dimensions. sharp uses slightly lower defaults than libvips, but I guess they could be a bit lower still.

Perhaps the progressive JPEG loader should never be cached? @jcupitt I've seen that operations have a nocache flag. Can this be dynamically controlled?

jcupitt commented 8 years ago

I agree, never caching progressive JPEGs is probably the best solution. It's very simple and doesn't disturb the API.

It'll be a huge performance hit for anyone who tries to open the same progressive JPEG repeatedly, but p-jpegs are rare, so I doubt if it'll ever be a problem.

I've opened an issue on the libvips tracker for this.

jcupitt commented 8 years ago

This commit, https://github.com/jcupitt/libvips/commit/6f94cb5ed46a66cb8e26ae36a3ca281d86df7ce6, seems to stop progressive jpg images being cached. It's in git master libvips, if anyone could test it.

janaz commented 8 years ago

Thanks @jcupitt for addressing this issue in libvips so quickly. I've done some memory usage tests with my updated test script. I don't think the recent change in libvips helped with reducing sharp memory usage.

I used libvips from master branch and sharp 0.13.0.

My script resizes a 100-megapixel progressive JPEG image in a loop, and I measure the rss value after every iteration. I reran the script for different values of the memory cache limit. As you can see in the table below, increasing the cache limit from 6MB to 7MB caused memory usage to go up by a factor of 10. This is where I don't quite understand the libvips cache behaviour: such a small change to the cache limit, yet a huge difference in memory usage.

The results (rss in MB, per libvips cache memory limit):

iteration                          0MB       5MB       6MB       7MB      10MB      15MB      20MB
1                                38.51     38.46     38.66     38.65     38.46     38.32     38.67
2                                47.17     45.05     47.18    621.96    623.71    621.67    623.98
3                                58.62     56.29     58.75    629.49    631.31   1205.87   1208.00
4                                66.95     64.41     66.18    635.39    638.51   1213.28   1792.15
5                                74.64     72.36     74.12    643.42    648.55   1220.46   1836.66
cache.memory.high before exit        6         6         6        11        11        15        24
lovell commented 8 years ago

@janaz libvips' cache limit is for memory it allocates, which doesn't always include memory allocated by libraries upon which it is dependent.

Your very detailed chart (thank you) suggests that at 7MB, libvips has enough space to be able to cache an operation that results in libjpeg keeping the file open. I'm sure John can provide more details here.

jcupitt commented 8 years ago

I made a tiny test program to show the effect of the nocache thing.

/* compile with
 *
 *      gcc -g -Wall try208.c `pkg-config vips --cflags --libs`
 */

#include <vips/vips.h>

int
main( int argc, char **argv )
{       
        int i;

        if( VIPS_INIT( argv[0] ) )
                vips_error_exit( NULL ); 

        for( i = 0; i < 10000; i++ ) {
                VipsImage *im;
                double d;

                if( !(im = vips_image_new_from_file( argv[1], NULL )) )
                        vips_error_exit( NULL );
                if( vips_avg( im, &d, NULL ) )
                        vips_error_exit( NULL );
                printf( "%d - average of %s is %g\n", i, argv[1], d );
                g_object_unref( im );
        }       

        return( 0 );
}

It just loads an image and finds the average. If you run it on a regular jpeg file, the load and the find-average operations are cached and then reused, so it runs through all 10,000 iterations very quickly.

$ time ./a.out ~/pics/k2.jpg 
1 - average of /home/john/pics/k2.jpg is 102.792
....
9999 - average of /home/john/pics/k2.jpg is 102.792
real    0m1.084s
user    0m0.936s
sys 0m0.140s

If you run it on a progressive jpg, it has to repeatedly reload the image:

$ time ./a.out ~/pics/horse1600x1200-002.jpg 
0 - average of /home/john/pics/horse1600x1200-002.jpg is 90.2484
...
390 - average of /home/john/pics/horse1600x1200-002.jpg is 90.2484
^C
real    0m22.877s
user    0m30.612s
sys 0m1.116s

Memory use in both cases is around 13MB.

You have a 100MP image, which will need 300MB of RAM to decode, thanks to progressive jpg. I'm not sure where the other 290MB is going; that's a bit mysterious. As Lovell says, I guess that above 6MB vips feels it's able to keep enough operations around that the progressive jpg decode buffer stays alive.

The "nocache" flag is not transitive. Suppose you have a chain of operations like:

A -> B -> C

"A" could be tagged as nocache (and won't get cached), but "B" and "C" will be cached (unless they are tagged as nocache themselves). If the cache holds a ref to "B" and "B" holds a ref to "A", "A" will be kept alive, though it can never be referenced again.

I'm not sure what to do about this :-( I'll have a think.

jcupitt commented 8 years ago

I reread the jpeg decoder and the libjpeg docs: vips is using the default transparent mode to handle progressive jpg images. In this mode, for your image, libjpeg allocates a huge 300MB buffer, decodes the image into it, then hands it slowly over to vips as a set of scanlines. vips allocates another 300MB buffer to hold the scanlines, copies into that, then serves pixels from there to the rest of vips.

This is clearly dumb. It seems vips should use libjpeg in "buffered image mode" for progressive jpg decoding. This should save 300MB on read, plus a bit of time.
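
For reference, buffered-image mode looks roughly like this, per libjpeg.txt; the decode_buffered function and its minimal error handling are illustrative, not vips code:

#include <stdio.h>
#include <stdlib.h>
#include <jpeglib.h>

/* Sketch of libjpeg's buffered-image mode: libjpeg keeps its own
 * coefficient buffer and we pull each completed scan from it, instead
 * of transparent mode decoding the whole progressive file up front.
 */
static void
decode_buffered( FILE *fp )
{
        struct jpeg_decompress_struct cinfo;
        struct jpeg_error_mgr jerr;
        JSAMPROW row;

        cinfo.err = jpeg_std_error( &jerr );
        jpeg_create_decompress( &cinfo );
        jpeg_stdio_src( &cinfo, fp );
        jpeg_read_header( &cinfo, TRUE );

        cinfo.buffered_image = TRUE;
        jpeg_start_decompress( &cinfo );

        row = malloc( cinfo.output_width * cinfo.output_components );

        while( !jpeg_input_complete( &cinfo ) ) {
                /* decode the next scan of the progressive image */
                jpeg_start_output( &cinfo, cinfo.input_scan_number );
                while( cinfo.output_scanline < cinfo.output_height )
                        jpeg_read_scanlines( &cinfo, &row, 1 );
                jpeg_finish_output( &cinfo );
        }

        jpeg_finish_decompress( &cinfo );
        jpeg_destroy_decompress( &cinfo );
        free( row );
}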

I now don't think nocache is a good fix. It would be better if the jpeg loader watched out for the "minimise" signal that's sent around at the end of evaluation and threw away its load buffer. It can (slowly) reallocate and recreate it if it ever has to.

I've made a new issue + branch to implement this:

https://github.com/jcupitt/libvips/issues/388
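
A minimal sketch of watching that signal is below; the minimise_cb handler is illustrative (a real loader would free its decode buffer there and rebuild it lazily if pixels are requested again), and exactly when the signal fires depends on the libvips version:

/* compile with
 *
 *      gcc -g -Wall minimise.c `pkg-config vips --cflags --libs`
 */

#include <vips/vips.h>

/* illustrative handler: a loader would free its big progressive-jpg
 * decode buffer here */
static void
minimise_cb( VipsImage *image, void *user_data )
{
        printf( "minimise signalled on %s\n",
                vips_image_get_filename( image ) );
}

int
main( int argc, char **argv )
{
        VipsImage *image;
        double d;

        if( VIPS_INIT( argv[0] ) )
                vips_error_exit( NULL );

        if( !(image = vips_image_new_from_file( argv[1], NULL )) )
                vips_error_exit( NULL );
        g_signal_connect( image, "minimise",
                G_CALLBACK( minimise_cb ), NULL );

        /* evaluate a pipeline; minimise is emitted at the end */
        if( vips_avg( image, &d, NULL ) )
                vips_error_exit( NULL );
        printf( "average is %g\n", d );

        g_object_unref( image );
        return( 0 );
}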

jcupitt commented 8 years ago

On reading more of libjpeg.txt and doing some experimenting, it seems libjpeg always allocates a huge decode buffer for progressive images; you can't avoid it. I don't think there's going to be a saving there.

I looked into "minimise", but I'm not sure there's a saving there either. And nocache will not work well in non-trivial cases.

I have made git master free the decode state earlier, so you should now see about 300MB per 100MP image rather than 600MB, so that's something.

Large progressive jpg images are difficult to support well :(

lovell commented 8 years ago

Thanks for investigating this John. 50% less memory use is pretty good!

On a related note, but probably not related enough to solve this problem, I notice the forthcoming v1.5 of libjpeg-turbo will introduce new jpeg_crop_scanline and jpeg_skip_scanlines methods in its API. It still decodes in 8x8 blocks, but the ability to request multiple-of-eight rectangular blocks more closely resembles libvips' use of regions.
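
As a rough sketch of how those calls compose, per the libjpeg-turbo documentation (the decode_region function and the region values are illustrative):

#include <stdio.h>
#include <stdlib.h>
#include <jpeglib.h>

/* Sketch of libjpeg-turbo 1.5 partial decompression: decode only a
 * band of the image, skipping the rows above and below it.
 * libjpeg-turbo may widen xoffset/width to the nearest iMCU
 * (8x8 block) boundary.
 */
static void
decode_region( FILE *fp, JDIMENSION xoffset, JDIMENSION width,
        JDIMENSION yoffset, JDIMENSION height )
{
        struct jpeg_decompress_struct cinfo;
        struct jpeg_error_mgr jerr;
        JSAMPROW row;

        cinfo.err = jpeg_std_error( &jerr );
        jpeg_create_decompress( &cinfo );
        jpeg_stdio_src( &cinfo, fp );
        jpeg_read_header( &cinfo, TRUE );
        jpeg_start_decompress( &cinfo );

        /* restrict decoding to a horizontal window; this updates
         * cinfo.output_width */
        jpeg_crop_scanline( &cinfo, &xoffset, &width );
        row = malloc( cinfo.output_width * cinfo.output_components );

        /* skip rows above the region, read the region, skip the rest */
        jpeg_skip_scanlines( &cinfo, yoffset );
        while( cinfo.output_scanline < yoffset + height )
                jpeg_read_scanlines( &cinfo, &row, 1 );
        jpeg_skip_scanlines( &cinfo,
                cinfo.output_height - yoffset - height );

        jpeg_finish_decompress( &cinfo );
        jpeg_destroy_decompress( &cinfo );
        free( row );
}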

jcupitt commented 8 years ago

Yes, I saw the partial decompression stuff. Sadly I don't think vips will be able to make use of this, except by perhaps adding a crop= option to the jpeg loader.

lovell commented 8 years ago

@janaz Do you have all the information you need?

janaz commented 8 years ago

Yes. Thanks @lovell and @jcupitt for your answers.

papandreou commented 8 years ago

Followed up here: https://github.com/lovell/sharp/issues/429