google / guetzli

Perceptual JPEG encoder
Apache License 2.0

Extremely slow performance #50

Open DanielBiegler opened 7 years ago

DanielBiegler commented 7 years ago

How long does it take for you to compress a couple of images?

I tried compressing a 7.8 MB JPG with --quality 84 and it took nearly 20 minutes.

I also tried a 1.4 MB JPG with --quality 85 and it took nearly 10 minutes.

I must assume that this is not normal - is something wrong with my binary?

I am on Ubuntu 16.04 LTS with an Intel Core i7-4790K CPU @ 4.00GHz. I installed gflags via sudo apt-get install libgflags-dev and got libpng via sudo apt-get install libpng16-dev. After that, make ran with no errors.

convert -quality 85 src.jpg dst.jpg runs in under 1 second, if that is any help.

Anyone else experience this?

robryk commented 7 years ago

It's true that Guetzli is really slow to compress. As a rough order-of-magnitude estimate, it takes ~1 minute per MPixel on a 2.6 GHz Xeon. The time consumed should increase a bit more quickly than linearly with image size.

How large in pixel count are your images?

iron-udjin commented 7 years ago

Are you going to implement multi-threaded image processing? It would be really helpful, since guetzli doesn't have CPU/RAM usage optimizations yet.

jan-wassenberg commented 7 years ago

Would it be an option to invoke the binary multiple times in parallel? That's much easier than adding threading everywhere inside Guetzli+butteraugli, and should work unless you need to compress really large images.

iron-udjin commented 7 years ago

Yes, I use this option. But it would be great to have multi-threaded processing for large images in the future. That's why I'm asking about plans for implementing this feature.

DanielBiegler commented 7 years ago

@robryk the big one is around 16 MP (5300x3000), so your estimate is roughly in that ballpark.

I was just testing guetzli and haven't dived into its specifics yet; will it be possible to multithread its workload?

For example, I wanted to compress around 200 pictures, each roughly 7-8 MB in size. This would take forever. If I run 8 in parallel and each picture takes roughly 20 minutes, the compression would take over 8 hours to complete.

robryk commented 7 years ago

@DanielBiegler If you have 200 pictures, I'd echo @jan-wassenberg's suggestion: run multiple instances of Guetzli and thus process multiple pictures in parallel. This will be more effective parallelization than anything that can be done inside Guetzli.

kornelski commented 7 years ago

@robryk I presume a large part of the slowness and memory use is because this is the first release and Guetzli hasn't been optimized yet. How much of the slowness is inherent to the algorithm and unavoidable, and how much can be done to improve the speed?

robryk commented 7 years ago

@pornel We didn't try to optimize Guetzli in ways that could make it harder to modify. That means that there's likely both some speedup available by just optimizing single routines and, more significantly, speedup available by restructuring parts of Guetzli (e.g. attempting to reuse more computation results between iterations).

That said, I believe that much more can be done for memory consumption, which we didn't optimize nearly at all.

DanielBiegler commented 7 years ago

@robryk my estimate already took that into account.

Later, I'll compare the lower end around --quality 30 and if the results are good, maybe using guetzli in the background for a couple of days will be worth it(?) - we'll see.

Thanks for the quick answers.

slatted commented 7 years ago

I've been surprised overall by the performance. I have a 13 MB image of beef souvlaki, and it's been running for 25 minutes (using between 3-5 GB of RAM), and it still isn't finished.

I was able to resize a ~350 KB image relatively quickly (<2 min).

robryk commented 7 years ago

@slatted The runtime grows slightly faster than linearly with image size. If you want some inkling of what's happening, you might wish to pass the --verbose command-line flag.

slatted commented 7 years ago

@robryk thanks, the 13 MB image just finished (right around 30 minutes) and comes in at 2.9 MB (at quality 84). Visually I can't notice a difference. Very cool

clouless commented 7 years ago

Is it normal that a 4 MB file needs more than 8 GB of RAM inside Docker?

I have guetzli running inside Docker and the Docker container has 8 GB of RAM. The Guetzli process gets killed.

I also tried giving the Docker daemon 12 GB, but guetzli still got killed.

you can test it with:

docker run -i -t codeclou/kartoffelstampf:1.1.1 bash
# docker container bash opens

wget -O hires.jpg https://codeclou.github.io/kartoffelstampf/test-images/test-affinity-photo-600dpi.jpg

guetzli hires.jpg hires_comp.jpg

Any hints on what I can do to lower RAM consumption? Is guetzli not suited to run dockerized?

dockerhub: https://hub.docker.com/r/codeclou/kartoffelstampf/

robryk commented 7 years ago

@clouless The image you're trying to compress is ~70 MPix (its dimensions are 8333x8333). According to the README, Guetzli uses ~300 MB per MPix, so you should expect it to use ~21 GB on that image.

This is obviously far from ideal (both the high constant and the inability to process the image in tiles), but this is within the current expectations. #11 is the issue for reducing memory consumption.

clouless commented 7 years ago

Ok, thanks for checking. Then my test image might be too large. I will try the convert-to-png workaround with my actual DSLR photos for testing. Is there a way to check the number of MPix before starting the conversion? How do you determine MPix? Is it safe to rely on the exif megapixels entry?

robryk commented 7 years ago

@clouless I use ImageMagick's identify or an image viewer to find the dimensions of the image and multiply them together to get the total number of pixels. I'd expect the exif entry to be usually correct when present.

Note that the convert-to-png workaround is a workaround for a problem where Guetzli erroneously claims that an image is invalid. It doesn't impact memory usage or time spent in any appreciable fashion.

clouless commented 7 years ago

ok thx. You helped me a lot :) keep up the great work 👍

DanielBiegler commented 7 years ago

@clouless (img-width * img-height) / 1000000 = X megapixels
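
For a concrete example, here is a tiny standalone C++ sketch of that arithmetic combined with the ~300 MB per MPix figure from the README (the dimensions are the 8333x8333 image discussed above; this is a rough estimate, not a measurement):

#include <cstdio>

int main() {
  // Example: the 8333x8333 test image discussed above.
  const double width = 8333.0, height = 8333.0;
  const double mpix = width * height / 1000000.0;  // ~69.4 MPix
  const double est_ram_mb = mpix * 300.0;          // README's rough ~300 MB per MPix
  std::printf("%.1f MPix, roughly %.0f MB (~%.0f GB) of RAM expected\n",
              mpix, est_ram_mb, est_ram_mb / 1024.0);
  return 0;
}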

SuicSoft commented 7 years ago

The speed of Guetzli could probably be improved by using OpenCL (on the GPU).

bitbank2 commented 7 years ago

I just profiled Guetzli and most of the time is spent in the butteraugli Convolution() and ButteraugliBlockDiff() methods. One of the big issues hurting performance is the use of double-precision floating-point values to calculate pixel errors. In this case, a 64-bit integer would provide the same accuracy for the error and increase the speed quite a bit, since the original pixels could be left as-is. In certain cases, using doubles for pixels makes sense (e.g. some filter, scaling or transparency operations), but not for error calculations. The rest of the code has some efficiency problems, but they won't affect the performance nearly as much.
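
To make the suggestion concrete, here is a minimal sketch of the difference being described (illustrative only: the function names are invented here, and butteraugli's real error metric is far more involved than a plain sum of squared differences):

#include <cstddef>
#include <cstdint>

// Current style (simplified): every pixel is promoted to double for the error sum.
double SumSquaredErrorDouble(const uint8_t* a, const uint8_t* b, size_t n) {
  double sum = 0.0;
  for (size_t i = 0; i < n; ++i) {
    double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
    sum += d * d;
  }
  return sum;
}

// Suggested style: 8-bit pixels stay as integers; the accumulated squared
// difference fits comfortably in an int64_t even for very large images.
int64_t SumSquaredErrorInt64(const uint8_t* a, const uint8_t* b, size_t n) {
  int64_t sum = 0;
  for (size_t i = 0; i < n; ++i) {
    int d = static_cast<int>(a[i]) - static_cast<int>(b[i]);
    sum += static_cast<int64_t>(d) * d;
  }
  return sum;
}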

erikng commented 7 years ago

When using the --verbose option, it would be great if an estimated time/memory consumption could be calculated and presented to the user, perhaps by combining the megapixel count with the current per-megapixel time/memory estimates.

clouless commented 7 years ago

@erikng that, in combination with a --dry-run option, would be great, so that it just reports how long the conversion would take and how much memory it would need. JSON-formatted output would also be a huge plus.

graysky2 commented 7 years ago

Would it be an option to invoke the binary multiple times in parallel?

Functionally, you can do this with GNU parallel by invoking it like this:

parallel 'guetzli --quality 84 {} {.}.jpg' ::: *.png

Test it yourself:

wget https://github.com/google/guetzli/releases/download/v0/bees.png
for i in 1 2 3 4 5 6 7; do cp bees.png $i.png; done
time parallel 'guetzli --quality 84 {} {.}.jpg' ::: *.png

bdkjones commented 7 years ago

I'd love to implement this in my app, but the current performance figures are definitely a roadblock. Taking 13 minutes for a reasonably sized JPEG of a couple of MB is simply too long to be practical in many applications.

From my perspective, after reading all the current issues, there are three roadblocks to wide adoption, and they should be prioritized like this:

  1. Faster performance.
  2. Lower memory consumption.
  3. Failures on certain "non-standard" JPEGs, such as those produced by certain cameras (you said you knew what the problem is here).

I think a good, rough goal would be to get to a point where a JPEG that's a couple MB in size takes no more than 10-12 seconds to optimize. That would make the algorithm practical in my use case, which is an app that optimizes hundreds of images at once as part of building websites.

jayniz commented 7 years ago

Another alternative to what @graysky2 said is https://github.com/fd0/machma

luiseps commented 7 years ago

Hi, I want to create a multithreaded guetzli version, but I don't understand the workflow. Can anybody explain how, or point me to a document where I can find that?

jdluzen commented 7 years ago

I've attempted something like what @bitbank2 suggested and changed some of the doubles to floats in the two methods he mentioned, without much net speedup, if any. I do want to attempt converting to 64-bit ints and try again, but it seems like a more in-depth undertaking than my current haphazard edits.

However, in order to try out some quick-and-dirty parallelization, I've also profiled it with a random JPG I had lying around. Adding OpenMP's #pragma omp parallel for to all the fors in butteraugli.cc:Mask seems to have improved performance by ~10%. I first attempted to add them to the fors in the methods that @bitbank2 mentioned, but with mixed results: I got crashes, inconsistent iterations, different JPG file sizes, and high CPU with no progress (threads blocked? cache line misses?). I'll continue to poke at this as well.

On a related note, the memory usage of the version I've compiled has drastically improved compared to the binary on the Releases page, sometimes saving up to 75%. (screenshots: guetzli omp vs. guetzli stock memory usage)
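
For reference, a minimal sketch of the kind of pragma being described, applied to a hypothetical row-wise loop rather than the real butteraugli.cc:Mask code (the crashes mentioned above are the expected failure mode when parallelized iterations share writable state):

#include <vector>

// Each iteration of the outer loop writes to its own row of `out`,
// so the rows can be processed independently. Compile with -fopenmp.
void BlurRows(const std::vector<float>& in, std::vector<float>& out,
              int width, int height) {
#ifdef _OPENMP
#pragma omp parallel for
#endif
  for (int y = 0; y < height; ++y) {
    for (int x = 1; x + 1 < width; ++x) {
      out[y * width + x] = (in[y * width + x - 1] +
                            in[y * width + x] +
                            in[y * width + x + 1]) / 3.0f;
    }
  }
}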

leafjungle commented 7 years ago

I printed the time cost for each step. Image size: 100 KB.

Time cost: 86 seconds total; ApplyGlobalQuantization: 20 seconds; SelectFrequencyMasking: 60 seconds.

banghn commented 6 years ago

Hi @leafjungle, what is the GlobalQuantization step you applied? I would also like to improve compression with guetzli.

luiseps commented 6 years ago

@jdluzen I added #pragma omp parallel for to butteraugli.cc:Convolution(), but the process was slower than the original version. I believe it was due to data races or cache contention. Can you help me figure out how to improve guetzli? Thanks.

rogierlommers commented 6 years ago

Not sure if it's related, but at our company we chose to apply the Guetzli algorithm to all our rendered images. Because it's relatively slow, we decided to distribute the load in a special way. You can read all about it here: https://techlab.bol.com/from-a-crazy-hackathon-idea-to-an-empty-queue/

g-i-o-r-g-i-o commented 5 years ago

Is it still so slow to be unusable?

peterbe commented 5 years ago

Is it still so slow to be unusable?

Yes, unless you're rich.

I used guetzli on https://songsear.ch to optimize all the little artist & album images. They're all pretty small images, but it takes an age each time. What's worse, I'm running it on a beefy DigitalOcean droplet and the guetzli work is done synchronously at night. Twice it has happened that it maxed out the resources (even though it's single-CPU) so much that it broke a bunch of other processes running on that machine. It's hard to tell exactly what happened, and it was months ago, so perhaps this isn't a speed issue but a reliability issue. Either way, I can't afford to dedicate one CPU for 15s times the number of images just to save a couple of single-digit percentage points over mozjpeg.

DanielBiegler commented 5 years ago

@GianniGi and @peterbe you could check out the PR #227 which has CUDA/OpenCL support.

graysky2 commented 5 years ago

I found cjpeg works very well and is much faster. Here is an example comparison to guetzli.

jonathas commented 5 years ago

I found cjpeg works very well and is much faster. Here is an example comparison to guetzli.

I've tried the example in this article about MozJPEG using the same parameters, but guetzli still generated a much smaller file

bitbank2 commented 3 years ago

Waking up this old issue... has anyone ever given a good reason why this code can't make use of SIMD? Since most of the time is spent in the Convolution function, it can be sped up with a couple of new lines of code and an #ifdef. This is the simplest possible SIMD change to the code, and it results in a significant speedup. Starting at line 219 in butteraugli.cc:

(screenshot: the proposed SIMD change to the Convolution loop, starting at line 219 of butteraugli.cc)

On my MacBook Pro the time to compress a small test image went from 37.5 to 30.6 seconds after the above change.
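
The screenshot isn't reproduced in this text dump, so as a rough illustration of that style of #ifdef-guarded change (not the exact patch; the SSE2 path and the function shape here are assumptions):

#include <cstddef>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Dot product of a convolution kernel against a window of samples,
// with a vectorized path when SSE2 is available and a scalar fallback.
float DotKernel(const float* in, const float* kernel, size_t len) {
  float sum = 0.0f;
  size_t i = 0;
#if defined(__SSE2__)
  __m128 acc = _mm_setzero_ps();
  for (; i + 4 <= len; i += 4) {
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(in + i),
                                     _mm_loadu_ps(kernel + i)));
  }
  float tmp[4];
  _mm_storeu_ps(tmp, acc);
  sum = tmp[0] + tmp[1] + tmp[2] + tmp[3];
#endif
  for (; i < len; ++i) {
    sum += in[i] * kernel[i];  // scalar tail / non-SIMD fallback
  }
  return sum;
}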

graysky2 commented 3 years ago

https://github.com/mozilla/mozjpeg

bitbank2 commented 3 years ago

@graysky2 does this also mean that Google has abandoned using butteraugli?

jyrkialakuijala commented 3 years ago

Guetzli was a proof-of-concept milestone for us in creating new solutions for JPEG XL.

I'm considering creating a "Guetzli 2.0" that runs only one iteration of butteraugli, using the butteraugli from https://gitlab.com/wg1/jpeg-xl/-/tree/master/jxl/butteraugli and the initialization code from https://gitlab.com/wg1/jpeg-xl/-/tree/master/jxl/enc_adaptive_quantization.cc

I suspect that would make Guetzli around 100x faster.

doterax commented 3 years ago

Hi folks, if someone still needs guetzli Windows binaries with CUDA support, please check this out. It results in 25-40x faster recompression.

doterax commented 1 year ago

Hi folks, if someone still needs guetzli Windows binaries with CUDA support, please check this out. It results in 25-40x faster recompression.

Update: the new releases are here: https://github.com/doterax/guetzli-cuda-opencl/releases/