lovell / sharp

High performance Node.js image processing, the fastest module to resize JPEG, PNG, WebP, AVIF and TIFF images. Uses the libvips library.
https://sharp.pixelplumbing.com
Apache License 2.0

Switch to improved vips_resize, requires libvips v8.3.1+ #310

Closed · jcupitt closed this issue 8 years ago

jcupitt commented 8 years ago

Hi all, I've been looking at vips_resize(), the vips function that sharp uses for downsizing, and I think I may have found a couple of things which can improve quality, especially for small downsizes. Would anyone be able to help test the changes?

Code changes in this commit:

https://github.com/jcupitt/libvips/commit/8506ff13a674372d28b62b4a5e5f3e22e516fc01

There's a test image here, plus sample shrinks from vipsthumbnail and convert. Put the images in separate tabs and try flipping between them.

http://www.rollthepotato.net/~john/skin/

The test image is an interesting one since it combines fine, high-contrast detail in the hair with a strong skin texture (I hope the subject doesn't mind me saying that). The downsize needs to minimise aliasing in the hair while retaining texture in the skin.

The old vips_resize() avoided anti-aliasing nicely, but had the effect of almost airbrushing the skin. The three changes in the patch:

Any thoughts? Would someone be able to compare to photoshop? I remember we had an interesting discussion the last time we went over this code.

lovell commented 8 years ago

Hi John, sharp doesn't currently use vips_resize :scream: although I'd very much like it to (e.g. #299).

One thing that prevents sharp from going all-in is the need to support resizing without maintaining aspect ratio (which _shrink and _affine both support). Supporting this could be as simple as adopting something like sharp's approach of averaging the x and y factors - see https://github.com/lovell/sharp/blob/master/src/pipeline.cc#L536
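
For illustration, the averaging idea is only a few lines (a sketch with hypothetical names, not sharp's actual code):

function averagedShrink(inWidth, inHeight, outWidth, outHeight) {
  // Independent shrink factors for each axis...
  var xFactor = inWidth / outWidth;   // e.g. 3000 / 1000 = 3
  var yFactor = inHeight / outHeight; // e.g. 2000 / 500 = 4
  // ...collapsed into the single factor that vips_resize accepts.
  return (xFactor + yFactor) / 2;     // 3.5
}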

jcupitt commented 8 years ago

You could check your resize code against these changes, I guess. I think you are using a pretty similar algorithm, is that right?

jcupitt commented 8 years ago

I'd forgotten about that request, sorry. I added non-square resizes to vips_resize():

https://github.com/jcupitt/libvips/commit/2620f3b8c990f0dfc8658c7f3ba7bb4bdf31e326

It'll be poorer quality for non-square, but perhaps that doesn't matter.

jcupitt commented 8 years ago

I had a look through pipeline.cc and it looks like these changes aren't going to be relevant to sharp: you've already turned down the anti-alias filter. I'll close.

lovell commented 8 years ago

I feel this is still relevant to sharp. Your changes to vips_resize make it behave much more like sharp's current (guesstimated) level of anti-aliasing.

If it's OK with you I'd like to reopen and make this the task to migrate sharp to using vips_resize.

jcupitt commented 8 years ago

Sure, reopen away. It's at least a useful test image.

Here's the previous resize quality issue: https://github.com/lovell/sharp/issues/121

It had these two other useful test images: wet hair, featuring a lot of high-contrast edges and some skin texture, though less than the one above, plus ghibli, which has tricky gently-angled lines.

http://www.rollthepotato.net/~john/wet-hair/ http://www.rollthepotato.net/~john/ghibli/

I've added new images for these new settings. wet-hair looks about the same, but the ghibli is a bit better.

lovell commented 8 years ago

Thanks John.

On a related note, I've been experimenting with the removal of the anti-alias filter for the linear(-ish) interpolators - see https://github.com/lovell/sharp/blob/look/src/pipeline.cc#L549 - sample outputs at https://github.com/lovell/sharp/tree/look/test/interpolators

This is due to the proposed switch, in the next release of sharp, to a default interpolator setting of bicubic (from bilinear). Avoiding the convolution step with bilinear resizing is ~20% faster and still produces fairly good results (esp. if we assume that using bilinear is itself "good enough").
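
For anyone wanting to compare the two, something like this should work (a sketch against the sharp API of this era; I'm assuming interpolateWith and the interpolator constants are available, as in v0.12):

var sharp = require('sharp');

// Current default: bilinear
sharp('input.jpg')
  .resize(320, 240)
  .interpolateWith(sharp.interpolator.bilinear)
  .toFile('bilinear.jpg', function (err) { if (err) throw err; });

// Proposed default: bicubic
sharp('input.jpg')
  .resize(320, 240)
  .interpolateWith(sharp.interpolator.bicubic)
  .toFile('bicubic.jpg', function (err) { if (err) throw err; });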

jcupitt commented 8 years ago

Yes, vipsthumbnail has been bicubic for a while. I think you still need the anti-alias filter and the sharpen, even with bicubic.

puzrin commented 8 years ago

A small note, if it helps someone.

  1. The word "thumbnail" in resize functions is often used to indicate that the method is optimised for minimal output size. That can include ICC profile shrinking, and it is only good for small images.
  2. It's better to split downsizing into two clear steps:
    • resize (the interpolation filter can be selected according to quality and speed requirements)
    • sharpening (usually an "unsharp mask" is used, but more advanced methods are also possible)

jcupitt commented 8 years ago

Hi @puzrin, I agree on 1.

On 2., some degree of sharpening can be part of the interpolation algorithm. vips_resize() does box filter + anti-alias filter + bicubic + sharpening, and the final (rather mild) sharpening step is really part of the interpolation: it produces the negative lobes of the lanczos that's being approximated.

I think I'd agree that any extra sharpening should be possible but optional. vips has quite a nice cored sharpen which could be useful:

http://www.vips.ecs.soton.ac.uk/supported/current/doc/html/libvips/libvips-convolution.html#vips-sharpen

I think sharp has used this before.
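
For reference, opting into extra sharpening from sharp is a one-liner (a sketch; I'm assuming the no-argument .sharpen() applies the mild default):

var sharp = require('sharp');

// Resize, then apply sharp's mild default sharpen as an explicit,
// optional post-processing step.
sharp('input.jpg')
  .resize(320, 240)
  .sharpen()
  .toFile('resized-sharpened.jpg', function (err) { if (err) throw err; });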

puzrin commented 8 years ago

@jcupitt I've re-read Nicolas's articles and the ImageMagick forum topics, and also dug through the sources a bit. It seems I was wrong: the pre-blur and post-USM techniques are both correct.

In that case, Intel's article on fast blur computation may be of interest to you.

jcupitt commented 8 years ago

That's an interesting article @puzrin.

vips already does something like what Intel do. vips generates a separable convolution matrix from the sigma of the gaussian, then (at run time) writes a short AVX program which implements exactly that convolution on exactly that image, baking in things like pixel stride and scale. Intel's thing would probably win for large radius blurs, but the vips one should be quicker for small radius.
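
The mask generation itself is the standard construction, roughly (a sketch, not vips's actual code):

// Build the 1D half of a separable gaussian blur from sigma.
// The 2D blur is then two cheap 1D passes: rows, then columns.
function gaussianKernel1D(sigma) {
  var radius = Math.ceil(3 * sigma); // covers ~99.7% of the gaussian
  var kernel = [];
  var sum = 0;
  for (var x = -radius; x <= radius; x++) {
    var v = Math.exp(-(x * x) / (2 * sigma * sigma));
    kernel.push(v);
    sum += v;
  }
  // Normalise so the weights sum to 1 and brightness is preserved.
  return kernel.map(function (v) { return v / sum; });
}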

I'm exaggerating a bit :-( vips uses orc for the run-time code generation, and memory management issues inside orc have been the cause of a lot of crashes for us. I'm not sure if sharp even has this turned on any more :-( There's a new orc with at least some of this fixed, things might be getting better.

puzrin commented 8 years ago

@jcupitt I meant that vips uses a classic convolver, whose complexity depends on the blur radius (sigma + precision). Intel's article uses an IIR filter with 4 simple passes in total (forward/backward for horizontal/vertical).

PS. For fun: here you can play with a JavaScript demo of the IIR blur, and here you can play with an unsharp mask.

jcupitt commented 8 years ago

I understand, but the vips one will still be quicker for small radius. The anti-alias blur radius is very small, only three pixels across, so vips can do it in only a couple of instructions.

puzrin commented 8 years ago

@jcupitt if we're talking about quality, could you explain why vips does not implement a Lanczos (3-lobe) filter? Speed reasons? Any suggested replacements? AFAIK lanczos3 is the most popular default.

jcupitt commented 8 years ago

Yes, vips tries to implement a fast approximation of lanczos2. This was the IM default, have they changed it? I should check.

puzrin commented 8 years ago

In ubuntu 14.04lts with ImageMagick 6.7.7-10:

convert -size 100x100 xc:  -set option:filter:verbose 1 -resize 50% null: | grep '^#'
# Resize Filter (for graphing)
#
# filter = SincFast
# window = SincFast
# support = 3
# window-support = 3
# scale-blur = 1
# practical-support = 3

That's lanczos3

lovell commented 8 years ago

"I'm not sure if sharp even has this turned on any more :-( There's a new orc with at least some of this fixed, things might be getting better. "

The pre-compiled version of libvips v8.1.1 used by the latest v0.12.0 of sharp for 64-bit Linux systems includes liborc 0.4.24. No complaints yet :)

puzrin commented 8 years ago

No complaints yet :)

0.12.0 segfaults :( - the new liborc did not help.

puzrin commented 8 years ago

I've finally migrated to sharp from gm and got some time to look at the samples from the first post:

Side-notes:

jcupitt commented 8 years ago

I just merged a slightly faster cubic shrinker to master. The old code used affine plus a bicubic interpolator, the new code uses separate vertical then horizontal 1D cubic shrinkers.

You can see a speedup in some cases, for example, with a 10,000 x 10,000 pixel image being reduced to 7,000 x 7,000, we had:

$ time vipsthumbnail wtc.tif -o x.tif -s 7000
real    0m2.325s
user    0m8.316s
sys     0m0.256s

And it's now:

$ time vipsthumbnail wtc.tif -o x.tif -s 7000
real    0m1.677s
user    0m5.996s
sys     0m0.300s

lovell commented 8 years ago

@jcupitt Great, thanks for the update. I see vips_resize uses this new method so sharp will pick up the benefits when we make the switch to using it.

jcupitt commented 8 years ago

There's a new version of vips_resize() in this branch, it now does a pretty straight lanczos3:

https://github.com/jcupitt/libvips/tree/add-lanczos3

I'm hoping to move the next vips to this new code. It's faster, uses less memory, and quality should be up too.

I've uploaded some samples with the new resampler here:

http://www.rollthepotato.net/~john/skin2 http://www.rollthepotato.net/~john/ghibli

Previous best is here:

http://www.rollthepotato.net/~john/skin

Any feedback would be great.

puzrin commented 8 years ago

@jcupitt do you have your own preferences on "how an image should be downscaled"? We've rewritten pica to support tiles and multiple cores (in JavaScript, LOL), and I'd like to understand the direction in which to move further :).

jcupitt commented 8 years ago

Hi @puzrin, I guess you've seen Nicolas Robidoux's page on resampling?

http://www.imagemagick.org/Usage/filter/nicolas/

It's the best source I know of on this.

puzrin commented 8 years ago

Sure! I'd like to hear your personal opinion.

pre-blur + lanczos3?

jcupitt commented 8 years ago

Yes, that's the current one. This branch is doing:

  1. vertical 1D box shrink
  2. horizontal 1D box shrink
  3. anti-alias, sigma = ((1.0 / residual) - 0.5) / 2.5, only blur if residual < 0.9 && sigma > 0.5
  4. vertical 1D lanczos3
  5. horizontal 1D lanczos3
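
In code, the gating in step 3 amounts to something like this (my naming, a sketch rather than the libvips source):

function antiAliasSigma(residual) {
  var sigma = ((1.0 / residual) - 0.5) / 2.5;
  // Blur only when enough shrinking remains that the lanczos3
  // pass would otherwise undersample.
  return (residual < 0.9 && sigma > 0.5) ? sigma : 0;
}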

1 and 3 vectorise well, 2, 4, and 5 do not :( But performance is nicely up from the previous best thanks to no longer needing a line cache.

vips 8.2 with a 10k x 10k RGB jpg:

$ time vipsthumbnail wtc.jpg -s 2000
peak rss: 52m
real    0m0.698s
user    0m1.536s
sys    0m0.044s

And with this branch:

$ time vipsthumbnail wtc.jpg -s 2000
peak rss: 40m
real    0m0.450s
user    0m0.792s
sys    0m0.020s

puzrin commented 8 years ago

@jcupitt thanks for the clear equations, that will save me some time. Could you clarify some of the terms?

jcupitt commented 8 years ago

By box shrink, I just mean a simple integer block average. The residual is the left-over fractional part.

For example, to shrink a 1000 x 1000 pixel image down to 128 x 128, you would first do a 7 x 7 box shrink to get a 142 x 142 image, then lanczos3 with a residual of (128 / 142), about 0.9, to get exactly to 128 x 128.

If the residual is small, you will get nasty aliasing due to undersampling, so you need to have an anti-alias filter between the block shrink and the lanczos3.
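
As a sketch (hypothetical helper, same arithmetic as the worked example):

// Split a 1D downscale into an integer box shrink plus a residual.
function planDownscale(inSize, outSize) {
  var boxFactor = Math.floor(inSize / outSize);  // 1000 -> 128 gives 7
  var afterBox = Math.floor(inSize / boxFactor); // 1000 / 7 gives 142
  var residual = outSize / afterBox;             // 128 / 142, about 0.9
  return { boxFactor: boxFactor, afterBox: afterBox, residual: residual };
}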

puzrin commented 8 years ago

Hm... are there any references available about the quality of such 2-stage algorithms (~mipmapping + precise downscale)? Nicolas did not mention anything similar in his articles on the ImageMagick site. I've seen such techniques somewhere, usually with 2^n preliminary steps, but I don't know anything about their quality measurements.

jcupitt commented 8 years ago

It's just an optimisation. I think Nicolas takes it as a given.

The "correct" way to do a 8:1 three-lobe Lanczos downsample would be to compute the full 2D kernel for each output pixel. This would have a 7 x 7 centre section which was roughly gaussian, then maybe three pixels of negative each side, another three positive and a final three negative. You'd be computing about a 25 x 25 point mask for each output pixel, so that's ~700 multiply-add for each one, with very poor locality and no opportunities for vectorisation. And that's not including mask generation costs.

The box filter is separable and can be vectorised, and box + blur will approximate the centre part of the large mask. The third pass, which does two simple fixed 6-point lanczos, adds the other lobes by reusing the results of the nearby averaging operations. Total cost ~61 multiply-adds, at least 10 times faster for almost the same result.

puzrin commented 8 years ago

You'd be computing about a 25 x 25 point mask for each output pixel, so that's ~700 multiply-add for each one, with very poor locality and no opportunities for vectorisation. And that's not including mask generation costs.

??? lanczos3 is separable. It's vectorizable too (see the Skia source). The difference from a box filter is ~6x (3x due to window size + 2x due to memory access for the precomputed filters). Filter setup costs exist, but they are not significant.

It's probably Nicolas's modifications, like Lanczos EWA and others, that are not vectorizable.

It's just an optimisation. I think Nicolas takes it as a given.

IMHO, that's not the same from a mathematical point of view (I don't say it's bad or won't work at all). That's why I asked whether someone else had experimented with such things and done a quality comparison.

jcupitt commented 8 years ago

Vectorizable with gcc, I should say. I should maybe have a go with orc.

Lanczos is separable in that you can do two 1D lanczos, but you won't get the same result as one 2D lanczos. The two-pass version will be equal to a one-pass version with a rather "squarish" mask. You can see the effect for example in stepping artefacts:

http://www.imagemagick.org/Usage/resize/#distort_resize

It's close, but not exact. Sadly only a few kernels are truly separable. Moving more of the calculations into a block filter reduces the differences.
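
To pin the terminology down: a kernel is separable when it factors into a product of 1D kernels. The gaussian genuinely does, while the two-pass lanczos is the tensor product, which is not the same function as the radially symmetric version:

e^{-(x^2 + y^2) / 2\sigma^2} = e^{-x^2 / 2\sigma^2} \cdot e^{-y^2 / 2\sigma^2}

L(x) \cdot L(y) \neq L\left(\sqrt{x^2 + y^2}\right)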

(I'm not a mathematician at all, so corrections please!)

puzrin commented 8 years ago

Lanczos is separable in that you can do two 1D lanczos, but you won't get the same result as one 2D lanczos.

Let's be precise: a kernel can be separable or not separable. No intermediate states :).

The two-pass version will be equal to a one-pass version with a rather "squarish" mask. You can see the effect for example in stepping artefacts:

That example is for upscaling, while we're talking about downscaling. Lanczos is not ideal for upscaling at all.

http://www.imagemagick.org/Usage/distorts/#resize - see the note; I'm not sure it's a "classic" lanczos:

The real difference in the above two images is that the Distort Operator uses a two dimensional Elliptical Area Resampling filter method (also known as cylindrical filtering or resampling) for its image processing.


Moving more of the calculations into a block filter reduces the differences.

We can do any math we wish, but the question is: how do we qualify the result, and will it continue to be predictable? If we implement known methods, we can refer to theory plus past measurements done by other people.

puzrin commented 8 years ago

Over the last few days we've polished the image uploading logic in our project and experimented with sharp. My opinion is that the results from sharp 0.13.1 are far from perfect and should be improved significantly :)

Resize by sharp:

[image: output from sharp 0.13.1]

Resize by pica (lanczos3, without unsharp mask):

[image: output from pica]

Sizes are a bit different because I grabbed the results from the screen, but I think the difference is obvious.

jcupitt commented 8 years ago

https://github.com/jcupitt/libvips/issues/404 has some samples of vips with a new lanczos3 downsizer, it should be in 8.3 and should look better.

lovell commented 8 years ago

The switch to use the new vips_reducev and vips_reduceh operations (defaulting to a Lanczos 3 kernel) is now available on the outfit branch and will be in v0.15.0.

npm install lovell/sharp#outfit
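
A minimal way to exercise the new path (no extra options should be needed, since Lanczos 3 is the default kernel):

var sharp = require('sharp');

// Plain resize; on the outfit branch this goes through the new
// vips_reducev / vips_reduceh operations.
sharp('input.jpg')
  .resize(320, 240)
  .toFile('lanczos3.jpg', function (err) { if (err) throw err; });
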
lovell commented 8 years ago

This is now in master awaiting release as part of v0.15.0. Feedback welcome, as always.

Next step: run benchmarks and update results.

lovell commented 8 years ago

It looks like resizing in sharp v0.15.0 (libvips v8.3.1) will be between 10% and 20% faster than in previous versions! CPU cache size seems to have the greatest impact on the improvement.

lovell commented 8 years ago

v0.15.0 now available via npm; enjoy.

puzrin commented 8 years ago

Hm... I don't see a quality improvement after upgrading to 0.15. Should I change any of the resize options?

puzrin commented 8 years ago

Should I create a new ticket about resize quality? Maybe I'm doing something wrong - the details after resizing to a small size are awful. I've tried increasing the JPEG quality from 75% to 95% - nothing changed.

lovell commented 8 years ago

@puzrin Yes, please create a new issue with sample images that we can use to fine-tune. Please can you also try with gamma() as this disables the use of JPEG shrink-on-load.
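
For example (a sketch; adding gamma() forces the full-resolution decode path):

var sharp = require('sharp');

// gamma() disables JPEG shrink-on-load, ruling it out as the
// source of any artefacts in the comparison.
sharp('input.jpg')
  .gamma()
  .resize(320, 240)
  .toFile('full-decode.jpg', function (err) { if (err) throw err; });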

puzrin commented 8 years ago

It seems a separate ticket is not needed. The main problem was the JPEG quality param in 0.14.x, which disappeared in 0.15. Other things may be subjective and not important enough to ask for your attention.

https://github.com/nodeca/sharp-resize-test - that's a test repo to quickly check images from different resizers.

lovell commented 8 years ago

"jpeg quality param in 0.14.x, that disappeared in 0.15"

The quality() option is still there.

https://github.com/lovell/sharp/blob/master/test/unit/io.js#L286-L299

Your resize-test repo looks rather handy, thank you for sharing.

puzrin commented 8 years ago

The quality() option is still there.

I mean that in v0.14, passing quality 95% instead of 75% did not improve things in our application. I don't know why. In 0.15 the quality difference is visible. Since 0.15 works as expected, we did not investigate what was up with 0.14 - it's no longer relevant.

jcupitt commented 8 years ago

Since 7.42, vips has disabled chroma subsampling on jpeg save if Q >= 90, could that be part of it?

puzrin commented 8 years ago

I'd say that with Q = 75 the quality in 0.15 is better too. At least I don't see those big artifacts as before.

To summarize: previews in our app (nodeca) looked really bad before, but now they are OK. If I notice something strange in ordinary use, I will let you know.