Open homm opened 7 years ago
Hello @homm! Nice! I'll try and run these tests on my laptop.
I noticed one small thing on the results browser: the hover box is clipped. I see:
so I can only see the first four items. I tried in Firefox and Chrome. All the charts seem to have this issue.
I'm not sure why the repo should be faster, perhaps different compiler settings? I'll investigate.
I noticed one small thing on the results browser
Yes, this happens not in result browser, but in inline graphs in following sections. I'll try to fix.
I've had a look though the vips code, it's mostly fine, nice job.
I noticed of a couple of things:
vips_load.py
there are the lines:im = Image.new_from_file(root('resources', self.filename))
im.write_to_memory()
This will do an extra copy operation. The steps internally are:
a. new_from_file
will set up a delayed random-access load operation
b. write_to_memory
will ask for pixels, and this will trigger the JPEG load
c. since this is random access to a small (relatively) image, the file will be loaded to a memory buffer ready for pixels to be served
d. write_to_memory
will now allocate a second memory buffer and write to that, copying all the pixels over
Instead, how about having:
im = Image.new_from_file(root('resources', self.filename))
im.getpoint(0, 0)
The getpoint
will read pixel (0, 0). This will be enough to trigger the delayed load and pull the whole image into memory.
SaveCase
, though it won't make any difference to runtime, of course.class SaveCase(BaseSaveCase):
def create_test_data(self):
im = Image.new_from_file(root('resources', self.filename))
im.getpoint(0, 0)
return [im]
vips_full_cycle
, I would use kernel
, not interpolate
for resize
. The interpolate
option is a left-over from the previous resize system. im = im.resize(0.4, kernel="cubic")
http://jcupitt.github.io/libvips/API/current/libvips-resample.html#vips-resize
Without a kernel
option, it'll do the default lanczos3.
min_ampl
for gaussblur
is very small. 0.01 means it'll generate the mask all the way out to the final 1%, which is just tiny. You can see the mask size like this:$ export VIPS_INFO=1
$ vips gaussblur k2.jpg x.jpg 4 --min-ampl 0.01
VIPS-INFO: gaussblur mask width 25
VIPS-INFO: intize_to_fixed_point: too many underflows
VIPS-INFO: convi: using C path
VIPS-INFO: intize_to_fixed_point: too many underflows
VIPS-INFO: convi: using C path
So it's making a 25-point mask, then seeing so many underflows it's falling back to the C path. If you pick a larger number you get something closer to pillow, plus vips can use its vector mode.
I tried blurring a single 255 pixel to see the shape of the mask with min-ampl
0.3:
And it looks a bit similar. It looks like pillow is a little asymmetric. Perhaps 0.3 is too high?
If I make all these changes then benchmark, I see:
8.4.5 from repo
Vips full cycle 2560×1600 RGB image
Load+save 0.06268 s 65.35 Mpx/s
+transpose 0.06520 s 62.82 Mpx/s
+resize 0.12671 s 32.32 Mpx/s
+blur 0.13739 s 29.81 Mpx/s
8.5 from source
Vips full cycle 2560×1600 RGB image
Load+save 0.06329 s 64.71 Mpx/s
+transpose 0.06449 s 63.52 Mpx/s
+resize 0.13087 s 31.30 Mpx/s
+blur 0.14103 s 29.04 Mpx/s
8.6 from source
Vips full cycle 2560×1600 RGB image
Load+save 0.06619 s 61.88 Mpx/s
+transpose 0.06721 s 60.94 Mpx/s
+resize 0.06799 s 60.25 Mpx/s
+blur 0.07813 s 52.43 Mpx/s
8.6 has some caching improvements which help resize in this case
current pillow-simd on this laptop
Full cycle 2560×1600 RGB image
Load+save 0.06175 s 66.33 Mpx/s
+transpose 0.07541 s 54.31 Mpx/s
+resize 0.06005 s 68.21 Mpx/s
+blur 0.09469 s 43.26 Mpx/s
0.01 means it'll generate the mask all the way out to the final 1%, which is just tiny.
Yes, and this is the desired behavior: it should be as close as possible to ideal gaussian blur. I can see significant differences between the ideal and 0.1, so 0.3 is unacceptable. The maximum min_ampl
value when I don't see any differences is '0.07'. From left to right: 0.001, 0.07, 0.1:
To be honest, I believe there is an error somewhere because the difference between '0.07' and '0.08' is too sharp.
I made corrections in this branch: https://github.com/python-pillow/pillow-perf/pull/66
Sorry, reverted resize
to interpolate
because there is KeyError: 'kernel'
on VIPS 8.2. Does VIPS 8.5 has backward compatibility with old interpolate
argument?
It's backward-compatible, but not forwards, so old code works on new libvips, but new code can fail on old libvips (if that makes sense).
You'd need to add a version test to use a new feature, for example:
from .vips import Image, Interpolate, at_least_libvips
...
if at_least_libvips(8, 3):
# resize with kernel added in 8.3
im = im.resize(0.4, kernel='cubic')
else:
im = im.resize(0.4, interpolate=Interpolate.new('bicubic'))
OK, I'm happy with 0.07.
It's a shame the rot90
is in there: if you remove that, you can use libvips in streaming mode. If I comment out the rot and change the new_from_file
to:
im = Image.new_from_file(root('resources', self.filename), access='sequential')
I see:
Vips full cycle 2560×1600 RGB image
Load+save 0.04064 s 100.78 Mpx/s
+transpose 0.03657 s 112.01 Mpx/s
+resize 0.05128 s 79.87 Mpx/s
+blur 0.06578 s 62.27 Mpx/s
It's a shame the rot90 is in there
This is one of the common cases. Many JPEG files have an orientation tag in EXIF and this test is designed to emulate this situation. A real shame is there is still no autorotate in Pillow, by the way (
I see: Vips full cycle 2560×1600 RGB image Load+save 0.04064 s 100.78 Mpx/s
I believe this is a bug, VIPS with access='sequential'
uses more than one core even if VIPS_CONCURRENCY=1
is set:
$ time ./run.py vips_full_cycle
Vips full cycle 2560×1600 RGB image
Load+save 0.03131 s 130.80 Mpx/s
real 0m0.524s
user 0m0.740s
sys 0m0.016s
$ time taskset 0x1 ./run.py vips_full_cycle
Vips full cycle 2560×1600 RGB image
Load+save 0.04763 s 86.00 Mpx/s
real 0m0.760s
user 0m0.696s
sys 0m0.032s
It's backward-compatible
But looks like interpolate=Interpolate.new('bicubic')
doesn't affect filter in VIPS 8.5. It turns out this argument isn't backward-compatible.
You'd need to add a version test to use a new feature, for example
Fixed this and updated results: https://github.com/python-pillow/pillow-perf/pull/67
This is one of the common cases.
You'd normally have the rotate after the shrink though, wouldn't you? Otherwise you're rotating more data than you need to.
vipsthumbnail
shrinks to a memory buffer using vips streams, then rotates to the output in a second pipeline.
I believe this is a bug, VIPS with access='sequential' uses more than one core even if VIPS_CONCURRENCY=1 is set:
libvips has double-buffered output plus a write-behind thread, so in streaming mode it's able to overlap JPEG decode and JPEG encode. Perhaps it is cheating.
More info about slow source builds.
This one from source:
$ vips --version
vips-8.5.7-Tue Oct 10 23:00:33 MSK 2017
$ time perf record ./run.py vips_full_cycle -n 21
Vips full cycle 2560×1600 RGB image
Load+save 0.06468 s 63.33 Mpx/s
+transpose 0.07087 s 57.80 Mpx/s
+resize 0.22533 s 18.18 Mpx/s
+blur 0.31908 s 12.84 Mpx/s
[ perf record: Woken up 10 times to write data ]
[ perf record: Captured and wrote 2.347 MB perf.data (59275 samples) ]
real 0m14.656s
user 0m14.168s
sys 0m0.808s
This one from very similar computer but installed from the package:
$ vips --version
vips-8.2.2-Sat Jan 30 17:12:08 UTC 2016
$ time perf record ./run.py vips_full_cycle -n 21
Vips full cycle 2560×1600 RGB image
Load+save 0.04792 s 85.47 Mpx/s
+transpose 0.04944 s 82.85 Mpx/s
+resize 0.12102 s 33.85 Mpx/s
+blur 0.21658 s 18.91 Mpx/s
[ perf record: Woken up 6 times to write data ]
[ perf record: Captured and wrote 1.818 MB perf.data (44469 samples) ]
real 0m9.532s
user 0m9.796s
sys 0m0.252s
Sorry, there is no debug symbols in ubuntu packages, but it is clear, that the layout is completely different. As I understand, functions with _gen
postfix are not SIMD-accelerated. How can I see a reason why SIMD-acceleration is failed?
I've tried to run with ORC_DEBUG=99
and log for the package versions is much much longer. But I haven't found anything useful in the log.
Huh that's interesting.
You can try VIPS_INFO
with recent libvipses:
$ export VIPS_INFO=1
$ vipsthumbnail /data/john/pics/k2.jpg
VIPS-INFO: thumbnailing /data/john/pics/k2.jpg
VIPS-INFO: selected loader is VipsForeignLoadJpegFile
VIPS-INFO: input size is 1450 x 2048
VIPS-INFO: loading jpeg with factor 8 pre-shrink
VIPS-INFO: converting to processing space srgb
VIPS-INFO: residual reducev by 0.5
VIPS-INFO: reducev: 13 point mask
VIPS-INFO: reducev: using vector path
VIPS-INFO: reducev sequential line cache
VIPS-INFO: residual reduceh by 0.5
VIPS-INFO: reduceh: 13 point mask
VIPS-INFO: thumbnailing /data/john/pics/k2.jpg as /data/john/pics/tn_k2.jpg
You'd need to do something to make run.py show stdout.
You'd need to do something to make run.py show stdout.
What exactly? I didn't intercept stdout in the benchmark
You'd normally have the rotate after the shrink though, wouldn't you?
Most libraries and users rotate on image load or immediately after this. This is the most common scenario. To be fairer, I changed explicit rotate operation to autorotate
flag: https://github.com/python-pillow/pillow-perf/pull/68. This helps a lot for the slow, not SIMD-optimized case, but doesn't make any difference for fast. By the way, autorotate
+ access='sequential'
drops performance a lot.
Otherwise you're rotating more data than you need to.
I believe this is the case where VIPS could optimize things more than other libraries. As VIPS doesn't perform operations immediately, it could rearrange operations.
I didn't intercept stdout in the benchmark
Ooop, my fault, they were being blocked at the glib/pyvips boundary. Hang on, I'll fix it up.
It looks like the blur is falling off the vector path too, I'll see if I can fix that as well.
The libvips rotate on load isn't used much: normally you'd shrink first, then use a separate autorot
operator near the end.
libvips is demand-driven, so it scans over the output image pulling pixels through the system. If there's a 90 degree rotate in the pipeline, the nice top-to-bottom scan becomes a horrible vertical scan and image streaming will break.
To fix this, you run the pipeline in two stages: first, you stream from the input through premultiply + shrink + unpremultiply + colour management to make a small memory image, then when that has completed, you run a second pipeline from the memory image through rotate to image output.
This way you can thumbnail huge images in only a little memory. Imagine trying to thumbnail a 300,000 x 300,000 pixel image if you did the rotate at the start.
I understand this. Rotating before resizing leads to extra data manipulations for all libraries in this test suite, not only for VIPS. But I believe current results are more informative for libraries users because this is exactly how people usually work. one, two, three
Also, there is important note:
Method order is important when both rotating and extracting regions, for example rotate(x).extract(y) will produce a different result to extract(y).rotate(x).
So it much more simple to think in transposed picture coordinates, than translate coordinates provided by a user to "storage coordinates".
I've installed pyvips from the master, but this still doesn't work:
$ G_MESSAGES_DEBUG=VIPS VIPS_INFO=1 ./run.py vips_full_cycle
I've been working on too many language bindings this summer :( The Ruby one works with env vars, the Python one redirects glib log output to logger, so you need:
import logging
logging.basicConfig(level=logging.INFO)
in your code. With this program:
import os
import logging
os.environ['VIPS_CONCURRENCY'] = '1'
logging.basicConfig(level=logging.INFO)
from pyvips import Image
im = Image.new_from_file('pineapple.jpeg')
im = im.rot90()
im = im.resize(0.4, kernel='cubic')
im = im.gaussblur(4, min_ampl=0.07)
im.write_to_buffer('.jpg', Q=85)
I see:
john@kiwi:~/try$ python vips.py
INFO:pyvips.vinterpolate:VIPS: residual reducev by 0.4
INFO:pyvips.vinterpolate:VIPS: reducev: 4 point mask
INFO:pyvips.vinterpolate:VIPS: reducev: using vector path
INFO:pyvips.vinterpolate:VIPS: reducev sequential line cache
INFO:pyvips.vinterpolate:VIPS: residual reduceh by 0.4
INFO:pyvips.vinterpolate:VIPS: reduceh: 4 point mask
INFO:pyvips.vinterpolate:VIPS: gaussblur mask width 19
INFO:pyvips.vinterpolate:VIPS: intize_to_fixed_point: too many underflows
INFO:pyvips.vinterpolate:VIPS: convi: using C path
INFO:pyvips.vinterpolate:VIPS: intize_to_fixed_point: too many underflows
INFO:pyvips.vinterpolate:VIPS: convi: using C path
So it's hitting the vector path for vertical reduce, but falling off vector for the gaussblur. I'll try to fix that.
On operation order, sharp does the rotate after the shrink. It has a fixed (horribly complicated!) pipeline and the methods set parameters to that. You can see the pipeline here:
https://github.com/lovell/sharp/blob/master/src/pipeline.cc#L456
I've not been involved with bimg, but I agree it seems to rotate and flip at the start, which is wrong. I'll open an issue, thanks for pointing this out.
imgproxy seems to have copied the logic from bimg, I'll open an issue there too.
This is my output:
INFO:pyvips.vinterpolate:VIPS: shrinkv by 2
INFO:pyvips.vinterpolate:VIPS: shrinkv sequential line cache
INFO:pyvips.vinterpolate:VIPS: shrinkh by 2
INFO:pyvips.vinterpolate:VIPS: residual reducev by 0.8
INFO:pyvips.vinterpolate:VIPS: 4 point mask
INFO:pyvips.vinterpolate:VIPS: using vector path
INFO:pyvips.vinterpolate:VIPS: reducev sequential line cache
INFO:pyvips.vinterpolate:VIPS: residual reduceh by 0.8
INFO:pyvips.vinterpolate:VIPS: 4 point mask
INFO:pyvips.vinterpolate:VIPS: gaussblur mask width 19
INFO:pyvips.vinterpolate:VIPS: intize_to_fixed_point: too many underflows
INFO:pyvips.vinterpolate:VIPS: using C path
INFO:pyvips.vinterpolate:VIPS: intize_to_fixed_point: too many underflows
INFO:pyvips.vinterpolate:VIPS: using C path
OK, I think I see the problem, let me hack on it for a few days.
Thank you for looking into all this, it's good to uncover these horrors.
Do I understand correctly that this is a bug which should be fixed in libvips sources and there is no way to fix it from pyvips code or using other settings? So I can merge last part and this is the final results for current VIPS version?
Yes, I think it can't be fixed with current libvips. Your test has unfortunately hit two weak points: cubic downsize is slower than it should be (as you pointed out, it's not properly adaptive), and the current vector path for convolution can't do large radius gaussblur.
I think it's not much work to resolve this, but it'll be 8.6, so please go ahead with the results you have.
Here are the issues for moving the rotate to the end of the resize pipeline, for reference:
Clear.
I'm still wondering do you have any explanation why the build from Ubuntu packages works in a different way and much faster?
I've subscribed to the rotation issues, will follow.
I have one more question. Is there any way in pyvips to read image properties before load it? For example, when you are calling Image.open
in Pillow, it just returns ImageFile
object with basic properties (like info, mode, width, height) without decoding the whole image.
I think very similar could be achieved with access='sequential'
, but separate method could be significant faster:
In [9]: %timeit pyvips.Image.new_from_file('../imgs/pineapple2k.jpeg', access='sequential')
100 loops, best of 3: 2.89 ms per loop
In [12]: %timeit Image.open('../imgs/pineapple2k.jpeg')
1000 loops, best of 3: 314 µs per loop
Sorry, I don't know why the Ubuntu packages are faster for you. They seem to run the same speed here.
Yes, libvips just loads image properties on new_from_file
, no pixels are decoded until they are needed. I guess the overhead you are seeing is FFI and the operation call. I ought to profile the Python code and see if there are any simple speedups I'm missing --- it's quite a bit slower than the old C-based Python binding.
I tried the same test in C:
/* Compile with:
*
* gcc open.c `pkg-config vips --cflags --libs`
*
*/
#include <vips/vips.h>
int
main( int argc, char **argv )
{
VipsImage *im;
int i;
if( VIPS_INIT( argv[0] ) )
vips_error_exit( NULL );
vips_cache_set_max( 0 );
for( i = 0; i < 10000; i++ ) {
im = vips_image_new_from_file( argv[1], NULL );
g_object_unref( im );
}
return( 0 );
}
And I see:
$ time ./a.out ~/pics/k2.jpg$ time ./a.out ~/pics/k2.jpg
real 0m3.152s
user 0m2.724s
sys 0m0.416s
So also about 300us per open.
@jcupitt Funny thing. I've tried to rearrange operation like you suggest:
def runner(self):
im = Image.new_from_file(root('resources', self.filename))
if self.level > 1:
im = self.resize(im, 0.4, kernel='cubic')
if self.level > 2:
im = im.gaussblur(4, min_ampl=0.07)
if self.level > 0:
im = im.rot90()
im.write_to_buffer('.'+self.filetype, Q=85)
And get significant slowdown on the build from sources:
With autorotate:
Vips full cycle 2560×1600 RGB image
Load+save 0.06201 s 66.06 Mpx/s
+transpose 0.08122 s 50.43 Mpx/s
+resize 0.17138 s 23.90 Mpx/s
+blur 0.26338 s 15.55 Mpx/s
With rotation on the end
Vips full cycle 2560×1600 RGB image
Load+save 0.06253 s 65.50 Mpx/s
+transpose 0.06960 s 58.85 Mpx/s
+resize 0.36972 s 11.08 Mpx/s
+blur 0.52834 s 7.75 Mpx/s
And new perf output:
I made a couple of test programs to show the effect of swapping the pipeline around.
Here's the way it's arranged now, though I switched the kernel to the default (lanczos3), since it actually works properly, and dropped the sigma to 2 to stay on the vector path:
import os
import sys
os.environ['VIPS_CONCURRENCY'] = '1'
#import logging
#logging.basicConfig(level=logging.INFO)
import pyvips
im = pyvips.Image.new_from_file(sys.argv[1])
im = im.rot90()
im = im.resize(0.4)
im = im.gaussblur(2, min_ampl=0.07)
im.write_to_buffer('.jpg', Q=85)
I see:
$ time python vips.py pineapple.jpeg
real 0m0.278s
user 0m0.260s
sys 0m0.024s
If I rearrange it like this (as vips thumbnail and sharp do it):
import os
import sys
os.environ['VIPS_CONCURRENCY'] = '1'
#import logging
#logging.basicConfig(level=logging.INFO)
import pyvips
im = pyvips.Image.new_from_file(sys.argv[1], access='sequential')
im = im.resize(0.4)
im = im.gaussblur(2, min_ampl=0.07)
im = im.copy_memory()
im = im.rot90()
im.write_to_buffer('.jpg', Q=85)
I see:
$ time python vips2.py pineapple.jpeg
real 0m0.257s
user 0m0.220s
sys 0m0.036s
So a bit quicker, though it's not dramatic. Memory use will be lower since it won't need to decode the whole image at the start.
With a 10k x 10k JPEG it's much more impressive:
$ time python vips.py ~/pics/wtc.jpg
real 0m9.063s
user 0m4.452s
sys 0m4.740s
$ time python vips2.py ~/pics/wtc.jpg
real 0m2.048s
user 0m2.056s
sys 0m0.068s
More than four times faster, and memory use is much lower.
Sorry, I've still no insight into why the packages are faster :(
I'll try to investigate later, it would be useful to know.
I tried a Pillow one:
import sys
from PIL import Image, ImageFilter
from io import BytesIO
im = Image.open(sys.argv[1])
im = im.transpose(Image.ROTATE_270)
size = (int(im.size[0] * 0.4 + 0.5),
int(im.size[1] * 0.4 + 0.5))
im = im.resize(size, Image.LANCZOS)
im = im.filter(ImageFilter.GaussianBlur(2))
im.save(BytesIO(), 'JPEG', quality=85)
Is that two-lobe or three-lobe lanczos, by the way?
Anyway, for pineapple, Pillow wins, but with larger images, vips edges ahead:
$ time python pillow.py pineapple.jpeg
real 0m0.168s
user 0m0.144s
sys 0m0.020s
$ time python pillow.py ~/pics/wtc.jpg
real 0m2.275s
user 0m2.044s
sys 0m0.228s
I'm not sure why vips.py
is so slow on large images (4.7s sys time!), I should check that too.
Oh heh, of course if I put the rot90 at the end, Pillow speeds up too:
$ time python pillow.py ~/pics/wtc.jpg
real 0m2.042s
user 0m1.860s
sys 0m0.180s
So pretty much a draw.
Is that two-lobe or three-lobe lanczos, by the way?
In Pillow Lanczos is 3-lobe.
To see how much was libvips and how much was pyvips, I tried in C:
/* Compile with:
*
* gcc vips2.c `pkg-config vips --cflags --libs`
*
*/
#include <vips/vips.h>
int
main( int argc, char **argv )
{
VipsImage *im;
VipsImage *x;
void *buf;
size_t size;
if( VIPS_INIT( argv[0] ) )
vips_error_exit( NULL );
if( !(im = vips_image_new_from_file( argv[1],
"access", VIPS_ACCESS_SEQUENTIAL,
NULL )) )
vips_error_exit( NULL );
if( vips_resize( im, &x, 0.4, NULL ) )
vips_error_exit( NULL );
g_object_unref( im );
im = x;
if( vips_gaussblur( im, &x, 2.0,
"min_ampl", 0.07,
NULL ) )
vips_error_exit( NULL );
g_object_unref( im );
im = x;
im = vips_image_copy_memory( im );
if( vips_rot90( im, &x, NULL ) )
vips_error_exit( NULL );
g_object_unref( im );
im = x;
if( vips_image_write_to_buffer( im, ".jpg", &buf, &size,
"Q", 85,
NULL ) )
vips_error_exit( NULL );
g_object_unref( im );
return( 0 );
}
I see:
$ time ./a.out pineapple.jpeg
real 0m0.118s
user 0m0.100s
sys 0m0.020s
$ time ./a.out ~/pics/wtc.jpg
real 0m1.856s
user 0m1.892s
sys 0m0.040s
So it looks like pyvips slowness is a factor here too.
Hello again, I fixed up git master libvips a bit more. Cubic and linear resize are now properly adaptive, and I switched the convolution vector path from fixed point to half float. If I run git master pillow-perf I see:
$ ./run.py full_cycle
Full cycle 2560×1600 RGB image
Load+save 0.06154 s 66.56 Mpx/s
+transpose 0.07465 s 54.87 Mpx/s
+resize 0.06081 s 67.35 Mpx/s
+blur 0.07642 s 53.60 Mpx/s
$ ./run.py vips_full_cycle
Vips full cycle 2560×1600 RGB image
Load+save 0.06587 s 62.19 Mpx/s
+transpose 0.08125 s 50.41 Mpx/s
+resize 0.08176 s 50.10 Mpx/s
+blur 0.11350 s 36.09 Mpx/s
This should be in 8.6 in a few weeks.
It's not obvious to me why Pillow transpose + resize seems to be faster than just transpose, do you know why that is?
I think the next improvement would be to try to speed up pyvips itself, but it looks like that would involve switching cffi to API mode and generating and building a C extension, which is more work than I have time for right now.
I found out why vips.py
was very slow on huge images -- it switches to going via disk if images are over 100mb uncompressed (of course, duh). If you set VIPS_DISC_THRESHOLD
to 500mb
the speed difference mostly vanishes for the 1-thread case.
I'm still unclear why the official packages are faster
It's not obvious to me why Pillow transpose + resize seems to be faster than just transpose, do you know why that is?
Sure. The last operation in all cases is saving. So the larger image to save, the slower the saving operation. After resizing, image is reduced in 6.25 times. If resizing is faster than saving, the whole case with resizing will be faster.
it switches to going via disk if images are over 100mb uncompressed
It's not clear to me. Does VIPS switch to disk bufer even with access='sequential'
?
Oh, of course, the save is smaller.
No, it will never disk buffer in sequential mode, it'll fail with an error instead.
No, it will never disk buffer in sequential mode
Yeah, right. I see now
I'm getting following compilation error on current master:
composite.cpp: In function ‘void vips_composite_blend3(VipsComposite*, VipsBlendMode, v4f&, T*)’:
composite.cpp:676:23: error: conversion of scalar ‘int’ to vector ‘v4f {aka __vector(4) float}’ involves truncation
f = g > 0 ? g : -1 * g;
^
composite.cpp:814: confused by earlier errors, bailing out
After change -1 * g
to -g
I get following:
composite.cpp: In function ‘void vips_combine_pixels3(VipsComposite*, VipsPel*, VipsPel**) [with T = unsigned char; long int min_T = 0l; long int max_T = 255l; VipsComposite = _VipsComposite; VipsPel = unsigned char]’:
composite.cpp:814:3: internal compiler error: in gimplify_expr, at gimplify.c:8858
B = VIPS_CLIP( low, B, high );
^
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-5/README.Bugs> for instructions.
$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609
Huh, strange, it builds cleanly on 17.04 with gcc 6.3. I'll try 5.4. Thanks for letting me know.
It builds cleanly in gcc-5.4.1 with -Wall
. I'll try clang.
composite.cpp
is using gcc vector mode (the __attribute__(vector_size())
stuff) and it seems to be a bit broken on gcc 5.4.0. I've updated configure to disable vector extensions on 4.x and 5.x.
Thanks!
Yeah, It runs much faster on the laptop:
libvips 8.5.0:
Vips full cycle 2560×1600 RGB image
Load+save 0.06354 s 64.46 Mpx/s
+transpose 0.08367 s 48.96 Mpx/s
+resize 0.17549 s 23.34 Mpx/s
+blur 0.26749 s 15.31 Mpx/s
libvips 8.6.0:
Vips full cycle 2560×1600 RGB image
Load+save 0.06294 s 65.07 Mpx/s
+transpose 0.08452 s 48.46 Mpx/s
+resize 0.09042 s 45.30 Mpx/s
+blur 0.11938 s 34.31 Mpx/s
But there is still some problem on the same OS, the same compiler and very similar CPU (Ubuntu 16.04 64-bit, GCC 5.4, Intel Haswell generation) on the AWS EC2 c4.large server:
libvips 8.2.?:
Vips full cycle 2560×1600 RGB image
Load+save 0.05109 s 80.18 Mpx/s
+transpose 0.06705 s 61.09 Mpx/s
+resize 0.13450 s 30.45 Mpx/s
+blur 0.23270 s 17.60 Mpx/s
libvips 8.6.0:
Vips full cycle 2560×1600 RGB image
Load+save 0.05029 s 81.45 Mpx/s
+transpose 0.06930 s 59.10 Mpx/s
+resize 0.13910 s 29.45 Mpx/s
+blur 0.18010 s 22.74 Mpx/s
Blur becomes faster, resize doesn't.
Perf report on the laptop:
c4.large:
Reports are quite different.
That's very odd. Could you try running with INFO logging enabled?
I'll try on the big i7 at work tomorrow.
It looks the same.
Laptop:
$ ./testsuite/run.py vips_full_cycle -n1
(8, 6, 0, 50)
Vips full cycle 2560×1600 RGB image
Load+save 0.08717 s 46.99 Mpx/s
+transpose 0.09384 s 43.65 Mpx/s
INFO:pyvips.vinterpolate:VIPS: residual reducev by 0.4
INFO:pyvips.vinterpolate:VIPS: reducev: 11 point mask
INFO:pyvips.vinterpolate:VIPS: reducev: using vector path
INFO:pyvips.vinterpolate:VIPS: reducev sequential line cache
INFO:pyvips.vinterpolate:VIPS: residual reduceh by 0.4
INFO:pyvips.vinterpolate:VIPS: reduceh: 11 point mask
+resize 0.11130 s 36.80 Mpx/s
INFO:pyvips.vinterpolate:VIPS: residual reducev by 0.4
INFO:pyvips.vinterpolate:VIPS: reducev: 11 point mask
INFO:pyvips.vinterpolate:VIPS: reducev: using vector path
INFO:pyvips.vinterpolate:VIPS: reducev sequential line cache
INFO:pyvips.vinterpolate:VIPS: residual reduceh by 0.4
INFO:pyvips.vinterpolate:VIPS: reduceh: 11 point mask
INFO:pyvips.vinterpolate:VIPS: gaussblur mask width 19
INFO:pyvips.vinterpolate:VIPS: convi: using vector path
INFO:pyvips.vinterpolate:VIPS: convi: using vector path
+blur 0.12818 s 31.95 Mpx/s
c4.large:
$ ./testsuite/run.py vips_full_cycle -n1
(8, 6, 0, 50)
Vips full cycle 2560×1600 RGB image
Load+save 0.06522 s 62.81 Mpx/s
+transpose 0.06863 s 59.68 Mpx/s
INFO:pyvips.vinterpolate:VIPS: residual reducev by 0.4
INFO:pyvips.vinterpolate:VIPS: reducev: 11 point mask
INFO:pyvips.vinterpolate:VIPS: reducev: using vector path
INFO:pyvips.vinterpolate:VIPS: reducev sequential line cache
INFO:pyvips.vinterpolate:VIPS: residual reduceh by 0.4
INFO:pyvips.vinterpolate:VIPS: reduceh: 11 point mask
+resize 0.14126 s 29.00 Mpx/s
INFO:pyvips.vinterpolate:VIPS: residual reducev by 0.4
INFO:pyvips.vinterpolate:VIPS: reducev: 11 point mask
INFO:pyvips.vinterpolate:VIPS: reducev: using vector path
INFO:pyvips.vinterpolate:VIPS: reducev sequential line cache
INFO:pyvips.vinterpolate:VIPS: residual reduceh by 0.4
INFO:pyvips.vinterpolate:VIPS: reduceh: 11 point mask
INFO:pyvips.vinterpolate:VIPS: gaussblur mask width 19
INFO:pyvips.vinterpolate:VIPS: convi: using vector path
INFO:pyvips.vinterpolate:VIPS: convi: using vector path
+blur 0.18203 s 22.50 Mpx/s
Hi, @jcupitt!
I'd like to ask you to review pyvips results in pillow-perf benchmarks.
Originally in this benchmark, there was one test for one operation. As VIPS works differently, I've added "Full operations cycle" competition with several operations in each test (from two to five). To see index and numbers, hover the graphs. Please note, all tests are run in a single thread, the explanation of this is given at the start of the page.
Results browser
At first, I'd like you to check the benchmark code. Is it correct and is it equivalent to Pillow code?
Benchmarks repository and readme Pyvips code Pillow code
If you uncomment the last line, you could compare results visually. They are almost identical except small shift for VIPS, I already seen the issue about this.
There are results for three systems: Macbook with Core i5, EC2 c3.large and EC2 c4.large. On Macbook I have libvips 8.5.7 from sources with the following configuration:
$ CXXFLAGS=-D_GLIBCXX_USE_CXX11_ABI=0 ./configure --without-python
``` checking for a BSD-compatible install... /usr/bin/install -c checking whether build environment is sane... yes checking for a thread-safe mkdir -p... /bin/mkdir -p checking for gawk... no checking for mawk... mawk checking whether make sets $(MAKE)... yes checking whether make supports nested variables... yes checking for pkg-config... /usr/bin/pkg-config checking pkg-config is at least version 0.9.0... yes checking for gobject-introspection... no checking build system type... x86_64-pc-linux-gnu checking host system type... x86_64-pc-linux-gnu checking for needs -lstdc++... yes checking for native Win32... no checking for binary open needed... no checking for Mac OS X... no checking for style of include used by make... GNU checking for gcc... gcc checking whether the C compiler works... yes checking for C compiler default output file name... a.out checking for suffix of executables... checking whether we are cross compiling... no checking for suffix of object files... o checking whether we are using the GNU C compiler... yes checking whether gcc accepts -g... yes checking for gcc option to accept ISO C89... none needed checking whether gcc understands -c and -o together... yes checking dependency style of gcc... gcc3 checking for special C compiler options needed for large files... no checking for _FILE_OFFSET_BITS value needed for large files... no checking for gawk... (cached) mawk checking for gcc... (cached) gcc checking whether we are using the GNU C compiler... (cached) yes checking whether gcc accepts -g... (cached) yes checking for gcc option to accept ISO C89... (cached) none needed checking whether gcc understands -c and -o together... (cached) yes checking dependency style of gcc... (cached) gcc3 checking for gcc option to accept ISO C99... none needed checking for gcc option to accept ISO Standard C... (cached) none needed checking for g++... g++ checking whether we are using the GNU C++ compiler... yes checking whether g++ accepts -g... yes checking dependency style of g++... gcc3 checking for an ANSI C-conforming const... yes checking for C/C++ restrict keyword... __restrict checking for ranlib... ranlib checking whether ln -s works... yes checking if malloc debugging is wanted... no checking how to run the C preprocessor... gcc -E checking for grep that handles long lines and -e... /bin/grep checking for egrep... /bin/grep -E checking for ANSI C header files... yes checking for sys/types.h... yes checking for sys/stat.h... yes checking for stdlib.h... yes checking for string.h... yes checking for memory.h... yes checking for strings.h... yes checking for inttypes.h... yes checking for stdint.h... yes checking for unistd.h... yes checking locale.h usability... yes checking locale.h presence... yes checking for locale.h... yes checking for LC_MESSAGES... yes checking libintl.h usability... yes checking libintl.h presence... yes checking for libintl.h... yes checking for ngettext in libc... yes checking for dgettext in libc... yes checking for bind_textdomain_codeset... yes checking for msgfmt... /usr/bin/msgfmt checking for dcgettext... yes checking if msgfmt accepts -c... yes checking for gmsgfmt... /usr/bin/msgfmt checking for xgettext... /usr/bin/xgettext checking for catalogs to be installed... en_GB de checking for dirent.h that defines DIR... yes checking for library containing opendir... none required checking for ANSI C header files... (cached) yes checking errno.h usability... yes checking errno.h presence... yes checking for errno.h... yes checking math.h usability... yes checking math.h presence... yes checking for math.h... yes checking fcntl.h usability... yes checking fcntl.h presence... yes checking for fcntl.h... yes checking limits.h usability... yes checking limits.h presence... yes checking for limits.h... yes checking for stdlib.h... (cached) yes checking for string.h... (cached) yes checking sys/file.h usability... yes checking sys/file.h presence... yes checking for sys/file.h... yes checking sys/ioctl.h usability... yes checking sys/ioctl.h presence... yes checking for sys/ioctl.h... yes checking sys/param.h usability... yes checking sys/param.h presence... yes checking for sys/param.h... yes checking sys/time.h usability... yes checking sys/time.h presence... yes checking for sys/time.h... yes checking sys/mman.h usability... yes checking sys/mman.h presence... yes checking for sys/mman.h... yes checking for sys/types.h... (cached) yes checking for sys/stat.h... (cached) yes checking for unistd.h... (cached) yes checking io.h usability... no checking io.h presence... no checking for io.h... no checking direct.h usability... no checking direct.h presence... no checking for direct.h... no checking windows.h usability... no checking windows.h presence... no checking for windows.h... no checking for dllwrap... no checking for dlltool... dlltool checking for objdump... objdump checking for ranlib... (cached) ranlib checking for strip... strip checking for ar... ar checking for as... as checking for ld... ld checking how to print strings... printf checking for a sed that does not truncate output... /bin/sed checking for fgrep... /bin/grep -F checking for ld used by gcc... ld checking if the linker (ld) is GNU ld... yes checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B checking the name lister (/usr/bin/nm -B) interface... BSD nm checking the maximum length of command line arguments... 1572864 checking how to convert x86_64-pc-linux-gnu file names to x86_64-pc-linux-gnu format... func_convert_file_noop checking how to convert x86_64-pc-linux-gnu file names to toolchain format... func_convert_file_noop checking for ld option to reload object files... -r checking for objdump... (cached) objdump checking how to recognize dependent libraries... pass_all checking for dlltool... (cached) dlltool checking how to associate runtime and link libraries... printf %s\n checking for archiver @FILE support... @ checking for strip... (cached) strip checking for ranlib... (cached) ranlib checking command to parse /usr/bin/nm -B output from gcc object... ok checking for sysroot... no checking for a working dd... /bin/dd checking how to truncate binary pipes... /bin/dd bs=4096 count=1 checking for mt... mt checking if mt is a manifest tool... no checking for dlfcn.h... yes checking for objdir... .libs checking if gcc supports -fno-rtti -fno-exceptions... no checking for gcc option to produce PIC... -fPIC -DPIC checking if gcc PIC flag -fPIC -DPIC works... yes checking if gcc static flag -static works... yes checking if gcc supports -c -o file.o... yes checking if gcc supports -c -o file.o... (cached) yes checking whether the gcc linker (ld -m elf_x86_64) supports shared libraries... yes checking whether -lc should be explicitly linked in... no checking dynamic linker characteristics... GNU/Linux ld.so checking how to hardcode library paths into programs... immediate checking whether stripping libraries is possible... yes checking if libtool supports shared libraries... yes checking whether to build shared libraries... yes checking whether to build static libraries... yes checking how to run the C++ preprocessor... g++ -E checking for ld used by g++... ld -m elf_x86_64 checking if the linker (ld -m elf_x86_64) is GNU ld... yes checking whether the g++ linker (ld -m elf_x86_64) supports shared libraries... yes checking for g++ option to produce PIC... -fPIC -DPIC checking if g++ PIC flag -fPIC -DPIC works... yes checking if g++ static flag -static works... yes checking if g++ supports -c -o file.o... yes checking if g++ supports -c -o file.o... (cached) yes checking whether the g++ linker (ld -m elf_x86_64) supports shared libraries... yes checking dynamic linker characteristics... (cached) GNU/Linux ld.so checking how to hardcode library paths into programs... immediate checking for an ANSI C-conforming const... (cached) yes checking for mode_t... yes checking for off_t... yes checking for size_t... yes checking for working memcmp... yes checking for stdlib.h... (cached) yes checking for unistd.h... (cached) yes checking for sys/param.h... (cached) yes checking for getpagesize... yes checking for working mmap... yes checking for vprintf... yes checking for _doprnt... no checking for getcwd... yes checking for gettimeofday... yes checking for getwd... yes checking for memset... yes checking for munmap... yes checking for putenv... yes checking for realpath... yes checking for strcasecmp... yes checking for strchr... yes checking for strcspn... yes checking for strdup... yes checking for strerror... yes checking for strrchr... yes checking for strspn... yes checking for vsnprintf... yes checking for realpath... (cached) yes checking for mkstemp... yes checking for mktemp... yes checking for random... yes checking for rand... yes checking for sysconf... yes checking for atexit... yes checking for cbrt in -lm... yes checking for hypot in -lm... yes checking for atan2 in -lm... yes checking for REQUIRED... yes checking for CONTEXT_GET_HELP... yes checking for MONOTONIC... yes checking for THREADS... yes checking for TYPE_INIT... no checking for TYPE_INIT... yes checking for gtk-doc... no configure: WARNING: You will not be able to create source packages with 'make dist' because gtk-doc >= 1.14 is not found. checking for gtkdoc-check... no checking for gtkdoc-check... no checking for gtkdoc-rebase... no checking for gtkdoc-mkpdf... no checking whether to build gtk-doc documentation... no checking for GTKDOC_DEPS... yes checking for XML_ParserCreate in -lexpat... yes checking expat.h usability... yes checking expat.h presence... yes checking for expat.h... yes checking for GSF... yes checking for GSF_ZIP64... yes checking for FFTW... yes checking for MAGICK_WAND... yes checking for MAGICK... no checking for MAGICK... yes checking for SetImageOption... yes checking for MagickCoreGenesis... yes checking for ResetImagePropertyIterator... yes checking for ResetImageAttributeIterator... yes checking for GetVirtualPixels... yes checking for struct _ImageInfo.number_scenes... yes checking for ORC... yes checking for orc_program_get_error... yes checking for LCMS... yes checking for OPENEXR... yes checking for POPPLER... yes checking for RSVG... yes checking for X... libraries , headers checking for gethostbyname... yes checking for connect... yes checking for remove... yes checking for shmat... yes checking for IceConnectionNumber in -lICE... yes checking for ZLIB... yes checking for OPENSLIDE... yes checking for MATIO... yes checking for CFITSIO... yes checking for LIBWEBP... yes checking for LIBWEBPMUX... no configure: WARNING: libwebpmux not found; disabling webp metadata support checking for PANGOFT2... yes configure: WARNING: pygobject-3.0 not found; disabling vips8 python support checking for TIFF... yes checking for giflib... libraries -lgif, headers in default path checking for PNG... yes checking for JPEG... libraries -ljpeg, headers in default path checking for jpeg_c_bool_param_supported... no checking for EXIF... yes checking exif-data.h usability... yes checking exif-data.h presence... yes checking for exif-data.h... yes checking that generated files are newer than configure... done configure: creating ./config.status config.status: creating vips.pc config.status: creating vipsCC.pc config.status: creating vips-cpp.pc config.status: creating Makefile config.status: creating libvips/include/vips/version.h config.status: creating libvips/include/Makefile config.status: creating libvips/include/vips/Makefile config.status: creating libvips/Makefile config.status: creating libvips/arithmetic/Makefile config.status: creating libvips/colour/Makefile config.status: creating libvips/conversion/Makefile config.status: creating libvips/convolution/Makefile config.status: creating libvips/deprecated/Makefile config.status: creating libvips/foreign/Makefile config.status: creating libvips/freqfilt/Makefile config.status: creating libvips/histogram/Makefile config.status: creating libvips/draw/Makefile config.status: creating libvips/iofuncs/Makefile config.status: creating libvips/morphology/Makefile config.status: creating libvips/mosaicing/Makefile config.status: creating libvips/create/Makefile config.status: creating libvips/resample/Makefile config.status: creating libvips/video/Makefile config.status: creating libvipsCC/include/Makefile config.status: creating libvipsCC/include/vips/Makefile config.status: creating libvipsCC/Makefile config.status: creating cplusplus/include/Makefile config.status: creating cplusplus/include/vips/Makefile config.status: creating cplusplus/Makefile config.status: creating tools/Makefile config.status: creating tools/batch_crop config.status: creating tools/batch_image_convert config.status: creating tools/batch_rubber_sheet config.status: creating tools/light_correct config.status: creating tools/shrink_width config.status: creating python/Makefile config.status: creating python/packages/Makefile config.status: creating python/packages/gi/Makefile config.status: creating python/packages/gi/overrides/Makefile config.status: creating test/Makefile config.status: creating test/variables.sh config.status: creating swig/Makefile config.status: creating swig/vipsCC/Makefile config.status: creating man/Makefile config.status: creating doc/Makefile config.status: creating doc/libvips-docs.xml config.status: creating po/Makefile.in config.status: creating config.h config.status: config.h is unchanged config.status: executing depfiles commands config.status: executing default-1 commands config.status: executing libtool commands * build options native win32: no native OS X: no open files in binary mode: no enable debug: no build deprecated components: yes build docs with gtkdoc: no gobject introspection: no build vips7 Python binding: no install vips8 Python overrides: no (requires pygobject-3.13.0 or later) build radiance support: yes build analyze support: yes build PPM support: yes * optional dependencies use fftw3 for FFT: yes Magick package: MagickCore file import with libMagick: yes (magick6) accelerate loops with orc: yes (requires orc-0.4.11 or later) ICC profile support with lcms: yes (lcms2) file import with OpenEXR: yes file import with OpenSlide: yes (requires openslide-3.3.0 or later) file import with matio: yes PDF import with poppler-glib: yes (requires poppler-glib 0.16.0 or later) SVG import with librsvg-2.0: yes (requires librsvg-2.0 2.34.0 or later) zlib: yes file import with cfitsio: yes file import/export with libwebp: yes (requires libwebp-0.1.3 or later) support webp metadata: no (requires libwebpmux-0.5 or later) text rendering with pangoft2: yes file import/export with libpng: yes (pkg-config libpng >= 1.2.9) (requires libpng-1.2.9 or later) file import/export with libtiff: yes (pkg-config libtiff-4) file import/export with giflib: yes (found by search) file import/export with libjpeg: yes image pyramid export: yes (requires libgsf-1 1.14.26 or later) use libexif to load/save JPEG metadata: yes ```And on ec2 instances, I have VIPS 8.2 from Ubuntu package.
The strange thing is VIPS from sources works signifficantly slower than from the repo. This is VIPS 8.5 from sources compared with VIPS 8.2 from a package on the same computer:
I also tried to build VIPS 8.2 from sources, it has the same speed as 8.5 from sources. Do you have explanation for this?