libvips / pyvips

python binding for libvips using cffi
MIT License

Working on multiple images at the same time #254

Open jpadfield opened 3 years ago

jpadfield commented 3 years ago

Hi,

I have a few processes that use the older vips Python binding ( gi.require_version('Vips', '8.0') ). To process large numbers of images efficiently, this was paired with the Python library "pp" so that all cores were used.

pyvips is described as being "parallel", but I could not find any comment on how well this works. Would it still be beneficial to use "pp", or something similar, or can pyvips cover this as well?

If pyvips should be able to maximise the use of multiple cores - are there any examples of how this should be done?

Just for info, the current job is reformatting images and saving them as pyramidal TIFFs for IIIF applications.

Thanks

Joe

jcupitt commented 3 years ago

Hi Joe,

pyvips will make some use of cores, but how much parallelism it can find depends on the task. For example, for conversion to pyr tiff I see:

$ vipsheader summer8.tif 
summer8.tif: 18008x7588 uchar, 3 bands, srgb, tiffload
$ time vips copy summer8.tif x.tif[pyramid,tile,compression=jpeg] --vips-concurrency=1
real    0m1.405s
user    0m1.440s
sys     0m0.162s
$ time vips copy summer8.tif x.tif[pyramid,tile,compression=jpeg]
real    0m1.308s
user    0m1.433s
sys     0m0.242s

So ... it finds some speedup here, but it's less than 10%. The problem here is that the tiff library is single-threaded and you just can't make that go faster.
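As a quick check of that figure, the arithmetic on the two "real" times above can be written out. This is only a sketch: the variable names are made up for illustration, and the numbers are copied from the timings shown.

```python
# Wall-clock ("real") times from the two runs above, in seconds.
one_thread = 1.405    # with --vips-concurrency=1
all_threads = 1.308   # default concurrency

# Speedup is the ratio of the two times; saving is the fraction of time removed.
speedup = one_thread / all_threads
saving = 1 - all_threads / one_thread

print(f"speedup: {speedup:.2f}x, time saved: {saving:.1%}")
# prints "speedup: 1.07x, time saved: 6.9%"
```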

I'd test your use case and see how much parallelism you get (time your workload on 1 thread vs. all threads), then use pp to run enough pipelines in parallel to load your machine. For example, if you see a 2x speedup and you have 8 cores, run 4 pyvips conversions at once.
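A rough sketch of that approach, with some assumptions: it uses the standard library's concurrent.futures process pool in place of pp, the helper names (jobs_for, to_pyramid, convert_all) and the "_pyr.tif" output naming are invented for illustration, and the save options simply mirror the command line above. The speedup value is whatever you measured for your own workload.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def jobs_for(cores, speedup):
    """How many conversions to run at once: cores divided by per-job speedup."""
    return max(1, round(cores / speedup))


def to_pyramid(src):
    """Convert one image to a tiled, JPEG-compressed pyramidal TIFF."""
    # Imported here so the parent process does not need to load libvips.
    import pyvips

    dst = os.path.splitext(src)[0] + "_pyr.tif"  # hypothetical naming scheme
    image = pyvips.Image.new_from_file(src, access="sequential")
    image.tiffsave(dst, pyramid=True, tile=True, compression="jpeg")
    return dst


def convert_all(paths, speedup):
    """Run enough conversions in parallel to occupy the available cores."""
    workers = jobs_for(os.cpu_count() or 1, speedup)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(to_pyramid, paths))
```

With a measured 2x speedup on an 8-core machine, jobs_for(8, 2.0) gives 4, matching the example above; you would then call something like convert_all(list_of_paths, speedup=2.0) from under an `if __name__ == "__main__":` guard.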

jpadfield commented 3 years ago

I will give it a go, thanks John :-)