Open jramapuram opened 6 years ago
Please include a `requirements.txt` file for use with pip/virtualenv or similar to facilitate reproduction.
Are you sure you are loading the release version (`cargo build --release`) of the library? If a debug build exists, I am pretty sure that https://github.com/jramapuram/parallel_image_crop/blob/ba0aeca9e0c68fdd99aed7ff8e2fc2747ed66898/benchmarks/test.py#L173 will first encounter the path
`../target/debug/libparallel_image_crop.so`
before visiting
`../target/release/libparallel_image_crop.so`
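For what it's worth, the shadowing behavior described here can be sketched as a first-match path search (a hypothetical stand-in for the loader logic in `benchmarks/test.py`; the function name and candidate order are illustrative, not the actual implementation):

```rust
use std::path::{Path, PathBuf};

/// Return the first candidate path that exists on disk.
/// If a stale debug build is listed first, it shadows the release build.
fn find_library(candidates: &[&str]) -> Option<PathBuf> {
    candidates
        .iter()
        .find(|p| Path::new(p).exists())
        .map(PathBuf::from)
}

fn main() {
    // Illustrative search order matching the issue: debug before release.
    let candidates = [
        "../target/debug/libparallel_image_crop.so",
        "../target/release/libparallel_image_crop.so",
    ];
    match find_library(&candidates) {
        Some(p) => println!("would load {}", p.display()),
        None => println!("no built library found; run `cargo build --release`"),
    }
}
```

With such a loader, a leftover debug `.so` silently wins even after a release build, which would explain slow numbers.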
I see the following numbers, using the release build:
512x512
python crop average over 10 trials : 0.16067423820495605 +/- 0.3671143569581757 sec
rust crop average over 10 trials : 0.04678645133972168 +/- 0.0012110288525016243 sec
4000x4000
python crop average over 10 trials : 0.7032721757888794 +/- 0.3654377582986589 sec
rust crop average over 10 trials : 0.7183722257614136 +/- 0.0021783159746312864 sec
@HeroicKatora: thanks for compiling it; let me know if you still need a `requirements.txt`. I was going to add it after I got things working well.
I did compile a release build (and have hardcoded this in `benchmarks.py` for commit https://github.com/jramapuram/parallel_image_crop/commit/2467a0cfc0be875f84ac8bf0b54f6ed9c95dc9ac). As mentioned, it does work great for small images! (The image present in `assets` is only 512x512.)
You will have to resize to see the difference:
convert assets/lena.png -resize 4000x4000 assets/lena.png
Here is the workflow:
# let's run crops over a batch of 32 images w/100 trials for 512x512
(base) ➜ parallel_image_crop git:(master) python benchmarks/test.py --batch-size=32 --num-trials=100
python crop average over 100 trials : 0.116404447555542 +/- 0.1595486246983642 sec
rust crop average over 100 trials : 0.09679011821746826 +/- 0.0062470562722498025 sec
# Now we convert it 4000 x 4000 and try the same
(base) ➜ parallel_image_crop git:(master) convert assets/lena.png -resize 4000x4000 assets/lena.png
(base) ➜ parallel_image_crop git:(master) ✗ python benchmarks/test.py --batch-size=32 --num-trials=100
python crop average over 100 trials : 3.173974087238312 +/- 0.2067425334971192 sec
rust crop average over 100 trials : 4.159008667469025 +/- 0.20634427332646715 sec
See full GIST here: https://gist.github.com/jramapuram/f3a69e5810f56347c470d10f54414c5e
I missed that there is no 4000x4000 image in the repository, but figured that all results were from a release build. The second set of results I posted is also with an upscaled 4000x4000 image. There are several small inefficiencies in the Rust source, but none of them dramatic:
1. Instead of your own `Array`, you can convert all variants of `DynamicImage` to a `Vec<u8>`, or even extract a `&[u8]` from the `DynamicImage`. This avoids some `unsafe` and maybe unsoundness.
2. […] the `'static` lifetime bound on `imageops::resize` prevents that.
Still, this gives only ~4% speedup on my machine.
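To illustrate point 1, here is a hypothetical, heavily simplified stand-in for the `image` crate's `DynamicImage` (the real enum has more variants and methods): since every variant ultimately owns a `Vec<u8>`, the pixel bytes can be borrowed or moved out without any `unsafe`:

```rust
// Hypothetical simplified stand-in for `image::DynamicImage`:
// every variant owns its pixel buffer as a Vec<u8>.
enum DynImage {
    Luma8(Vec<u8>),
    Rgb8(Vec<u8>),
    Rgba8(Vec<u8>),
}

impl DynImage {
    /// Borrow the raw pixel bytes regardless of variant: no copy, no unsafe.
    fn as_bytes(&self) -> &[u8] {
        match self {
            DynImage::Luma8(v) | DynImage::Rgb8(v) | DynImage::Rgba8(v) => v,
        }
    }

    /// Consume the image and take ownership of the buffer (a move, no copy).
    fn into_bytes(self) -> Vec<u8> {
        match self {
            DynImage::Luma8(v) | DynImage::Rgb8(v) | DynImage::Rgba8(v) => v,
        }
    }
}

fn main() {
    let img = DynImage::Rgb8(vec![10, 20, 30]);
    assert_eq!(img.as_bytes(), &[10, 20, 30]);
    assert_eq!(img.into_bytes(), vec![10, 20, 30]);
    println!("bytes extracted without unsafe");
}
```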
Take this with the grain of salt healthy for simple benchmarks, but measuring load time vs. processing time gives:
Loading: 399.338101ms, Processing: 3.197122ms
where the first is measured from function start until after `image::open`, and the second from that point until the end of `crop_and_resize`. The rest of the execution time should therefore be copying back the result?
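That split measurement can be sketched with `std::time::Instant` (the closures below are hypothetical stand-ins for `image::open` and `crop_and_resize`; this is not the actual benchmark code):

```rust
use std::time::{Duration, Instant};

/// Time a two-phase pipeline: `load` stands in for `image::open`,
/// `process` for `crop_and_resize`. Returns both durations and the result.
fn time_phases<T, U>(
    load: impl FnOnce() -> T,
    process: impl FnOnce(T) -> U,
) -> (Duration, Duration, U) {
    let t0 = Instant::now();
    let img = load(); // decode phase
    let loading = t0.elapsed();

    let t1 = Instant::now();
    let out = process(img); // crop + resize phase
    let processing = t1.elapsed();

    (loading, processing, out)
}

fn main() {
    // Dummy workload: "load" a buffer, "process" by summing it.
    let (loading, processing, sum) =
        time_phases(|| vec![1u64; 1_000], |v| v.iter().sum::<u64>());
    println!("Loading: {:?}, Processing: {:?}, sum = {}", loading, processing, sum);
}
```

With numbers like 399ms load vs. 3ms processing, decode time clearly dominates, which is why the discussion below turns to partial/lazy loading.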
Fair points, I will look into removing `Array`: you are right, I initially had it as a return value. In the current state it is not necessary.
Regarding load time: do you think lazy loading would solve this problem? E.g. something like a pyramid/tiled TIFF? All I'm doing is `crop()` followed by `resize()`.
Afaik, most decoders do not yet implement an API that would load only a partial image. I can link you more precise issues if you want to track the progress of this, but the interface was not yet clear. This means that `open` and `crop` followed by `resize` is likely about as fast as `load_rect` and `resize`.
Edit: This has been requested more intensely in the recent past; I might focus on getting something available soon.
I think one of them might be mine :P (i.e. `load_rect` not working: https://github.com/PistonDevelopers/image/issues/802)
Thanks for your help! Any idea why `image` is so much faster for smaller images though? I think parity with `PIL-SIMD` is great, but it seems to do much better with smaller images --> wonder if that can be translated to the larger ones. But I guess that will have to do with `load_rect` and other such impls. Note though: `PIL-SIMD` does also load the entire image for crops now, as opposed to lazy loading.
@HeroicKatora: I removed the `Array` implementation and simply copy the vec over:

```rust
let mut resultant_vec = vec![];
image_paths_vec.into_par_iter().zip(scale_values)
    .zip(x_values.par_iter()).zip(y_values)
    .map(|(((path, scale), x), y)| {
        crop_and_resize(path,
                        *scale, *x, *y,
                        max_img_percent,
                        window_size,
                        window_size).raw_pixels()
    }).collect_into_vec(&mut resultant_vec);

// copy the buffer into the return array
let win_size = (window_size * window_size * chans) as usize;
for (begin, rvec) in izip!((0..length * win_size).step_by(win_size), resultant_vec) {
    assert!(rvec.len() == win_size, "rvec [{:?}] != window_size [{:?}]",
            rvec.len(), win_size);
    unsafe {
        ptr::copy(rvec.as_ptr() as *const u8,
                  return_ptr.offset(begin as isize),
                  win_size)
    };
}
```
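As a side note, that `ptr::copy` loop can be written without `unsafe` whenever the destination is available as a `&mut [u8]` (a sketch only, with a plain slice standing in for the FFI return buffer; whether the raw pointer can soundly be turned into a slice depends on the caller's contract, which isn't shown here):

```rust
/// Copy fixed-size windows into a flat output buffer:
/// a safe stand-in for the `ptr::copy` loop above.
fn copy_windows(windows: &[Vec<u8>], win_size: usize, out: &mut [u8]) {
    assert_eq!(out.len(), windows.len() * win_size, "output buffer size mismatch");
    for (chunk, w) in out.chunks_exact_mut(win_size).zip(windows) {
        assert_eq!(w.len(), win_size, "window size mismatch");
        chunk.copy_from_slice(w); // memcpy under the hood, bounds-checked once
    }
}

fn main() {
    let windows = vec![vec![1u8, 2], vec![3, 4]];
    let mut out = vec![0u8; 4];
    copy_windows(&windows, 2, &mut out);
    assert_eq!(out, vec![1, 2, 3, 4]);
    println!("copied {} bytes safely", out.len());
}
```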
Not sure if the above was what you had in mind? I don't believe I have any control over the `'static` lifetime bound issue you mentioned in point 2, though.
Results:
Listed below is a comparison against `vips`. [Note: `vips` allows for sequential reading (similar to `read_scanline`), which is why cropping should be faster on average --> I think this is a good baseline target for `image`.] `vips` is currently quite a bit faster:
# test over 4000 x 4000 b/w image
(base) ➜ parallel_image_crop git:(master) ✗ python benchmarks/test.py --batch-size=32 --num-trials=100 --use-grayscale --use-vips
python crop average over 100 trials : 0.7368819999694824 +/- 0.20126972433300294 sec
rust crop average over 100 trials : 1.3322323775291443 +/- 0.06476191433811372 sec
Listed below is a comparison against `PIL-SIMD` (this is much closer):
# test over 4000 x 4000 b/w image
(base) ➜ parallel_image_crop git:(master) ✗ python benchmarks/test.py --batch-size=32 --num-trials=100 --use-grayscale
python crop average over 100 trials : 1.0117538738250733 +/- 0.17556666522802858 sec
rust crop average over 100 trials : 1.2921310329437257 +/- 0.044052204986870375 sec
Yes, issue 2 needs to be resolved in this crate and should be published in the next version (it might even make it into a minor version, since it simply relaxes a constraint and is therefore no breaking change). I can't promise anything for the speed issue, but I will take a look and maybe we can get it to at least the `PIL-SIMD` baseline.
It would be really nice if `image` could match or beat `pillow-simd` in benchmarks. I'm almost exclusively having to use Python for image processing because `pillow-simd` beats other libraries by a large margin.
I'm interested here too -- we are evaluating Rust `image` to interface with kornia to load directly into tensors. My idea is to wrap `image-rs` with https://github.com/dmlc/dlpack to adopt our new `kornia.Image` API. /cc @carlosb1 @strasdat
So been trying to use `image` as an alternative to speed up image cropping in Python. I tested this against the Python bindings provided by `PIL-SIMD` and `vips`. `image` + `rayon` provides great results for mini-batches of small images (e.g. 512 x 512), however it looks like this does not scale to large images (e.g. 4000 x 4000). Here are some detailed results (full repo here):

Gray Scale 4000 x 4000 JPEG Image:
Best: `PIL-SIMD` or `vips`

Gray Scale 4000 x 4000 PNG Image:
Best: `PIL-SIMD` or `vips`

Gray Scale 4000 x 4000 BMP Image:
Best: all almost equal

Color 512 x 512 PNG Image:
Best: `parallel_image_crop` Rust library

Color 512 x 512 JPEG Image:
Best: `parallel_image_crop` Rust library