johannesvollmer / exrs

100% Safe Rust OpenEXR file library

use rayon_core::ThreadPool + threads fallback on WASM #203

Closed: johannesvollmer closed this 1 year ago

johannesvollmer commented 1 year ago

Previously:

Running benches\read.rs (target\release\deps\read-346953cee12d2063.exe)
test read_single_image_rle_all_channels               ... bench:  23,077,480 ns/iter (+/- 5,920,839)
test read_single_image_rle_non_parallel_all_channels  ... bench:  37,191,290 ns/iter (+/- 13,142,675)
test read_single_image_rle_non_parallel_rgba          ... bench:  39,958,790 ns/iter (+/- 11,766,276)
test read_single_image_rle_rgba                       ... bench:  27,534,700 ns/iter (+/- 15,942,424)
test read_single_image_uncompressed_non_parallel_rgba ... bench:  20,412,270 ns/iter (+/- 3,284,810)
test read_single_image_uncompressed_rgba              ... bench:  20,208,720 ns/iter (+/- 2,113,310)
test read_single_image_zips_non_parallel_rgba         ... bench: 102,007,620 ns/iter (+/- 19,710,591)
test read_single_image_zips_rgba                      ... bench:  34,430,980 ns/iter (+/- 6,738,088)

Running benches\write.rs (target\release\deps\write-ee295502a3d29255.exe)
test write_nonparallel_zip1_to_buffered      ... bench: 478,018,990 ns/iter (+/- 60,906,865)
test write_parallel_any_channels_to_buffered ... bench:  39,375,920 ns/iter (+/- 11,970,844)
test write_parallel_zip16_to_buffered        ... bench: 125,661,040 ns/iter (+/- 19,316,207)
test write_parallel_zip1_to_buffered         ... bench: 107,721,620 ns/iter (+/- 17,027,000)
test write_uncompressed_to_buffered          ... bench:  36,058,700 ns/iter (+/- 15,324,005)

Parallel speedup (non-parallel time / parallel time): 161% (read, RLE, all channels), 145% (read, RLE, RGBA), 443% (write, ZIP1), ...
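Each percentage is the non-parallel ns/iter of a benchmark pair divided by the parallel ns/iter of the same pair. A minimal sketch of that arithmetic, using the numbers from the run above (`speedup_percent` is a hypothetical helper, not part of the exrs benches):

```rust
/// Hypothetical helper: parallel speedup in percent, i.e. the non-parallel
/// ns/iter divided by the parallel ns/iter, times 100.
/// Integer division truncates toward zero.
fn speedup_percent(non_parallel_ns: u64, parallel_ns: u64) -> u64 {
    non_parallel_ns * 100 / parallel_ns
}

fn main() {
    // (benchmark pair, non-parallel ns/iter, parallel ns/iter) from the first run
    let pairs: [(&str, u64, u64); 3] = [
        ("read, RLE, all channels", 37_191_290, 23_077_480),
        ("read, RLE, RGBA", 39_958_790, 27_534_700),
        ("write, ZIP1", 478_018_990, 107_721_620),
    ];
    for (name, non_parallel, parallel) in pairs {
        // prints 161%, 145% and 443% respectively
        println!("{name}: {}%", speedup_percent(non_parallel, parallel));
    }
}
```

Truncation reproduces the three figures of this first run. The figures quoted for the later runs appear to be rounded instead (for example 121.86% is reported as 122%), so this helper can come out one point lower there.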

Now with rayon:

Running benches\read.rs (target\release\deps\read-0137d5553932e671.exe)
test read_single_image_rle_all_channels               ... bench:  28,930,690 ns/iter (+/- 5,038,888)
test read_single_image_rle_non_parallel_all_channels  ... bench:  37,150,920 ns/iter (+/- 11,987,478)
test read_single_image_rle_non_parallel_rgba          ... bench:  39,809,670 ns/iter (+/- 9,344,255)
test read_single_image_rle_rgba                       ... bench:  32,667,300 ns/iter (+/- 2,025,493)
test read_single_image_uncompressed_non_parallel_rgba ... bench:  19,300,970 ns/iter (+/- 933,911)
test read_single_image_uncompressed_rgba              ... bench:  19,479,210 ns/iter (+/- 6,858,961)
test read_single_image_zips_non_parallel_rgba         ... bench: 102,712,370 ns/iter (+/- 34,694,717)
test read_single_image_zips_rgba                      ... bench:  33,905,040 ns/iter (+/- 4,875,561)

Running benches\write.rs (target\release\deps\write-c4636ec9e59cae60.exe)
test write_nonparallel_zip1_to_buffered      ... bench: 522,813,110 ns/iter (+/- 144,004,227)
test write_parallel_any_channels_to_buffered ... bench:  55,153,260 ns/iter (+/- 22,642,415)
test write_parallel_zip16_to_buffered        ... bench: 152,128,800 ns/iter (+/- 21,509,143)
test write_parallel_zip1_to_buffered         ... bench: 118,787,040 ns/iter (+/- 11,839,229)
test write_uncompressed_to_buffered          ... bench:  34,574,950 ns/iter (+/- 25,650,534)

Parallel speedup: 128% (read, RLE, all channels), 122% (read, RLE, RGBA), 440% (write, ZIP1), ...

We might want to try the non-FIFO spawn calls in the thread pool; the FIFO ones might be what slows things down.
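The FIFO/non-FIFO distinction is about the order in which queued jobs are picked up. Per the rayon documentation, `spawn_fifo` behaves like `spawn` except that calls from the same thread are prioritized in first-in, first-out order, whereas a worker running jobs it `spawn`ed itself takes the newest one first. The sketch below is a toy single-worker model built on `VecDeque`, not rayon's actual scheduler; `execution_order` is a hypothetical name.

```rust
use std::collections::VecDeque;

/// Toy model (not rayon's scheduler): the order in which a single worker
/// would run the jobs it queued for itself. `fifo == true` takes the oldest
/// job first, `fifo == false` takes the newest job first.
fn execution_order(jobs: &[u32], fifo: bool) -> Vec<u32> {
    let mut queue: VecDeque<u32> = jobs.iter().copied().collect();
    let mut order = Vec::with_capacity(jobs.len());
    loop {
        let next = if fifo { queue.pop_front() } else { queue.pop_back() };
        match next {
            Some(job) => order.push(job),
            None => break,
        }
    }
    order
}

fn main() {
    let chunk_indices = [0, 1, 2, 3];
    println!("fifo:     {:?}", execution_order(&chunk_indices, true)); // [0, 1, 2, 3]
    println!("non-fifo: {:?}", execution_order(&chunk_indices, false)); // [3, 2, 1, 0]
}
```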

johannesvollmer commented 1 year ago

On master:

Running benches\read.rs (target\release\deps\read-e1f4d1352c653dce.exe)
test read_single_image_rle_all_channels               ... bench:  18,840,020 ns/iter (+/- 2,707,747)
test read_single_image_rle_non_parallel_all_channels  ... bench:  29,031,200 ns/iter (+/- 2,341,532)
test read_single_image_rle_non_parallel_rgba          ... bench:  31,188,730 ns/iter (+/- 2,227,412)
test read_single_image_rle_rgba                       ... bench:  21,799,280 ns/iter (+/- 2,330,220)
test read_single_image_uncompressed_non_parallel_rgba ... bench:  16,437,840 ns/iter (+/- 2,210,454)
test read_single_image_uncompressed_rgba              ... bench:  17,363,250 ns/iter (+/- 2,278,051)
test read_single_image_zips_non_parallel_rgba         ... bench:  76,905,370 ns/iter (+/- 3,415,472)
test read_single_image_zips_rgba                      ... bench:  20,967,470 ns/iter (+/- 2,909,308)

Running benches\write.rs (target\release\deps\write-9d3ba8636b780ab2.exe)
test write_nonparallel_zip1_to_buffered      ... bench: 317,108,770 ns/iter (+/- 12,161,251)
test write_parallel_any_channels_to_buffered ... bench:  30,446,700 ns/iter (+/- 4,009,221)
test write_parallel_zip16_to_buffered        ... bench:  61,044,410 ns/iter (+/- 4,958,282)
test write_parallel_zip1_to_buffered         ... bench:  52,370,310 ns/iter (+/- 4,008,664)
test write_uncompressed_to_buffered          ... bench:  26,831,360 ns/iter (+/- 4,751,253)

That is a parallel speedup of 154%, 143%, 606%, ...

Without FIFO spawning:

Running benches\read.rs (target\release\deps\read-ea6aead5236060bf.exe)
test read_single_image_rle_all_channels               ... bench:  20,310,830 ns/iter (+/- 7,008,828)
test read_single_image_rle_non_parallel_all_channels  ... bench:  30,482,600 ns/iter (+/- 7,425,710)
test read_single_image_rle_non_parallel_rgba          ... bench:  32,887,670 ns/iter (+/- 11,718,651)
test read_single_image_rle_rgba                       ... bench:  22,851,780 ns/iter (+/- 6,029,902)
test read_single_image_uncompressed_non_parallel_rgba ... bench:  17,642,540 ns/iter (+/- 6,100,866)
test read_single_image_uncompressed_rgba              ... bench:  18,237,210 ns/iter (+/- 5,941,470)
test read_single_image_zips_non_parallel_rgba         ... bench:  81,840,630 ns/iter (+/- 6,128,765)
test read_single_image_zips_rgba                      ... bench:  22,718,730 ns/iter (+/- 4,300,432)

Running benches\write.rs (target\release\deps\write-b56af3ef55d097f6.exe)
test write_nonparallel_zip1_to_buffered      ... bench: 349,552,540 ns/iter (+/- 63,847,946)
test write_parallel_any_channels_to_buffered ... bench:  34,021,550 ns/iter (+/- 8,007,533)
test write_parallel_zip16_to_buffered        ... bench:  61,634,630 ns/iter (+/- 6,244,950)
test write_parallel_zip1_to_buffered         ... bench:  53,021,430 ns/iter (+/- 4,410,378)
test write_uncompressed_to_buffered          ... bench:  28,873,260 ns/iter (+/- 9,948,521)

That is a parallel speedup of 150%, 144%, 659%, ...

notgull commented 1 year ago

Thanks for doing this! Is this a breaking change?

johannesvollmer commented 1 year ago

Absolutely! It's fine though, don't worry. This part of the API is pretty deep in the guts, so I doubt many projects use it. Considering the gains for WASM, this is absolutely worth the trouble.
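The "threads fallback on WASM" from the PR title can be sketched as: try the parallel path, and if worker threads cannot be created, decompress on the current thread instead. This sketch assumes that fallback design and swaps `rayon_core::ThreadPool` for std scoped threads so it stays dependency-free; with rayon_core, the analogous failure signal is `ThreadPoolBuilder::build()` returning an `Err`. All function names here are hypothetical, not the exrs API.

```rust
use std::thread;

/// Hypothetical stand-in for decompressing one block: doubles every byte.
fn decompress_block(block: &[u8]) -> Vec<u8> {
    block.iter().map(|byte| byte.wrapping_mul(2)).collect()
}

/// Fallback path: decompress every block on the current thread.
fn decompress_sequential(blocks: &[Vec<u8>]) -> Vec<Vec<u8>> {
    blocks.iter().map(|block| decompress_block(block)).collect()
}

/// Parallel path: one scoped thread per block. Returns `None` if a thread
/// cannot be spawned (std reports this as an `io::Error`, for example on
/// wasm32-unknown-unknown) or if a worker panics.
fn decompress_parallel(blocks: &[Vec<u8>]) -> Option<Vec<Vec<u8>>> {
    thread::scope(|scope| -> Option<Vec<Vec<u8>>> {
        let mut handles = Vec::with_capacity(blocks.len());
        for block in blocks {
            // On a spawn error, `?` returns early; any threads already started
            // are still joined by `thread::scope` before it returns.
            let handle = thread::Builder::new()
                .name("block decompressor".to_string())
                .spawn_scoped(scope, move || decompress_block(block))
                .ok()?;
            handles.push(handle);
        }
        // Joining in spawn order keeps the output in the same order as the input.
        handles.into_iter().map(|handle| handle.join().ok()).collect()
    })
}

/// Use threads when the platform provides them, otherwise fall back.
fn decompress_all(blocks: &[Vec<u8>]) -> Vec<Vec<u8>> {
    decompress_parallel(blocks).unwrap_or_else(|| decompress_sequential(blocks))
}

fn main() {
    let blocks: Vec<Vec<u8>> = vec![vec![1, 2], vec![3], vec![4, 5, 6]];
    // prints [[2, 4], [6], [8, 10, 12]] on both paths
    println!("{:?}", decompress_all(&blocks));
}
```

One thread per block keeps the sketch short; a pool, as used in the PR, reuses a fixed set of workers instead of spawning a thread for every block.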