google / wuffs

Wrangling Untrusted File Formats Safely
Other
4.07k stars 129 forks source link

Use iterators for converting RGB to BGRA in the Rust PNG benchmark #45

Closed landaire closed 3 years ago

landaire commented 3 years ago

There was some conversation on the Rust subreddit about the recent PNG decoder blog post and some users had pointed out that the RGB to BGRA conversion is a particularly great example of where Rust fails to elide bounds checks.

The full thread is here: https://www.reddit.com/r/rust/comments/mlfhlo/wuffs_png_decoder_faster_than_rust/gtm8odo/

I've applied a patch to use the iterator-based solution from the user po8 which provides a nice speedup:

❯ benchstat before.txt after.txt
name                                              old time/op   new time/op   delta
rust_png_decode_image_19k_8bpp                      116µs ± 7%    118µs ± 3%     ~     (p=0.421 n=5+5)
rust_png_decode_image_40k_24bpp                     172µs ± 3%    159µs ± 3%   -7.66%  (p=0.008 n=5+5)
rust_png_decode_image_77k_8bpp                      280µs ± 1%    271µs ± 5%     ~     (p=0.222 n=5+5)
rust_png_decode_image_552k_32bpp_verify_checksum   2.07ms ± 2%   1.76ms ± 3%  -15.16%  (p=0.008 n=5+5)
rust_png_decode_image_4002k_24bpp                  17.3ms ± 1%   16.4ms ± 4%   -5.35%  (p=0.008 n=5+5)

name                                              old speed     new speed     delta
rust_png_decode_image_19k_8bpp                    165MB/s ± 6%  163MB/s ± 3%     ~     (p=0.421 n=5+5)
rust_png_decode_image_40k_24bpp                   235MB/s ± 3%  254MB/s ± 3%   +8.30%  (p=0.008 n=5+5)
rust_png_decode_image_77k_8bpp                    275MB/s ± 1%  284MB/s ± 5%     ~     (p=0.222 n=5+5)
rust_png_decode_image_552k_32bpp_verify_checksum  266MB/s ± 2%  314MB/s ± 2%  +17.89%  (p=0.008 n=5+5)
rust_png_decode_image_4002k_24bpp                 231MB/s ± 1%  245MB/s ± 3%   +5.69%  (p=0.008 n=5+5)

I tried to tackle the conversion from RGBA to BGRA but I ended up with a very odd negative result somehow:

❯ benchstat before.txt after.txt rgba_to_bgra.txt
name \ time/op                                    before.txt    after.txt     rgba_to_bgra.txt
rust_png_decode_image_19k_8bpp                      116µs ± 7%    118µs ± 3%        117µs ± 7%
rust_png_decode_image_40k_24bpp                     172µs ± 3%    159µs ± 3%        153µs ±10%
rust_png_decode_image_77k_8bpp                      280µs ± 1%    271µs ± 5%        266µs ±11%
rust_png_decode_image_552k_32bpp_verify_checksum   2.07ms ± 2%   1.76ms ± 3%      16.01ms ± 2%
rust_png_decode_image_4002k_24bpp                  17.3ms ± 1%   16.4ms ± 4%       15.8ms ± 7%

name \ speed                                      before.txt    after.txt     rgba_to_bgra.txt
rust_png_decode_image_19k_8bpp                    165MB/s ± 6%  163MB/s ± 3%      164MB/s ± 7%
rust_png_decode_image_40k_24bpp                   235MB/s ± 3%  254MB/s ± 3%      264MB/s ± 9%
rust_png_decode_image_77k_8bpp                    275MB/s ± 1%  284MB/s ± 5%      289MB/s ±10%
rust_png_decode_image_552k_32bpp_verify_checksum  266MB/s ± 2%  314MB/s ± 2%       34MB/s ± 2%
rust_png_decode_image_4002k_24bpp                 231MB/s ± 1%  245MB/s ± 3%      254MB/s ± 7%

The patch for rgba_to_bgra.txt looks like:

diff --git a/script/bench-rust-png/src/main.rs b/script/bench-rust-png/src/main.rs
index 14e9c07a..674fba53 100644
--- a/script/bench-rust-png/src/main.rs
+++ b/script/bench-rust-png/src/main.rs
@@ -159,11 +159,11 @@ fn decode(dst0: &mut [u8], dst1: &mut [u8], src: &[u8]) -> u64 {
         return new_size as u64;
     } else if info.color_type == png::ColorType::RGBA {
         // Convert RGBA => BGRA.
-        for i in 0..((num_bytes / 4) as usize) {
-            let d = dst0[(4 * i) + 0];
-            dst0[(4 * i) + 0] = dst0[(4 * i) + 2];
-            dst0[(4 * i) + 2] = d;
+        for data in dst0.chunks_exact_mut(4) {
+            let data: &mut [u8; 4] = data.try_into().unwrap();
+            data.swap(0, 2);
         }
+
         return num_bytes;
     }
     // Returning 0 should lead to a panic (when want_num_bytes != 0).

Note: I have not tested po8's patch for functional correctness but it it seems to do the right thing.

google-cla[bot] commented 3 years ago

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

:memo: Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

landaire commented 3 years ago

@googlebot I signed it!

landaire commented 3 years ago

I force pushed a new change which correctly does the RGB to BGRA encoding. po8's patch just copied from src to dst without taking into account that R and B are in opposite positions. The benchmark stats were updated accordingly.

BartMassey commented 3 years ago

No big secret that I am /u/po8 on Reddit. Glad to see this change go in! Let me know if I can be of help somehow. Thanks to @landaire for doing the hard part.

nigeltao commented 3 years ago

Thanks!