image-rs / jpeg-decoder

JPEG decoder written in Rust
Apache License 2.0
149 stars 87 forks source link

Speed is half of normal jpeg decoders (on ARM) #202

Closed fzyzcjy closed 2 years ago

fzyzcjy commented 2 years ago

Hi thanks for the lib! However, the speed of this library seems to be half of normal jpeg decoders. When running on ARM, normal decoder uses 600ms to decode a 4000x3000 color JPEG, while this lib uses 1100ms.

HeroicKatora commented 2 years ago

What 'normal decoder' are your referring to?

fzyzcjy commented 2 years ago

opencv's builtin encoder.

HeroicKatora commented 2 years ago

OpenCV does not have a builtin encoder, it uses either libjpeg or libjpeg-turbo. Should this be closed as a duplicate of #155 ?

fzyzcjy commented 2 years ago

@HeroicKatora Sounds like it is similar. But I have a little bit different idea: Maybe another cause is SIMD? Because libjpeg-turbo says it uses SIMD a lot to speed up. It even uses hand-crafted assembly code.

veluca93 commented 2 years ago

@HeroicKatora I have noticed that NEON intrinsics are currently only available on nightly. I can think of 3 options for adding SIMD support on NEON, and I'm not sure which is the best.

What are your thoughts?

fintelia commented 2 years ago

How likely is it that the NOEN intrinsics will be stabilized in their current form? If it is just a matter of waiting for them to reach stable, then I think that's probably the best option. We'd just leave it behind a feature flag until they were available on stable at our MSRV.

What worries more is the testing/maintenance strategy given that the CI and most developers' machines are currently x86. (Not a deal-breaker mind you, just something to figure out so we don't find ourselves scrambling to track down an ARM machine because we need to test an urgent patch or something).

veluca93 commented 2 years ago

As for being stabilized in their current form, considering that they look pretty much identical to their C counterpart (as do the x86 ones), I'd say we're probably fine.

For testing/CI, luckily user-mode QEMU exists :) for libjxl we configured the GitHub CI like this - https://github.com/libjxl/libjxl/blob/main/.github/workflows/build_test.yml#L225 - I assume something similar could be set up for rust, although of course this wouldn't allow benchmarks.

HeroicKatora commented 2 years ago

Regarding NEON, I wonder if it were possible to convince LLVM to auto-vectorize enough and in particular utilizing instructions. We can have specialized methods with explicit target_feature(enable = "…") annotations. For example, consider and compare the instruction output here:

#[target_feature(enable = "avx2")]
pub unsafe fn sum_avx2(vec: &[u32]) -> u32 {
    let mut n = 0u32;
    for &i in vec {
        n = n.wrapping_add(i);
    }
    n
}

pub fn sum_me(vec: &[u32]) -> u32 {
    let mut n = 0u32;
    for &i in vec {
        n = n.wrapping_add(i);
    }
    n
}

playground

HeroicKatora commented 2 years ago

Ah well, looks like the feature names and annotation for ARM aren't stable either. https://rustc.godbolt.org/z/bcr59rTaa Kind of odd that the pure codegen appears to be tied so closely to the stabilization of the arch module intrinsics, isn't it?

veluca93 commented 2 years ago

In my experience, for this kind of code, autovectorization doesn't really bring you far enough...

I'll give it a try in the next few days and we can see where to go from there.

On Wed, 22 Dec 2021, 20:50 HeroicKatora, @.***> wrote:

Ah well, looks like the feature names and annotation for ARM aren't stable either. https://rustc.godbolt.org/z/bcr59rTaa Kind of odd that the pure codegen appears to be tied so closely to the stabilization of the arch module intrinsics, isn't it?

— Reply to this email directly, view it on GitHub https://github.com/image-rs/jpeg-decoder/issues/202#issuecomment-999824506, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOPAI5NMRV7KWJBAV72UJLUSITZ5ANCNFSM5FZ2T4XA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you commented.Message ID: @.***>

paolobarbolini commented 2 years ago

Regarding NEON, I wonder if it were possible to convince LLVM to auto-vectorize enough and in particular utilizing instructions.

I'm not a fan of proc macros but it might be worth to at least give it a try via the multiversion crate