google / highway

Performance-portable, length-agnostic SIMD with runtime dispatch
Apache License 2.0

OrderedDemote2To() f64->f32 ? #1903

Open Pflugshaupt opened 11 months ago

Pflugshaupt commented 11 months ago

I'm migrating my DSP codebase from my own attempt at a library to Highway at the moment. Things went mostly well, but I found one thing a bit puzzling: I have some algorithms that work on float lanes but have to do intermediate calculations at double precision. My own library allowed double-width f64 aggregates for that, but I see that Highway won't do Twice<d> on full-width tags. That's fair enough, so I went with PromoteLowerTo() and PromoteUpperTo() to convert each float vector into two double vectors. However, to go back to float later, I found that OrderedDemote2To() is curiously missing for double to float. Is there a specific reason for that, or am I missing some other function? I just want to convert N double lanes to N float lanes using half as many registers. It seems like something that would come up quite often with algorithms requiring full float precision results.

I ended up writing this, but it seems a bit silly:

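        // d: full-width f32 tag; a, b: f64 vectors holding the lower and upper
        // halves (e.g. obtained via PromoteLowerTo/PromoteUpperTo).
        auto dbl2float = [](auto d, auto a, auto b) HWY_ATTR {
            const Half<decltype(d)> hd;  // f32 tag with as many lanes as one f64 vector
            // Demote each half separately; Combine puts b in the upper half.
            return Combine(d, DemoteTo(hd, b), DemoteTo(hd, a));
        };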
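For readers less familiar with the lane order involved, here is a minimal scalar model of the round trip described above (f32 -> two f64 halves -> compute -> one f32 result), written in plain C++ so it does not depend on Highway. It is a sketch under stated assumptions, not Highway code: the helper names promote_lower, promote_upper, ordered_demote2 and square_minus_one are hypothetical, std::array stands in for vectors, and the helpers only mimic the lane order of PromoteLowerTo(), PromoteUpperTo() and the dbl2float lambda (lanes of a go to the lower half, lanes of b to the upper half).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar stand-ins for vectors: one f32 "vector" with 2*N lanes
// and f64 "vectors" with N lanes each (same total bytes).
template <std::size_t N>
using F32x2N = std::array<float, 2 * N>;
template <std::size_t N>
using F64xN = std::array<double, N>;

// Models PromoteLowerTo(): widen lanes [0, N) of v to double.
template <std::size_t N>
F64xN<N> promote_lower(const F32x2N<N>& v) {
  F64xN<N> r{};
  for (std::size_t i = 0; i < N; ++i) r[i] = static_cast<double>(v[i]);
  return r;
}

// Models PromoteUpperTo(): widen lanes [N, 2N) of v to double.
template <std::size_t N>
F64xN<N> promote_upper(const F32x2N<N>& v) {
  F64xN<N> r{};
  for (std::size_t i = 0; i < N; ++i) r[i] = static_cast<double>(v[N + i]);
  return r;
}

// Models the dbl2float lambda, i.e. an ordered f64->f32 demotion of two
// vectors: a fills the lower half of the result, b the upper half.
template <std::size_t N>
F32x2N<N> ordered_demote2(const F64xN<N>& a, const F64xN<N>& b) {
  F32x2N<N> r{};
  for (std::size_t i = 0; i < N; ++i) {
    r[i] = static_cast<float>(a[i]);
    r[N + i] = static_cast<float>(b[i]);
  }
  return r;
}

// Hypothetical kernel: x*x - 1 evaluated in double precision, then rounded
// once to float, with the original lane order preserved.
template <std::size_t N>
F32x2N<N> square_minus_one(const F32x2N<N>& v) {
  F64xN<N> lo = promote_lower<N>(v);
  F64xN<N> hi = promote_upper<N>(v);
  for (std::size_t i = 0; i < N; ++i) {
    lo[i] = lo[i] * lo[i] - 1.0;
    hi[i] = hi[i] * hi[i] - 1.0;
  }
  return ordered_demote2<N>(lo, hi);
}
```

Because N cannot be deduced from 2 * N, callers pass it explicitly, e.g. square_minus_one<2>(v) for a four-lane float array.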
jan-wassenberg commented 11 months ago

Hi, we don't have f64->f32 OrderedDemote2To because x86 and SVE can't do that very efficiently and we did not yet have a use-case.

However, RVV and NEON could do this a bit more efficiently. Would you be interested in having a go at adding support? That would involve updating quick_reference.md to mention that f64->f32 is supported, adding ForShrinkableVectors<TestFloatOrderedDemote2To>()(float()); in demote_test.cc:678, copying your implementation to generic_ops-inl.h with the usual #if (defined(HWY_NATIVE_ include guard, and adding implementations to rvv-inl.h and arm_neon-inl.h.
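To make the NEON point concrete, here is a minimal sketch, not Highway's arm_neon-inl.h code. On AArch64, vcvt_f32_f64 (FCVTN) narrows two doubles into the low half of a result and vcvt_high_f32_f64 (FCVTN2) narrows two more into the upper half, so an ordered f64->f32 demotion of two vectors needs just those two conversions. The function name OrderedDemote2F64x2 and the std::array wrapper are hypothetical; other targets take a portable scalar fallback, loosely mirroring how generic code covers targets without a native version.

```cpp
#include <array>
#include <cassert>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical helper: demotes two f64x2 "vectors" into one f32x4 while
// keeping lane order (a -> lanes 0..1, b -> lanes 2..3).
inline std::array<float, 4> OrderedDemote2F64x2(const std::array<double, 2>& a,
                                                const std::array<double, 2>& b) {
  std::array<float, 4> out{};
#if defined(__aarch64__)
  // FCVTN: narrow a into a float32x2_t (the future low half).
  const float32x2_t lo = vcvt_f32_f64(vld1q_f64(a.data()));
  // FCVTN2: keep lo as the low half and narrow b into the upper half.
  const float32x4_t r = vcvt_high_f32_f64(lo, vld1q_f64(b.data()));
  vst1q_f32(out.data(), r);
#else
  // Portable fallback: demote each half separately, like the generic approach.
  for (int i = 0; i < 2; ++i) {
    out[i] = static_cast<float>(a[i]);
    out[2 + i] = static_cast<float>(b[i]);
  }
#endif
  return out;
}
```

32-bit Arm (ARMv7) NEON has no f64 vectors, which is why the sketch only uses the intrinsics when __aarch64__ is defined.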

Pflugshaupt commented 11 months ago

Ok, I'll give it a try once I'm done migrating to Highway and have gained some more experience with it. That'll be in January. Thanks for letting me know I'm not missing a different way to do f64->f32. One issue might be that I have zero experience with RISC-V/RVV.

jan-wassenberg commented 11 months ago

Sounds good :) No worries: RVV already has an existing function for that, so it may be enough simply to enable f64->f32 in the template SFINAE. It would also be fine to write a TODO instead; in the meantime, that target would be covered by the generic code.