Open Pflugshaupt opened 11 months ago
Hi, we don't have f64->f32 OrderedDemote2To because x86 and SVE can't do that very efficiently and we did not yet have a use-case.
However, RVV and NEON could do this a bit more efficiently. Would you be interested in having a go at adding support? That would involve updating quick_reference.md to mention that f64->f32 is supported, adding ForShrinkableVectors<TestFloatOrderedDemote2To>()(float()); in demote_test.cc:678, copying your implementation to generic_ops-inl.h with the usual #if (defined(HWY_NATIVE_ 'include guard', and adding implementations to rvv-inl.h and arm_neon-inl.h.
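For anyone following along, here is a scalar model of the lane order being discussed for f64->f32 OrderedDemote2To: the lanes of the first input fill the lower half of the result and the lanes of the second input fill the upper half. This is only a sketch in plain C++, not Highway code; the name OrderedDemote2ToModel and the use of std::array are made up for illustration.

```cpp
#include <array>
#include <cstddef>

// Hypothetical scalar model, NOT part of Highway: demotes two groups of
// N double lanes into one group of 2*N float lanes, keeping lane order.
template <std::size_t N>
std::array<float, 2 * N> OrderedDemote2ToModel(
    const std::array<double, N>& a, const std::array<double, N>& b) {
  std::array<float, 2 * N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    out[i] = static_cast<float>(a[i]);      // lower half comes from a
    out[N + i] = static_cast<float>(b[i]);  // upper half comes from b
  }
  return out;
}
```

A generic (non-native) vector version would presumably have to produce the same lane order, which is what a test such as TestFloatOrderedDemote2To would be expected to check.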
Ok, I'll give it a try once I'm done migrating to Highway and have gained some more experience with it. That'll be in January. Thanks for letting me know I'm not missing a different way to do f64->f32. One issue might be that I have zero experience with RISC-V/RVV.
Sounds good :) No worries, RVV already has an existing function for that, so it may be enough simply to enable f64->f32 in the template SFINAE. It would also be fine to write a TODO instead; in the meantime, that target would be covered by the generic code.
I'm migrating my DSP codebase from my own attempt at a library to Highway at the moment. Things went mostly well, but I found one thing a bit puzzling: I have some algorithms that work on float lanes but have to do intermediate calculations at double precision. My own library allowed twice-as-wide f64 aggregates for that, but I see that Highway won't do Twice<d> on full-width tags. That's fair enough, so I went with PromoteLowerTo() and PromoteUpperTo() to convert each float tag to two double tags. However, to go back to float later, I found that OrderedDemote2To() is curiously missing for double to float. Is there a specific reason for that, or am I missing some other function? I just want to convert N double lanes to N float lanes using half as many registers. It seems like something that would come up quite often with algorithms requiring full float precision results. I ended up writing this, but it seems a bit silly: