halide / Halide

a language for fast, portable data-parallel computation
https://halide-lang.org

We should make a cleanly-vectorizing fast-approximation for atan2f. #8243

Open mcourteaux opened 1 month ago

mcourteaux commented 1 month ago

This article seems like an amazing reference:

https://mazzo.li/posts/vectorized-atan2.html

You may assign this to me; I think I'll do it. I suspect the bad performance I'm seeing comes from 8 separate calls to glibc's atan2f, rather than something that vectorizes cleanly.
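
For readers who want a concrete picture of what "vectorizes cleanly" could mean here, below is a minimal plain-C++ sketch of the general approach: reduce the argument to a ratio in [-1, 1], evaluate a short odd polynomial for atan, then fix up the quadrant with selects instead of branches. This is not Halide's implementation and not a proposal for its final form. The names `fast_atan`, `fast_atan2`, and `fast_atan2_batch` are hypothetical, and the polynomial coefficients are assumptions (a degree-11 fit as best recalled from the linked article) that should be re-derived and error-checked before being relied on. The sketch does not handle x = y = 0, infinities, or NaN.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper: odd polynomial approximating atan(x) for |x| <= 1.
// The coefficients are assumed values; verify the error bound before use.
inline float fast_atan(float x) {
    const float a1  =  0.99997726f;
    const float a3  = -0.33262347f;
    const float a5  =  0.19354346f;
    const float a7  = -0.11643287f;
    const float a9  =  0.05265332f;
    const float a11 = -0.01172120f;
    const float x2 = x * x;
    // Horner evaluation: only multiplies and adds, no branches.
    return x * (a1 + x2 * (a3 + x2 * (a5 + x2 * (a7 + x2 * (a9 + x2 * a11)))));
}

// Hypothetical atan2 approximation. Does not handle (0, 0), inf, or NaN.
inline float fast_atan2(float y, float x) {
    const float pi = 3.14159265358979f;
    const float half_pi = 1.57079632679490f;
    // Divide the smaller magnitude by the larger so |ratio| <= 1.
    const bool swap = std::fabs(x) < std::fabs(y);
    const float ratio = (swap ? x : y) / (swap ? y : x);
    float r = fast_atan(ratio);
    // If swapped, use atan(y/x) = sign(x/y) * pi/2 - atan(x/y).
    r = swap ? std::copysign(half_pi, ratio) - r : r;
    // Left half-plane: shift by +/- pi, taking the sign from y.
    r = (x < 0.0f) ? std::copysign(pi, y) + r : r;
    return r;
}

// Plain loop over arrays. Every iteration is independent and uses only
// arithmetic and selects, so a compiler may be able to auto-vectorize it.
inline void fast_atan2_batch(const float* y, const float* x, float* out,
                             std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = fast_atan2(y[i], x[i]);
    }
}
```

The ternaries are written so each one maps to a per-lane select rather than control flow; in a Halide pipeline the analogous construct would presumably be `select` over `Expr`s. Whether the loop actually vectorizes depends on the compiler and flags, so it is worth checking the generated code.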

mcourteaux commented 1 month ago

Or this one, indeed: https://github.com/boulos/syrah/blob/4ac08d54daa09fc4e7ac8424898d21deda18e103/src/include/syrah/FixedVectorMath.h#L288-L348

steven-johnson commented 1 month ago

Tagging zvookin because he's looked into doing this for some other similar cases (e.g., tanh).