Open bnprks opened 1 year ago
Hi @bnprks , thank you for reaching out. This sounds very interesting and we are very happy to collaborate. Translating some of the math functions from SLEEF sounds like a great approach. FYI here's a list of the ones we don't have implemented yet, indeed Pow() is still missing.
Numpy is also interested in accurate and robust implentations. I'd suggest favoring both high precision, and correctly handling edge cases. For pow, their relative cost would be low, and we can provide some compile-time template argument or #if
to opt out of them for anyone who wants.
If scalar is easier to translate for you, sure, why not. Do you see any potential concerns or tradeoffs there? (I know the many suffixes in their vector code are hard to read.)
There is currently no active effort to extend our math functions. As mentioned, it may be interesting for numpy in future.
Happy to discuss further. FYI I am out of office next week.
Thanks for the reply! I've been doing some exploratory work looking into the Pow function and the potential for a larger automated translation of the SLEEF math functions.
What I've done so far is:
xpowf
Translation seems tricky but possible, and I am excited about the prospect of having Sleef's high quality math implementations available in Highway.
If scalar is easier to translate for you, sure, why not. Do you see any potential concerns or tradeoffs there? (I know the many suffixes in their vector code are hard to read.)
I've been looking both at the simd and scalar sources some, and I think this is a key discussion point for adapting the sleef algorithms, especially if an automated translation might be desired.
Here's my take:
xpowf
functionAfter more review, I think I will try to translate starting from the simd source, due to the presence of simd-specific efficiency tricks. My approach will be to make semantic translations between sleef's simd abstraction layer into highway's functions which will capture some but not all of the simd optimizations in sleef.
vtruncate_vi2_vf(v)
would translate to ConvertTo(di, v)
, even though Highway's version for avx512 and other x86 targets includes additional overflow-handling logic that sleef's avx12 doesn't perform.Assuming this approach continues to make sense to you, I'll start by trying to develop a semi-automated translation system to translate the Pow function, then see from there if it seems possible to expand to the rest of the sleef math functions.
Great to hear your progress. This sounds very exciting indeed! Starting from SIMD makes sense - although harder to read, there might be other tricks in there as well, for example vpternlog.
Interesting that SLEEF does not check/fix the return value of _mm512_cvttps_epi32. I'd consider its 0x80..00 to be less useful than INT*_MAX. The cost of potential bugs/misbehavior seems much higher to me than a few extra instructions in already expensive math routines.
Please don't hesitate to let us know if you'd like to use additional intrinsics. We are happy to add ops wherever something reasonable can be done for other platforms.
Assuming this approach continues to make sense to you, I'll start by trying to develop a semi-automated translation system to translate the Pow function, then see from there if it seems possible to expand to the rest of the sleef math functions.
Sounds great!
Hi @bnprks , I am curious how it's going with the Pow function?
I haven't done much with this lately, so I'm still at the stage of viable-seeming strategy but definitely not something working. I solved my immediate use-case by disabling SIMD for Pow (wasn't that critical), but I might take a second look at this since I think the sleef translation is an interesting and useful problem. Judging by my slow progress to date, I wouldn't expect much to come of this soon.
If you're interested in cleaning up outstanding issues we can close this and I'll reopen if I get around to porting a full Pow implementation.
If you or someone else wants to tackle this instead definitely don't wait for me. I'd be happy to share notes + the code snippets I was working on for parsing + transforming the sleef source code if desired.
Thanks for the update. I agree this would be useful and interesting. No worries/rush, we can keep this open :)
Hi @jan-wassenberg, I've finally gotten time to take a proper look at this over the past few weeks, and have made some good initial results!
I've started off by making a separate repository to handle the translation code, results, and testing which is available at bnprks/highway-sleef.
Most of the details are in the README there, but for a quick summary:
ScalableTag
is needed)math-inl.h
with the Sleef translationsI've included a lot of accuracy + performance results in the repo, but my high-level summary would be that the SLEEF functions tend to provide more accurate options than Hwy at the cost of slower execution speed. When similar-precision options are available, SLEEF and Hwy are within ~25% speed of each other, but high-precision SLEEF options can become much slower (3x baseline, but some outliers at 10-30x slower).
Wow, looks like the automation has paid off in that you've been able to port a long list of functions. Nice work!
1k-10k random log-spaced inputs, along with all combinations of per-function "special values"
Sounds good. We can use AdjustedReps() to reduce the number of iterations in debug builds, then it should be fine.
if testing on vector types other than ScalableTag is needed
Testing on partial vectors can expose some bugs. I'm not sure we have to run for all powers of two, but ForGEVectors<128, Test>
might be a good compromise. Our Github and internal CI takes care of running on various platforms.
Deciding in which cases (if any) it's worth replacing an existing function in math-inl.h with the Sleef translations
I'd trust your benchmarks and suggest the general principle that we replace any existing functions with something more exact, and no more than 10% slower. For example Exp(). For the higher-precision ones, which some users have indeed requested regardless of cost, we can put them perhaps in another file and with a name indicating the tradeoff.
On licensing, are you in contact with Prof. Shibata? If they could re-license to BSD3 or Apache2, that would be very helpful.
Wow, looks like the automation has paid off in that you've been able to port a long list of functions. Nice work!
Thanks! Aside from special cases like SinCos returning two arguments, the incremental work is basically 1 line of config per Sleef function that needs translating (including SIMD ops and intermediate functions)
I'll try getting a faster testing setup established, though testing platforms though personally I'll probably only be able to check Intel AVX2 and below and ARM NEON with the computers I have access to. I might look into qemu testing later.
On licensing, are you in contact with Prof. Shibata? If they could re-license to BSD3 or Apache2, that would be very helpful.
I have not been in contact with Prof. Shibata. Do you think a github issue on sleef or an email to Prof. Shibata would be best?
It looks like the files used for translation have ~10 distinct contributors, though most besides Prof. Shibata's to do not touch the actual function implementations. Prof. Shibata has the vast majority of contributions, and the most active current maintainer on github appears to be Pierre Blanchard working at ARM in Manchester, UK.
EDIT: To add one question -- what would your goals/constraints be on licensing? Even if Sleef added e.g. BSD3 licensing, that would still presumably result in the derived math functions requiring special licensing treatment compared to the rest of Highway being dual-licensed.
the incremental work is basically 1 line of config
Nice and efficient :)
I'll probably only be able to check Intel AVX2 and below and ARM NEON with the computers I have access to.
Sure, that's fine.
Do you think a github issue on sleef or an email to Prof. Shibata would be best?
I figure email is better if you have it.
On licensing, it would be nice if the ported file at least matches one of the existing Highway licenses, so that users of that file would just choose that one. Does that make sense?
Since it's been a week thought I'd post a quick update -- I'm doing a bit more work on expanding the translations, particularly starting to tackle double-precision functions and checking non-AVX2 instruction sets. A couple new hiccups needed to be addressed, but so far nothing that seems like a real blocker. When I'm satisfied my translations are correct and cover most of Sleef's functions then we can discuss more concretely how to upstream functionality (and which parts).
I've copied a few more licensing thoughts below, but probably nothing too useful to discuss unless right now unless you happen to have knowledge of Google's legal policies regarding the Boost license used by Sleef (namely what code counts as a "derivative work" of another author's code)
Thanks for sharing. On licensing, I understand the difficulty of asking all contributors. We have done this in the past, but it takes some time. Under these circumstances, I think a separate file with separate (Boost) license is reasonable.
Hi, I'm considering implementing Pow function pull request for highway and wanted to check scope/goals before jumping in too much further.
My rough plan would be:
I know a mechanical translation of SLEEF's functions has been discussed in this numpy issue, but I would be proposing a manual translation here, though sticking to the high-level outline of the SLEEF code.
Questions:
3^-Inf = 0
,0.4^Inf = 0
, and-2^3.5=NaN
. Should all those be handled as listed, or should some return garbage in order to eliminate a few checks?