rust-lang / libm

A port of MUSL's libm to Rust.

Non-musl `f32` functions? #298

Open jdh8 opened 3 months ago

jdh8 commented 3 months ago

Are you interested in non-musl implementations, especially f32 functions faster on modern architectures? I am writing math functions in Rust from scratch, based on my own research since 2017:
https://github.com/jdh8/metallic-rs

My functions are tested to be faithfully rounded (error < 1 ulp). Unlike most C libraries, I assume that f64 instructions are available even for f32 functions and also try to use FMA if possible. I also keep LUTs to a minimum with WebAssembly in mind.
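As a minimal illustration of that assumption (a sketch written for this thread, not metallic's actual code), an f32 function can route through f64 arithmetic and round only once at the end, which keeps simple compositions within 1 ulp:

```rust
// Sketch only: compute an f32 hypot by widening to f64, where the
// intermediate squares neither overflow nor lose the bits that matter
// for the final rounding back to f32.
fn hypot_via_f64(x: f32, y: f32) -> f32 {
    let (x, y) = (f64::from(x), f64::from(y));
    // One evaluation in f64, one rounding to f32 at the end.
    (x * x + y * y).sqrt() as f32
}

fn main() {
    println!("{}", hypot_via_f64(3.0, 4.0)); // prints 5
}
```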

Benchmarks show that my implementations are faster than musl (the libm crate) and glibc (std on Linux), except for libm::logf: https://github.com/jdh8/metallic-rs/actions/runs/10342894582/job/28626374065

tgross35 commented 3 months ago

@jdh8 do you have specific examples of which functions you are talking about? Supplying math symbols that are not exposed in std is more or less a non-goal of our libm.

jdh8 commented 3 months ago

I have published metallic, so its docs are publicly available.
https://docs.rs/metallic/latest/metallic/index.html

For now I have implemented cbrt, exp, exp2, exp10, exp_m1, frexp, hypot, ldexp, ln, ln_1p, log2, log10, round for f32.

Functions not yet in std

exp10

f32::exp10/metallic::f32::exp10
                        time:   [1.4450 ns 1.4468 ns 1.4491 ns]
f32::exp10/libm::exp10f time:   [8.7490 ns 8.7607 ns 8.7763 ns]

frexp

f32::frexp/metallic::f32::frexp
                        time:   [522.51 ps 523.83 ps 525.21 ps]
f32::frexp/libm::frexpf time:   [2.1766 ns 2.1782 ns 2.1802 ns]

ldexp

f32::ldexp/metallic::f32::ldexp
                        time:   [731.73 ps 732.56 ps 733.40 ps]
f32::ldexp/libm::ldexpf time:   [8.5147 ns 8.5303 ns 8.5475 ns]

Examples of functions already in std

cbrt

f32::cbrt/metallic::f32::cbrt
                        time:   [1.2175 ns 1.2204 ns 1.2240 ns]
f32::cbrt/libm::cbrtf   time:   [1.5684 ns 1.5699 ns 1.5717 ns]
f32::cbrt/f32::cbrt     time:   [15.171 ns 15.177 ns 15.184 ns]

exp

f32::exp/metallic::f32::exp
                        time:   [1.4475 ns 1.4538 ns 1.4635 ns]
f32::exp/libm::expf     time:   [9.4960 ns 9.5048 ns 9.5178 ns]
f32::exp/f32::exp       time:   [4.2191 ns 4.2225 ns 4.2269 ns]

hypot

f32::hypot/metallic::f32::hypot
                        time:   [770.30 ps 771.21 ps 772.27 ps]
f32::hypot/libm::hypotf time:   [3.4183 ns 3.4193 ns 3.4204 ns]
f32::hypot/f32::hypot   time:   [3.7229 ns 3.7271 ns 3.7328 ns]

log2

f32::log2/metallic::f32::log2
                        time:   [1.6325 ns 1.6335 ns 1.6347 ns]
f32::log2/libm::log2f   time:   [5.7100 ns 5.7170 ns 5.7267 ns]
f32::log2/f32::log2     time:   [3.2606 ns 3.2647 ns 3.2721 ns]
burrbull commented 3 months ago

Is your crate no_std compatible?

jdh8 commented 3 months ago

I've tried, but it isn't no_std compatible yet. I use functions from std to access FMA and rounding instructions.

f32::mul_add: https://github.com/jdh8/metallic-rs/blob/f8d05271b187acfe3c486be9e495c261c63739e6/src/f32/mod.rs#L64

f32::round_ties_even: https://github.com/jdh8/metallic-rs/blob/f8d05271b187acfe3c486be9e495c261c63739e6/src/f32/mod.rs#L204

f64::mul_add: https://github.com/jdh8/metallic-rs/blob/f8d05271b187acfe3c486be9e495c261c63739e6/src/f64/mod.rs#L16

f64::round_ties_even: https://github.com/jdh8/metallic-rs/blob/f8d05271b187acfe3c486be9e495c261c63739e6/src/f32/mod.rs#L182 https://github.com/jdh8/metallic-rs/blob/f8d05271b187acfe3c486be9e495c261c63739e6/src/f32/mod.rs#L239 https://github.com/jdh8/metallic-rs/blob/f8d05271b187acfe3c486be9e495c261c63739e6/src/f32/mod.rs#L275
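A tiny illustration of what those std calls do (assuming a recent toolchain; `round_ties_even` was stabilized in Rust 1.77). On targets with FMA hardware, `mul_add` compiles to a single fused instruction; without std, both methods need a libm fallback, which is what currently blocks no_std:

```rust
fn main() {
    // Fused multiply-add: computes a * b + c with a single rounding step.
    let fma = 2.0f64.mul_add(3.0, 1.0);
    // Round to nearest, ties to even: 2.5 rounds to 2, not 3.
    let r = 2.5f32.round_ties_even();
    println!("{fma} {r}"); // prints 7 2
}
```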

beetrees commented 3 months ago

Which platform are you doing the performance comparison on? Looking at the performance numbers, your math builtins don't appear to be coming from Rust libm (they are presumably coming from the system libc); it would be good to know which libc your system is using for comparison purposes.

You've stated that the accuracy of all the functions is error < 1 ULP; is this just an upper bound, or can this amount of error actually occur for each function? Using this paper, it's possible to compare against the accuracy of the MUSL versions of cbrtf, expf, hypotf, log2f and exp10f (note that Rust libm's expf and log2f implementations are out of date compared to MUSL upstream); notably, all of MUSL's functions except exp10f have a maximum error smaller than 1 ULP (with the caveat that it's impossible to exhaustively test the bivariate hypotf; see the paper for details), with cbrtf being correctly rounded (error <= 0.5 ULP) in all cases.

I ask because both the CORE-MATH project and LLVM libc provide correctly rounded (error <= 0.5 ULP) implementations of these functions with competitive performance (a performance comparison with glibc is displayed on the CORE-MATH project website). All else being equal, it would be good to move towards Rust libm having correctly rounded implementations where possible.

jdh8 commented 3 months ago

I'm using ubuntu-latest, the GitHub Actions runner image. https://github.com/jdh8/metallic-rs/blob/f8d05271b187acfe3c486be9e495c261c63739e6/.github/workflows/rust.yml#L23

I just tested if the errors are < 1 ulp. I have not yet computed the maximum error for each function. https://github.com/jdh8/metallic-rs/blob/f8d05271b187acfe3c486be9e495c261c63739e6/tests/f32/main.rs#L36
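The shape of such an exhaustive check, sketched here with a stand-in function rather than my actual test code: iterate over every finite f32 bit pattern, use the f64 counterpart as the reference, and assert the error stays below 1 ulp.

```rust
// Error of `actual` relative to the f64 reference `expected`, measured
// in units in the last place (ulps) of the f32 result.
fn ulp_error(actual: f32, expected: f64) -> f64 {
    let next = f32::from_bits(actual.to_bits() + 1);
    let ulp = f64::from(next) - f64::from(actual);
    ((f64::from(actual) - expected) / ulp).abs()
}

fn main() {
    // Toy implementation under test: sqrt evaluated in f64, rounded to f32.
    // Checked over a small range here; the real loop would cover all
    // 2^32 bit patterns (skipping NaNs and infinities).
    for bits in 0x3f80_0000u32..0x3f80_0100 {
        let x = f32::from_bits(bits);
        let actual = f64::from(x).sqrt() as f32;
        assert!(ulp_error(actual, f64::from(x).sqrt()) < 1.0);
    }
    println!("ok");
}
```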

Wow, correctly rounded transcendental functions exist! I hadn't dared dream of this because of the Table Maker's Dilemma.

beetrees commented 3 months ago

Re. the correctly rounded transcendental functions and the Table Maker's Dilemma: this paper goes into how the dilemma is dealt with. (For single-argument f32 functions in particular, it's possible to do an exhaustive computation using every possible input.)

jdh8 commented 3 months ago

True, exhaustive computation for univariate f32 functions is quite feasible; I've already tested them exhaustively against their f64 counterparts.

If correct rounding is a goal, I don't think I can make anything better than CORE-MATH. Maybe we should port CORE-MATH or LLVM libc instead? I've read several math functions from CORE-MATH; they already make the same assumptions I do, such as calling f64 instructions in f32 functions. (C has floating-point contraction, so it needs no special care for FMA.)

beetrees commented 3 months ago

I've also been thinking of porting from CORE-MATH: the implementations seem to be self contained (making it easier to port) and well tested against the GNU MPFR library. FYI per https://github.com/rust-lang/rust/issues/128386#issuecomment-2282249890 I believe @alexpyattaev is currently looking into porting a new implementation of f64 log.

burrbull commented 3 months ago

For the record, Rust has good wrappers for MPFR. See rug.

jdh8 commented 3 months ago

I've gained my confidence back! There is still room for optimization because we can ignore the floating-point environment. I'm upgrading my library to correct rounding.

JSorngard commented 2 months ago

I also have a project that's semi-related to this issue: I have developed an implementation of the Lambert W function: https://crates.io/crates/lambert_w. Would integrating this capability be interesting? It is no_std compatible and has 50 bits of accuracy according to the paper it is based on. It currently only supports f64, hence "semi-related".

alexpyattaev commented 2 months ago

Thanks, but as far as I know Lambert's W is not in the standard library, and this issue is about functions in std. Whether it belongs in std or not is a separate question entirely.