jdh8 opened this issue 3 months ago
@jdh8 do you have any examples of which specific functions you are talking about? Supplying math symbols that are not exposed in `std` is more or less a non-goal of our libm.
I have published metallic, so its docs are publicly available: https://docs.rs/metallic/latest/metallic/index.html

For now I have implemented `cbrt`, `exp`, `exp2`, `exp10`, `exp_m1`, `frexp`, `hypot`, `ldexp`, `ln`, `ln_1p`, `log2`, `log10`, and `round` for `f32`.
Not in `std`:

```
f32::exp10/metallic::f32::exp10  time: [1.4450 ns 1.4468 ns 1.4491 ns]
f32::exp10/libm::exp10f          time: [8.7490 ns 8.7607 ns 8.7763 ns]

f32::frexp/metallic::f32::frexp  time: [522.51 ps 523.83 ps 525.21 ps]
f32::frexp/libm::frexpf          time: [2.1766 ns 2.1782 ns 2.1802 ns]

f32::ldexp/metallic::f32::ldexp  time: [731.73 ps 732.56 ps 733.40 ps]
f32::ldexp/libm::ldexpf          time: [8.5147 ns 8.5303 ns 8.5475 ns]
```
In `std`:

```
f32::cbrt/metallic::f32::cbrt    time: [1.2175 ns 1.2204 ns 1.2240 ns]
f32::cbrt/libm::cbrtf            time: [1.5684 ns 1.5699 ns 1.5717 ns]
f32::cbrt/f32::cbrt              time: [15.171 ns 15.177 ns 15.184 ns]

f32::exp/metallic::f32::exp      time: [1.4475 ns 1.4538 ns 1.4635 ns]
f32::exp/libm::expf              time: [9.4960 ns 9.5048 ns 9.5178 ns]
f32::exp/f32::exp                time: [4.2191 ns 4.2225 ns 4.2269 ns]

f32::hypot/metallic::f32::hypot  time: [770.30 ps 771.21 ps 772.27 ps]
f32::hypot/libm::hypotf          time: [3.4183 ns 3.4193 ns 3.4204 ns]
f32::hypot/f32::hypot            time: [3.7229 ns 3.7271 ns 3.7328 ns]

f32::log2/metallic::f32::log2    time: [1.6325 ns 1.6335 ns 1.6347 ns]
f32::log2/libm::log2f            time: [5.7100 ns 5.7170 ns 5.7267 ns]
f32::log2/f32::log2              time: [3.2606 ns 3.2647 ns 3.2721 ns]
```
Is your crate `no_std` compatible?
I tried, but it is not yet. I use functions from `std` to access FMA and rounding instructions:

- `f32::mul_add`: https://github.com/jdh8/metallic-rs/blob/f8d05271b187acfe3c486be9e495c261c63739e6/src/f32/mod.rs#L64
- `f32::round_ties_even`: https://github.com/jdh8/metallic-rs/blob/f8d05271b187acfe3c486be9e495c261c63739e6/src/f32/mod.rs#L204
- `f64::mul_add`: https://github.com/jdh8/metallic-rs/blob/f8d05271b187acfe3c486be9e495c261c63739e6/src/f64/mod.rs#L16
- `f64::round_ties_even`:
  - https://github.com/jdh8/metallic-rs/blob/f8d05271b187acfe3c486be9e495c261c63739e6/src/f32/mod.rs#L182
  - https://github.com/jdh8/metallic-rs/blob/f8d05271b187acfe3c486be9e495c261c63739e6/src/f32/mod.rs#L239
  - https://github.com/jdh8/metallic-rs/blob/f8d05271b187acfe3c486be9e495c261c63739e6/src/f32/mod.rs#L275
Which platform are you running the performance comparison on? Looking at the performance numbers, your math builtins don't appear to be coming from Rust libm (and are presumably coming from the system libc); it would be good to know which libc your system is using for comparison purposes.
You've stated that the accuracy of all the functions is error < 1 ULP; is this just an upper bound, or can this amount of error actually occur for each function? Using this paper it's possible to compare against the accuracy of the MUSL versions of `cbrtf`, `expf`, `hypotf`, `log2f`, and `exp10f` (note that Rust libm's `expf` and `log2f` implementations are out of date compared to MUSL upstream); notably, all of MUSL's functions except `exp10f` have a maximum error smaller than 1 ULP (with the caveat that it's impossible to exhaustively test the bivariate `hypotf`; see the paper for details), with `cbrtf` being correctly rounded (error <= 0.5 ULP) in all cases.
I ask because the CORE-MATH project and LLVM libc both provide correctly rounded (error <= 0.5 ULP) implementations of these functions with competitive performance (a performance comparison with glibc is displayed on the CORE-MATH project website). All else being equal, it would be good to move towards Rust libm having correctly rounded implementations where possible.
I'm using `ubuntu-latest`, the GitHub Actions runner image: https://github.com/jdh8/metallic-rs/blob/f8d05271b187acfe3c486be9e495c261c63739e6/.github/workflows/rust.yml#L23
I only tested that the errors are < 1 ULP; I have not yet computed the maximum error for each function. https://github.com/jdh8/metallic-rs/blob/f8d05271b187acfe3c486be9e495c261c63739e6/tests/f32/main.rs#L36
Wow, correctly rounded transcendental functions exist! I hadn't dared dream of this because of the Table Maker's Dilemma.
Re. correctly rounded transcendental functions and the Table Maker's Dilemma: this paper goes into how the dilemma is dealt with. (For single-argument `f32` functions in particular, it's possible to do an exhaustive computation over every possible input.)
True, exhaustive computation for univariate `f32` functions is pretty feasible; I've already tested them exhaustively against their `f64` counterparts.
If correct rounding is a goal, I don't think I can make anything better than CORE-MATH. Maybe we should port CORE-MATH or LLVM libc instead? I've read several math functions from CORE-MATH. These functions already use my assumptions, such as calling `f64` instructions in `f32` functions. (C has floating-point contraction, so it does not need special care for FMA.)
I've also been thinking of porting from CORE-MATH: the implementations seem to be self-contained (making them easier to port) and well tested against the GNU MPFR library. FYI, per https://github.com/rust-lang/rust/issues/128386#issuecomment-2282249890, I believe @alexpyattaev is currently looking into porting a new implementation of `f64` `log`.
I've got my confidence back! There is still room for optimization because we can ignore the floating-point environment. I'm upgrading my library to correct rounding.
I also have a project that's semi-related to this issue: I have developed an implementation of the Lambert W function: https://crates.io/crates/lambert_w. Would integrating this capability be interesting? It is `no_std` and has 50 bits of accuracy according to the paper it is based on. It currently only supports `f64`, hence "semi-related".
Thanks, but as far as I know the Lambert W function is not in the standard library, and this issue is about functions in `std`. Whether it belongs in `std` or not is a separate question entirely.
Are you interested in non-musl implementations, especially `f32` functions that are faster on modern architectures? I am writing math functions in Rust from scratch, based on my own research since 2017: https://github.com/jdh8/metallic-rs

My functions are tested to be faithfully rounded (error < 1 ULP). Unlike most C libraries, I assume that `f64` instructions are available even for `f32` functions, and I also try to use FMA where possible. I also keep LUTs to a minimum with WebAssembly in mind. Benchmarks show that my implementations are faster than musl (libm) and glibc (`std` on Linux) except for `libm::logf`:

https://github.com/jdh8/metallic-rs/actions/runs/10342894582/job/28626374065