rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
97.91k stars 12.68k forks source link

`f64::log` precision change in 1.82 nightly #128386

Closed ogoffart closed 1 month ago

ogoffart commented 2 months ago

Our CI started to report an error today with nightly on Linux (other platforms are fine). Simplified the test case is because of a small difference in the computation of log.

(I know it's bad practice to do equality tests on float, but I thought I'd report this behavior change anyway as i don't know if this was intentional).

Code

I tried this code:

fn main() {
    println!("Hello, world! {}", 9f64.log(3.) );
    assert_eq!(9f64.log(3.) , 2.)
}

I expected to see this happen:

Hello, world! 2 and no panics. As this happens in stable

Instead, this happened:

Hello, world! 2.0000000000000004
thread 'main' panicked at src/main.rs:3:5:
assertion `left == right` failed
  left: 2.0000000000000004
 right: 2.0

Version with regression

rustc 1.82.0-nightly (612a33f20 2024-07-29) (Linux only)

apiraino commented 2 months ago

Bisection seems to indicate this comes after updating out compiler-builtins crate: 80d8270d8488957f62fbf0df7a19dfe596be92ac

I guess this is "expected"? cc: @nicholasbishop

tgross35 commented 2 months ago

This is probably a result of the linkage change in https://github.com/rust-lang/compiler-builtins/pull/594 since the actual implementation hasn't changed in 5 years https://github.com/rust-lang/libm/blob/279e5f6abe0a2ca9066962d9ec894f0df1f417ac/src/math/log.rs..

Could you dwarfdump the final binaries and take a look whether system or compiler_builtins symbols are being linked?

tgross35 commented 2 months ago

Obviously we want to have the correct values and will try to make things work, thanks bringing it up. Just note that the docs are extremely lenient here:

The precision of this function is non-deterministic. This means it varies by platform, Rust version, and can even differ within the same execution from one invocation to the next.

I think that if you are relying on exact results for CI then that may prove to be pretty platform-dependent.

adrian17 commented 2 months ago

FWIW, we just hit it in our CI too; looks like the runtime value no longer equals the compiler constant-folded result on our platforms. Not sure how we'll handle it yet.

(EDIT: forgot to add: we noticed this with exp instead of log, but I'm assuming it's caused by the same change)

saethlin commented 2 months ago

looks like the runtime value no longer equals the compiler constant-folded result on our platforms.

I believe that is https://github.com/rust-lang/rust/issues/124364

I'm deliberately not labeling this issue as I-unsound because I think this issue is just noticing a specific instance of the above-linked issue.

alexpyattaev commented 2 months ago

Arguably, this is a true regression, and not a simple "log is fuzzy anyway". Reason being that the resulting error is 2 * f64::EPSILON, and not one ulp as advertised in the implementation of the log function.

FYI dwarfdump is attached. As far as I could read it, it is using the compiler's version of log function. dwarfdump.txt.gz

Also interesting that in release build there is no issue.

tgross35 commented 2 months ago

Thanks for posting that, relevant bit of the above:

0x000e7fbb:         DW_TAG_subprogram
                      DW_AT_low_pc      (0x0000000000058ff0)
                      DW_AT_high_pc     (0x0000000000058ffd)
                      DW_AT_frame_base  (DW_OP_reg6 RBP)
                      DW_AT_name        ("log")
                      DW_AT_decl_file   ("/rust/deps/compiler_builtins-0.1.114/src/macros.rs")
                      DW_AT_decl_line   (468)
                      DW_AT_external    (true)

Just to make sure, could you do the same with an older nightly and check that the system version was getting linked before?

alexpyattaev commented 2 months ago

I do not have a previous nightly left, but 1.81.0-beta.2 that does not exhibit the bug produces the following

dwarfdump.txt.gz

The specific section for log function that you refer to is not there.

Amanieu commented 2 months ago

I looked into it and it seems that the log function in compiler_builtins is turned into a strong symbol and libm.so is therefore not linked to.

Since this isn't the intended behavior, we probably have to revert https://github.com/rust-lang/compiler-builtins/pull/594 and selectively only provide math symbols on targets that don't have a system libm.

alexpyattaev commented 2 months ago

Is system libm a good idea? It is rather unlikely that it would have support for recent CPUs in distros like debian... Not sure it would matter though (as log does not seem to be simd-friendly anyway).

Amanieu commented 2 months ago

An alternative would be to just fix our log implementation: it is different from the latest version in musl, so maybe updating it would fix this issue.

RalfJung commented 2 months ago

Arguably, this is a true regression, and not a simple "log is fuzzy anyway". Reason being that the resulting error is 2 * f64::EPSILON, and not one ulp as advertised in the implementation of the log function.

Where is this advertised? Certainly not in our documentation (to the contrary, the documentation explicitly says that the precision is not guaranteed). Comments in the implementation are not stable guarantees.

So no, there's no regression here in the formal sense, though getting higher precision could still be nice.

alexpyattaev commented 2 months ago

Well I think it is not about accuracy but about consistency. The calculation may not be accurate as such, but it should at least be consistent, i.e. you should not be getting different answers depending on the mood of the optimizer. And if there is an error range that is allowed, it should be mapped out and announced to the user. If no error bound is guaranteed, this function could just as well always return 42.

RalfJung commented 2 months ago

We explicitly document that the optimizer can affect results of these functions. You may not be happy about that, but there's no bug here. If you want to change the specification for these operations to require them to be deterministic, that's a feature request that should go through our usual process -- like exploring what it would actually take to achieve that (AFAIK this is very non-trivial and might even require LLVM changes), and writing up the rationale and the arguments against and in favor in a pre-RFC or a t-libs ACP or something like that. And yes there are arguments against this; achieving this will likely require disabling some optimizations and hence cost performance. I don't have the expertise to help explore the design space here; you may find folks that can help with this on https://internals.rust-lang.org/ or on Zulip.

If no error bound is guaranteed, this function could just as well always return 42.

Please stop using strawman arguments, that is not constructive.

Rust has many places where we don't make hard guarantees for various reasons, but we still aim to provide reasonable outcomes and would consider it a bug if we failed to do so. The line between "reasonable" and "unreasonable" is fuzzy and subjective. Obviously, a log function that ignores its input would not be reasonable and would be considered a bug.

Amanieu commented 2 months ago

To anyone interested in concretely fixing this, consider submitting a PR to port the new log implementation from musl to our implementation in the libm crate, which is used by compiler-builtins.

Also it would be good to check if any other math functions in libm are similarly outdated compared to the latest implementation in musl.

alexpyattaev commented 2 months ago

I can look into porting the musl code at some point, could be fun to mess with some bits. Are there any guidelines specific to that kind of code? I imagine using platform-specific SIMD is a big no-no there, right?

beetrees commented 2 months ago

If a new log implementation is going to be ported, it may as well be one that gives correctly rounded results in all cases. Looking at this paper, MUSL's implementation doesn't provide correctly rounded results in all cases, whereas it appears that LLVM libc has such an implementation for both f32 and f64; the CORE-MATH project also has a correctly rounded implementation of log (f32, f64). While Rust doesn't require a correctly rounded implementation of these functions, there's no reason not to use one when there's one available, all else being equal.

beetrees commented 2 months ago

Regarding platform-specific SIMD: AFAIK it should be fine as long as there is an equivalent non-SIMD implementation for other platforms and for when compiling with the force-soft-floats feature. Basic arithmetic operations (+-*/) are fine regardless.

beetrees commented 2 months ago

@rustbot label +A-floating-point

alexpyattaev commented 2 months ago

Ok I see. I'll try to dig into this properly at some point in the next week or two, will post here if I've given up.

jdh8 commented 2 months ago

Both f32::log and f64::log simply call their ln twice and divide. This will introduce another rounding error even if ln were correctly rounded. If we want to have a log whose precision comparable to other functions, we have to internally make a kernel function exceeding f64 precision and then use it across logarithmic functions.

See also my implementation of f32::log (only faithful rounding (error < 1 ulp) so far. I'm considering upgrading to correct rounding (error ≤ 0.5 ulp)) https://github.com/jdh8/metallic-rs/blob/62b9c3e670e487b6c1aea9947bff0b2ec29af564/src/f32/mod.rs#L456

tgross35 commented 1 month ago

There is ongoing work to improve our libm implementations, but we are not there yet. Since we plan to do better but it just isn't ready yet, and since we are close to the beta branch (<5 days), I think the best thing to do is just revert this for now and add it back once our implementations are more ready.

@Amjad50 could you post a revert of https://github.com/rust-lang/compiler-builtins/pull/594 and https://github.com/rust-lang/compiler-builtins/pull/577 as applicable, and open an issue (in builtins) to reapply it?

Amjad50 commented 1 month ago

Yah, I think I can just revert to the old expected behavior for windows/Mac/Linux platforms and only link include the math module for other targets.

I'll do it later today.

But I'm a bit confused as to why this happened, since those PRs are making all the symbols 'weak', not sure how they turned 'strong', but in that sense it should have collided with the system math library.

tgross35 commented 1 month ago

I don't think it was making the symbols weak that caused the problem, instead that we made them available at all. rust-lang/rust didn't get an updated compiler-builtins between when those PRs merged and about a month ago, IIRC.

They aren't getting turned strong, the system math symbols are just also weak (referencing /usr/lib/x86_64-linux-gnu/libm-2.39.a). I'm a bit fuzzy on the actual story is, but this is my understanding from digging before: the linker is allowed to choose any weak symbol when there are >1. Typically, it picks the first symbol it sees, which is usually the first library in its command invocation to provide the symbol. We pass builtins to the linker first, so unless the system symbol is strong we wind up using compiler-builtins by default.

The precision effects here are one concern, but another concern I have is that our routines may not be optimized as well as what is available from system libm. So I think it worth some effort to ensure that (which is in progress) before we effectively make our symbols be used by default everywhere, or otherwise change things so our symbols are truly a fallback rather than more or less the default.

Thanks for being willing to pick this up. I agree that we only need to change back on platforms that provide their own symbols.