rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
96.91k stars 12.52k forks source link

Disabling particular unsafe precondition checks under `-Cdebug-assertions=y` #120969

Open ojeda opened 7 months ago

ojeda commented 7 months ago

From https://github.com/rust-lang/rust/issues/85122#issuecomment-1938599162 and https://github.com/rust-lang/rust/issues/85122#issuecomment-1938688741.

After https://github.com/rust-lang/rust/pull/120594 (1.78), unsafe precondition checks always apply under -Cdebug-assertions=y, but there is currently no way to disable particular cases in debug builds:

They will make things too slow for somebody sooner or later -- it is not uncommon in projects to have a "fast debug" build where a few checks need to be explicitly bypassed in hot paths. And it is not just the branch/call itself, it can also break other optimizations like vectorization: https://godbolt.org/z/MMTordabn.

I really like the unsafe precondition checks, but I would like to make sure there is an escape hatch for -Cdebug-assertions=y builds. Perhaps we need a core::hint::unreachable_unchecked_even_in_debug-like (even if it does not solve the "there is no checked API" case, but I don't know if that is an issue). For me, core::hint::unreachable_unchecked was that in the past, and things like unwrap_unchecked() were the "checked in debug" ones.

I also just noticed that using the intrinsic for slice indexing is not enough to remove the check in a debug build and get vectorization back (included in the link).

Cc https://github.com/rust-lang/rust/issues/120848. Cc @saethlin.

ojeda commented 7 months ago

I started typing out a long reply, but I'll wait for a new issue. For now, please check https://github.com/rust-lang/rust/issues/120848, I've already thought of quite a few follow-up tasks on these checks and wouldn't mind adding to the list.

Thanks Ben -- I see you want things like get_unchecked to use intrinsics, which would solve (I guess) the slice indexing bit I mention above, right?

clarfonthey commented 7 months ago

I think that it's very fair to separate out unsafe preconditions from standard debug assertions, since these are operations where the developer is stating they've already met the preconditions and the extra checks are only there for, well, cases where they missed up.

Perhaps a separate unsafe_preconditions flag could be added and the assertions could be exposed to end users who wish to add them to their own crates? The default would be to enable them alongside debug assertions, but they could be enabled separately. Perhaps they could even be enabled with debug assertions off, since in some ways, the two checks are usually redundant.

Another important thing to consider here is that methods on top of the intrinsics using these checks are done intentionally so that less logic actually has to be included in the intrinsics themselves. From this perspective, it's actually less desirable to push people to use the intrinsics directly, since those are unstable whereas the other methods are intended to eventually become stable.

the8472 commented 7 months ago

Afaik debug-asserts works on a per-crate level. So one option is to move the hot paths into a separate crate and applying package overrides to it when building the binary.

Generally I wouldn't expect vectorization to work with debug asserts. Practically no effort is made to make this work. E.g. none of the codegen tests run when std debug asserts are enabled.

saethlin commented 7 months ago

Afaik debug-asserts works on a per-crate level. So one option is to move the hot paths into a separate crate and applying package overrides to it when building the binary.

@the8472 These checks specifically do not work that way. Please see my PR that implemented the system they use: https://github.com/rust-lang/rust/pull/120594

the8472 commented 7 months ago

Hrrm, that seems problematic because it renders the usual advice to do that kind of splitting ineffective.

saethlin commented 7 months ago

Thanks Ben -- I see you want things like get_unchecked to use intrinsics, which would solve (I guess) the slice indexing bit I mention above, right?

No. My wish/goal is that if a user of the standard library calls any function in a manner which is locally invalid (i.e. the kinds of things ubsan finds, not asan/msan/tsan), that erroneous call is detected at runtime, by the default cargo run/cargo test. I mentioned using an intrinsic so that we can have a single check, and a helpful error message from get_unchecked instead of catching the problem in a callee, which does prevent the UB but has a generic and less-helpful message.

and get vectorization back

Since you were getting vectorization before, are you compiling with -Copt-level=1 in your debug builds? If you are, I think your use case is better satisfied by one of these items in the issue I linked:

The actual checks are hidden behind #[inline(never)] to prevent monomorphizing them many times, but that attribute also means LLVM cannot see what is being checked and optimize on it. Try changing the monomorphic check function from #[inline(never)] to some new attribute that makes them inlinable by LLVM, but not by the MIR inliner. Perhaps we call this #[inline(only_post_mono)]?

Try to deduplicate checks in a MIR pass or codegen. It's possible that after GVN we end up with MIR where a call dominates another call with the same argument(s).

Or we could also toggle some of the checks off at -Copt-level=1.

ojeda commented 7 months ago

My wish/goal is that if a user of the standard library calls any function in a manner which is locally invalid (...) that erroneous call is detected at runtime, (...) I mentioned using an intrinsic so that we can have a single check

I was not talking about avoiding the check completely. What I meant is that, from the quick test I did (in the CE link) with get + intrinsics, at least one check is not getting removed, which is what I guessed broke vectorization. I was hoping it was related to the duplicated checks somehow, and therefore that by removing them (i.e. simplifying) the compiler would be able to vectorize again. But from what are saying, the checks are done in a way that they are not currently inlinable, which sounds like a more likely reason. :)

Since you were getting vectorization before, are you compiling with -Copt-level=1 in your debug builds? If you are, I think your use case is better satisfied by one of these items in the issue I linked:

Please see the CE link I mentioned in OP -- it uses -O. It was just an example of projects with "fast debug" builds (or "release with asserts"), which may be compiled with -Copt-level=2 or even -Copt-level=3, and where a decrease of performance may make the debug build unusable for certain use cases because there may be some real-time constraints. In those projects, you can typically keep the vast majority of checks enabled, but in the hot parts of the code you may need to avoid a few of them.

That is why keeping the ability to selectively remove the checks in certain cases for -Cdebug-assertions=y builds is important (i.e. you don't want to remove all the checks either), and thus the core::hint::unreachable_unchecked_even_in_debug-like ideas.

ojeda commented 7 months ago

The actual checks are hidden behind #[inline(never)] to prevent monomorphizing them many times, but that attribute also means LLVM cannot see what is being checked and optimize on it. Try changing the monomorphic check function from #[inline(never)] to some new attribute that makes them inlinable by LLVM, but not by the MIR inliner. Perhaps we call this #[inline(only_post_mono)]?

Try to deduplicate checks in a MIR pass or codegen. It's possible that after GVN we end up with MIR where a call dominates another call with the same argument(s).

From what I can tell, you would still do the checks even with those items done, right? i.e. you are just removing the duplicated checks and making them inlinable etc., but the check would still need to be emitted in cases it cannot be proven to not happen. So since the checks would still be there in those cases, it is likely not enough, unless combined with a core::hint::unreachable_unchecked_even_in_debug or similar.

Or we could also toggle some of the checks off at -Copt-level=1.

Ideally, the users would pick what they need to disable, ideally per-call (i.e. not even per check, but per instance of the check, e.g. I may want 99% of the get_unchecked calls to be checked, but not particular ones in the hot parts). That is why I mentioned the core::hint::unreachable_unchecked_even_in_debug (which would be like the intrinsic) or similar.

saethlin commented 7 months ago

From what I can tell, you would still do the checks even with those items done, right?

Currently, all of the checks regardless of complexity are hidden behind a #[inline(never)] function call. I'm going to move the slice bounds check condition out of the function very soon because my benchmarking indicates that this doesn't have a detrimental effect on compile time.

Of course that doesn't get vectorization in your example code, but it should help on larger programs.

Ideally, the users would pick what they need to disable, ideally per-call

I don't currently know how to implement this. I am very wary of increasing the library maintenance burden or adding language features in the form of in-source annotations to support this. Maybe someone else who knows the library or compiler better sees how this could be done.

the8472 commented 7 months ago

I think in-source annotations would be of limited use since those checks can be buried behind several layers of abstraction in a library. Iterators are an obvious example. Unless they applied to the call-tree, but we don't have many of those.

RalfJung commented 7 months ago

I think Miri is running into this. We are running CI with debug assertions + optimizations (since I want to catch e.g. integer overflows). When https://github.com/rust-lang/rust/pull/120594 landed, our overall CI times increased by 50%.

saethlin commented 7 months ago

Some quick profiling indicates that ./miri test currently spends ~31% of the time in the interpreter executing the slice::from_raw_parts precondition check (and no time in others). I suspect most of that is from Vec's Deref impl.

RalfJung commented 7 months ago

That seems to align pretty well with the 50% slowdown -- the old execution time is around 66% of the current execution time, so if you can shave off 31% that brings it back very close to where we started.

RalfJung commented 7 months ago

Given the heavy perf impact, I am going to mark this as a regression.

Noratrieb commented 7 months ago

I think it's reasonable to make less of the standard library (like vec::deref) use the checked from_raw_parts

RalfJung commented 7 months ago

I think it would also make sense to split this flag from the regular debug_assertions, similar to how overflow checks are already controlled by a separate flag.

Noratrieb commented 7 months ago

I guess most of the current negative perf impact should go away once the check is able to be inlined. But using the extra-ub-checks flag or however we wanted to expose it for this makes sense.

RalfJung commented 7 months ago

I was more thinking of cfg(ub_checks), defaults to the same value as cfg(debug_assertions) but can be set separately if desired.

saethlin commented 7 months ago

This issue is about being able to disable a specific instantiation of a specific check, so I do not think globally-applied flags or globally-set cfgs are relevant.

RalfJung commented 7 months ago

I was interpreting this as a more general issue for how to deal with the perf issues around these precondition checks. Giving the author of the check more control is just one way to achieve that. But we can make that a separate issue :shrug:

Opened https://github.com/rust-lang/rust/issues/121245.

RalfJung commented 7 months ago

I've sometimes wanted a way to enable/disable arithmetic overflow checks in a particular region of the code. This issue (the original description) sounds similar. But such a mechanism seems pretty tricky to design so I don't think we are anywhere close to having something like that.

asquared31415 commented 7 months ago

data point: C# has checked and unchecked blocks (that look syntactically like Rust's unsafe blocks): https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/statements/checked-and-unchecked

These control the behavior of integer overflow at a block-scoped level. I don't think that such a design is particularly useful for configuring extra UB checks, but it may be something to think about.

saethlin commented 7 months ago

The obstacle here is not coming up with speculative designs for such a thing, but implementing any of them. I'm sure there will be an incredible amount of bikeshedding, but maybe we can at least postone that.

The nearest design to this that I can think of is #[rustc_inherit_overflow_checks], but crucially that is handled in MIR building, where "which function did this Rvalue come from" is obvious. But Rvalue::DebugAssertions is lowered after MIR inlining and other such optimizations have happened.

saethlin commented 3 weeks ago

In https://github.com/rust-lang/rust/pull/129498 I'm experimenting with ways to get the precondition checks for ptr::{read,write} enabled. One of the things I'm trying is inventing an attribute currently called #[rustc_no_ubchecks] which makes the MIR optimization pipeline delete checks that got inlined into the body of the indicated function. I don't think this idea is particularly good or possible to stabilize, but it is easy to implement. If it lands, it would be a thing for RFL to play around with, just in case you happen to find a use for it.

This could probably be easily extended to (again) a crude general mechanism by putting the attribute on a closure to disable checks in a few lines of code instead of "a whole function". If the inliner cooperates.