ojeda opened 7 months ago
I started typing out a long reply, but I'll wait for a new issue. For now, please check https://github.com/rust-lang/rust/issues/120848 -- I've already thought of quite a few follow-up tasks on these checks and wouldn't mind adding to the list.
Thanks Ben -- I see you want things like `get_unchecked` to use intrinsics, which would solve (I guess) the slice indexing bit I mention above, right?
I think that it's very fair to separate out unsafe preconditions from standard debug assertions, since these are operations where the developer is stating they've already met the preconditions and the extra checks are only there for, well, cases where they messed up.
Perhaps a separate `unsafe_preconditions` flag could be added and the assertions could be exposed to end users who wish to add them to their own crates? The default would be to enable them alongside debug assertions, but they could be enabled separately. Perhaps they could even be enabled with debug assertions off, since in some ways the two kinds of checks are usually redundant.
Another important thing to consider here is that methods on top of the intrinsics using these checks are done intentionally so that less logic actually has to be included in the intrinsics themselves. From this perspective, it's actually less desirable to push people to use the intrinsics directly, since those are unstable whereas the other methods are intended to eventually become stable.
Afaik `debug-asserts` works on a per-crate level. So one option is to move the hot paths into a separate crate and apply package overrides to it when building the binary.
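A minimal sketch of the package-override mechanism being referred to; the crate name `hot-paths` is made up for illustration:

```toml
# Hypothetical workspace member `hot-paths` holding the hot code.
# The dev profile keeps debug assertions on for everything else, but
# overrides them off (and raises opt-level) for just that crate.
[profile.dev.package.hot-paths]
debug-assertions = false
opt-level = 3
```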
Generally I wouldn't expect vectorization to work with debug asserts. Practically no effort is made to make this work. E.g. none of the codegen tests run when std debug asserts are enabled.
> Afaik `debug-asserts` works on a per-crate level. So one option is to move the hot paths into a separate crate and applying package overrides to it when building the binary.
@the8472 These checks specifically do not work that way. Please see my PR that implemented the system they use: https://github.com/rust-lang/rust/pull/120594
Hrrm, that seems problematic because it renders the usual advice to do that kind of splitting ineffective.
> Thanks Ben -- I see you want things like `get_unchecked` to use intrinsics, which would solve (I guess) the slice indexing bit I mention above, right?
No. My wish/goal is that if a user of the standard library calls any function in a manner which is locally invalid (i.e. the kinds of things ubsan finds, not asan/msan/tsan), that erroneous call is detected at runtime, by the default `cargo run`/`cargo test`. I mentioned using an intrinsic so that we can have a single check, and a helpful error message from `get_unchecked` instead of catching the problem in a callee, which does prevent the UB but has a generic and less-helpful message.
> and get vectorization back
Since you were getting vectorization before, are you compiling with `-Copt-level=1` in your debug builds? If you are, I think your use case is better satisfied by one of these items in the issue I linked:
> - The actual checks are hidden behind `#[inline(never)]` to prevent monomorphizing them many times, but that attribute also means LLVM cannot see what is being checked and optimize on it. Try changing the monomorphic check function from `#[inline(never)]` to some new attribute that makes them inlinable by LLVM, but not by the MIR inliner. Perhaps we call this `#[inline(only_post_mono)]`?
> - Try to deduplicate checks in a MIR pass or codegen. It's possible that after GVN we end up with MIR where a call dominates another call with the same argument(s).
Or we could also toggle some of the checks off at `-Copt-level=1`.
> My wish/goal is that if a user of the standard library calls any function in a manner which is locally invalid (...) that erroneous call is detected at runtime, (...) I mentioned using an intrinsic so that we can have a single check
I was not talking about avoiding the check completely. What I meant is that, from the quick test I did (in the CE link) with `get` + intrinsics, at least one check is not getting removed, which is what I guessed broke vectorization. I was hoping it was related to the duplicated checks somehow, and therefore that by removing them (i.e. simplifying) the compiler would be able to vectorize again. But from what you are saying, the checks are done in a way that they are not currently inlinable, which sounds like a more likely reason. :)
> Since you were getting vectorization before, are you compiling with `-Copt-level=1` in your debug builds? If you are, I think your use case is better satisfied by one of these items in the issue I linked:
Please see the CE link I mentioned in OP -- it uses `-O`. It was just an example of projects with "fast debug" builds (or "release with asserts"), which may be compiled with `-Copt-level=2` or even `-Copt-level=3`, and where a decrease in performance may make the debug build unusable for certain use cases because there may be real-time constraints. In those projects, you can typically keep the vast majority of checks enabled, but in the hot parts of the code you may need to avoid a few of them.
That is why keeping the ability to selectively remove the checks in certain cases for `-Cdebug-assertions=y` builds is important (i.e. you don't want to remove all the checks either), and thus the `core::hint::unreachable_unchecked_even_in_debug`-like ideas.
> - The actual checks are hidden behind `#[inline(never)]` to prevent monomorphizing them many times, but that attribute also means LLVM cannot see what is being checked and optimize on it. Try changing the monomorphic check function from `#[inline(never)]` to some new attribute that makes them inlinable by LLVM, but not by the MIR inliner. Perhaps we call this `#[inline(only_post_mono)]`?
> - Try to deduplicate checks in a MIR pass or codegen. It's possible that after GVN we end up with MIR where a call dominates another call with the same argument(s).
From what I can tell, you would still do the checks even with those items done, right? i.e. you are just removing the duplicated checks and making them inlinable etc., but the check would still need to be emitted in cases where it cannot be proven not to happen. So since the checks would still be there in those cases, it is likely not enough, unless combined with a `core::hint::unreachable_unchecked_even_in_debug` or similar.
> Or we could also toggle some of the checks off at `-Copt-level=1`.
Ideally, users would pick what they need to disable, per call (i.e. not even per check, but per instance of the check; e.g. I may want 99% of the `get_unchecked` calls to be checked, but not particular ones in the hot parts). That is why I mentioned the `core::hint::unreachable_unchecked_even_in_debug` (which would be like the intrinsic) or similar.
> From what I can tell, you would still do the checks even with those items done, right?
Currently, all of the checks, regardless of complexity, are hidden behind a `#[inline(never)]` function call. I'm going to move the slice bounds check condition out of the function very soon because my benchmarking indicates that this doesn't have a detrimental effect on compile time.
Of course that doesn't get vectorization in your example code, but it should help on larger programs.
> Ideally, the users would pick what they need to disable, ideally per-call
I don't currently know how to implement this. I am very wary of increasing the library maintenance burden or adding language features in the form of in-source annotations to support this. Maybe someone else who knows the library or compiler better sees how this could be done.
I think in-source annotations would be of limited use since those checks can be buried behind several layers of abstraction in a library. Iterators are an obvious example. Unless they applied to the call-tree, but we don't have many of those.
I think Miri is running into this. We are running CI with debug assertions + optimizations (since I want to catch e.g. integer overflows). When https://github.com/rust-lang/rust/pull/120594 landed, our overall CI times increased by 50%.
Some quick profiling indicates that `./miri test` currently spends ~31% of the time in the interpreter executing the `slice::from_raw_parts` precondition check (and no time in others). I suspect most of that is from `Vec`'s `Deref` impl.
That seems to align pretty well with the 50% slowdown -- the old execution time is around 66% of the current execution time, so if you can shave off 31% that brings it back very close to where we started.
Given the heavy perf impact, I am going to mark this as a regression.
I think it's reasonable to make less of the standard library (like `vec::deref`) use the checked `from_raw_parts`.
I think it would also make sense to split this flag from the regular `debug_assertions`, similar to how overflow checks are already controlled by a separate flag.
I guess most of the current negative perf impact should go away once the check is able to be inlined. But using the extra-ub-checks flag or however we wanted to expose it for this makes sense.
I was more thinking of `cfg(ub_checks)`, which defaults to the same value as `cfg(debug_assertions)` but can be set separately if desired.
This issue is about being able to disable a specific instantiation of a specific check, so I do not think globally-applied flags or globally-set cfgs are relevant.
I was interpreting this as a more general issue for how to deal with the perf issues around these precondition checks. Giving the author of the check more control is just one way to achieve that. But we can make that a separate issue :shrug:
I've sometimes wanted a way to enable/disable arithmetic overflow checks in a particular region of the code. This issue (the original description) sounds similar. But such a mechanism seems pretty tricky to design so I don't think we are anywhere close to having something like that.
data point: C# has `checked` and `unchecked` blocks (that look syntactically like Rust's `unsafe` blocks): https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/statements/checked-and-unchecked
These control the behavior of integer overflow at a block-scoped level. I don't think that such a design is particularly useful for configuring extra UB checks, but it may be something to think about.
The obstacle here is not coming up with speculative designs for such a thing, but implementing any of them. I'm sure there will be an incredible amount of bikeshedding, but maybe we can at least postpone that.
The nearest design to this that I can think of is `#[rustc_inherit_overflow_checks]`, but crucially that is handled in MIR building, where "which function did this Rvalue come from" is obvious. But `Rvalue::DebugAssertions` is lowered after MIR inlining and other such optimizations have happened.
In https://github.com/rust-lang/rust/pull/129498 I'm experimenting with ways to get the precondition checks for `ptr::{read,write}` enabled. One of the things I'm trying is inventing an attribute currently called `#[rustc_no_ubchecks]` which makes the MIR optimization pipeline delete checks that got inlined into the body of the indicated function. I don't think this idea is particularly good or possible to stabilize, but it is easy to implement. If it lands, it would be a thing for RFL to play around with, just in case you happen to find a use for it.
This could probably be easily extended to (again) a crude general mechanism by putting the attribute on a closure to disable checks in a few lines of code instead of "a whole function". If the inliner cooperates.
From https://github.com/rust-lang/rust/issues/85122#issuecomment-1938599162 and https://github.com/rust-lang/rust/issues/85122#issuecomment-1938688741.
After https://github.com/rust-lang/rust/pull/120594 (1.78), unsafe precondition checks always apply under `-Cdebug-assertions=y`, but there is currently no way to disable particular cases in debug builds. Cc https://github.com/rust-lang/rust/issues/120848. Cc @saethlin.