ojeda opened 7 months ago
I started typing out a long reply, but I'll wait for a new issue. For now, please check https://github.com/rust-lang/rust/issues/120848 -- I've already thought of quite a few follow-up tasks on these checks and wouldn't mind adding to the list.
Thanks Ben -- I see you want things like `get_unchecked` to use intrinsics, which would solve (I guess) the slice indexing bit I mention above, right?
I think that it's very fair to separate out unsafe preconditions from standard debug assertions, since these are operations where the developer is stating they've already met the preconditions and the extra checks are only there for, well, cases where they messed up.
Perhaps a separate `unsafe_preconditions` flag could be added and the assertions could be exposed to end users who wish to add them to their own crates? The default would be to enable them alongside debug assertions, but they could be enabled separately. Perhaps they could even be enabled with debug assertions off, since in some ways the two kinds of checks are usually redundant.
Another important thing to consider here is that methods on top of the intrinsics using these checks are done intentionally so that less logic actually has to be included in the intrinsics themselves. From this perspective, it's actually less desirable to push people to use the intrinsics directly, since those are unstable whereas the other methods are intended to eventually become stable.
Afaik `debug-asserts` works on a per-crate level. So one option is to move the hot paths into a separate crate and apply package overrides to it when building the binary.
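A minimal sketch of the package-override mechanism being referred to; the crate name `hot-paths` is made up for illustration:

```toml
# Hypothetical workspace member `hot-paths` holding the hot code.
# The dev profile keeps debug assertions on for everything else, but
# overrides them off (and raises opt-level) for just that crate.
[profile.dev.package.hot-paths]
debug-assertions = false
opt-level = 3
```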
Generally I wouldn't expect vectorization to work with debug asserts. Practically no effort is made to make this work. E.g. none of the codegen tests run when std debug asserts are enabled.
> Afaik `debug-asserts` works on a per-crate level. So one option is to move the hot paths into a separate crate and applying package overrides to it when building the binary.
@the8472 These checks specifically do not work that way. Please see my PR that implemented the system they use: https://github.com/rust-lang/rust/pull/120594
Hrrm, that seems problematic because it renders the usual advice to do that kind of splitting ineffective.
> Thanks Ben -- I see you want things like `get_unchecked` to use intrinsics, which would solve (I guess) the slice indexing bit I mention above, right?
No. My wish/goal is that if a user of the standard library calls any function in a manner which is locally invalid (i.e. the kinds of things ubsan finds, not asan/msan/tsan), that erroneous call is detected at runtime, by the default `cargo run`/`cargo test`. I mentioned using an intrinsic so that we can have a single check, and a helpful error message from `get_unchecked` instead of catching the problem in a callee, which does prevent the UB but has a generic and less-helpful message.
> and get vectorization back
Since you were getting vectorization before, are you compiling with `-Copt-level=1` in your debug builds? If you are, I think your use case is better satisfied by one of these items in the issue I linked:
> - The actual checks are hidden behind `#[inline(never)]` to prevent monomorphizing them many times, but that attribute also means LLVM cannot see what is being checked and optimize on it. Try changing the monomorphic check function from `#[inline(never)]` to some new attribute that makes them inlinable by LLVM, but not by the MIR inliner. Perhaps we call this `#[inline(only_post_mono)]`?
> - Try to deduplicate checks in a MIR pass or codegen. It's possible that after GVN we end up with MIR where a call dominates another call with the same argument(s).
Or we could also toggle some of the checks off at `-Copt-level=1`.
> My wish/goal is that if a user of the standard library calls any function in a manner which is locally invalid (...) that erroneous call is detected at runtime, (...) I mentioned using an intrinsic so that we can have a single check
I was not talking about avoiding the check completely. What I meant is that, from the quick test I did (in the CE link) with `get` + intrinsics, at least one check is not getting removed, which is what I guessed broke vectorization. I was hoping it was related to the duplicated checks somehow, and therefore that by removing them (i.e. simplifying) the compiler would be able to vectorize again. But from what you are saying, the checks are done in a way that they are not currently inlinable, which sounds like a more likely reason. :)
> Since you were getting vectorization before, are you compiling with `-Copt-level=1` in your debug builds? If you are, I think your use case is better satisfied by one of these items in the issue I linked:
Please see the CE link I mentioned in OP -- it uses `-O`. It was just an example of projects with "fast debug" builds (or "release with asserts"), which may be compiled with `-Copt-level=2` or even `-Copt-level=3`, and where a decrease in performance may make the debug build unusable for certain use cases because there may be real-time constraints. In those projects, you can typically keep the vast majority of checks enabled, but in the hot parts of the code you may need to avoid a few of them.
That is why keeping the ability to selectively remove the checks in certain cases for `-Cdebug-assertions=y` builds is important (i.e. you don't want to remove all the checks either), and thus the `core::hint::unreachable_unchecked_even_in_debug`-like ideas.
> - The actual checks are hidden behind `#[inline(never)]` to prevent monomorphizing them many times, but that attribute also means LLVM cannot see what is being checked and optimize on it. Try changing the monomorphic check function from `#[inline(never)]` to some new attribute that makes them inlinable by LLVM, but not by the MIR inliner. Perhaps we call this `#[inline(only_post_mono)]`?
> - Try to deduplicate checks in a MIR pass or codegen. It's possible that after GVN we end up with MIR where a call dominates another call with the same argument(s).
From what I can tell, you would still do the checks even with those items done, right? i.e. you are just removing the duplicated checks and making them inlinable etc., but the check would still need to be emitted in cases where it cannot be proven not to happen. So since the checks would still be there in those cases, it is likely not enough, unless combined with a `core::hint::unreachable_unchecked_even_in_debug` or similar.
> Or we could also toggle some of the checks off at `-Copt-level=1`.
Ideally, users would pick what they need to disable, per call (i.e. not even per check, but per instance of the check; e.g. I may want 99% of the `get_unchecked` calls to be checked, but not particular ones in the hot parts). That is why I mentioned the `core::hint::unreachable_unchecked_even_in_debug` (which would be like the intrinsic) or similar.
> From what I can tell, you would still do the checks even with those items done, right?
Currently, all of the checks, regardless of complexity, are hidden behind a `#[inline(never)]` function call. I'm going to move the slice bounds check condition out of the function very soon because my benchmarking indicates that this doesn't have a detrimental effect on compile time.
Of course that doesn't get vectorization in your example code, but it should help on larger programs.
> Ideally, the users would pick what they need to disable, ideally per-call
I don't currently know how to implement this. I am very wary of increasing the library maintenance burden or adding language features in the form of in-source annotations to support this. Maybe someone else who knows the library or compiler better sees how this could be done.
I think in-source annotations would be of limited use since those checks can be buried behind several layers of abstraction in a library. Iterators are an obvious example. Unless they applied to the call-tree, but we don't have many of those.
I think Miri is running into this. We are running CI with debug assertions + optimizations (since I want to catch e.g. integer overflows). When https://github.com/rust-lang/rust/pull/120594 landed, our overall CI times increased by 50%.
Some quick profiling indicates that `./miri test` currently spends ~31% of the time in the interpreter executing the `slice::from_raw_parts` precondition check (and no time in others). I suspect most of that is from `Vec`'s `Deref` impl.
That seems to align pretty well with the 50% slowdown -- the old execution time is around 66% of the current execution time, so if you can shave off 31% that brings it back very close to where we started.
Given the heavy perf impact, I am going to mark this as a regression.
I think it's reasonable to make less of the standard library (like `vec::deref`) use the checked `from_raw_parts`.
I think it would also make sense to split this flag from the regular `debug_assertions`, similar to how overflow checks are already controlled by a separate flag.
I guess most of the current negative perf impact should go away once the check is able to be inlined. But using the extra-ub-checks flag or however we wanted to expose it for this makes sense.
I was more thinking of `cfg(ub_checks)`, which defaults to the same value as `cfg(debug_assertions)` but can be set separately if desired.
This issue is about being able to disable a specific instantiation of a specific check, so I do not think globally-applied flags or globally-set cfgs are relevant.
I was interpreting this as a more general issue for how to deal with the perf issues around these precondition checks. Giving the author of the check more control is just one way to achieve that. But we can make that a separate issue :shrug:
I've sometimes wanted a way to enable/disable arithmetic overflow checks in a particular region of the code. This issue (the original description) sounds similar. But such a mechanism seems pretty tricky to design so I don't think we are anywhere close to having something like that.
data point: C# has `checked` and `unchecked` blocks (that look syntactically like Rust's `unsafe` blocks): https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/statements/checked-and-unchecked
These control the behavior of integer overflow at a block-scoped level. I don't think that such a design is particularly useful for configuring extra UB checks, but it may be something to think about.
The obstacle here is not coming up with speculative designs for such a thing, but implementing any of them. I'm sure there will be an incredible amount of bikeshedding, but maybe we can at least postpone that.
The nearest design to this that I can think of is `#[rustc_inherit_overflow_checks]`, but crucially that is handled in MIR building, where "which function did this Rvalue come from" is obvious. But `Rvalue::DebugAssertions` is lowered after MIR inlining and other such optimizations have happened.
In https://github.com/rust-lang/rust/pull/129498 I'm experimenting with ways to get the precondition checks for `ptr::{read,write}` enabled. One of the things I'm trying is inventing an attribute currently called `#[rustc_no_ubchecks]` which makes the MIR optimization pipeline delete checks that got inlined into the body of the indicated function. I don't think this idea is particularly good or possible to stabilize, but it is easy to implement. If it lands, it would be a thing for RFL to play around with, just in case you happen to find a use for it.
This could probably be easily extended to (again) a crude general mechanism by putting the attribute on a closure to disable checks in a few lines of code instead of "a whole function". If the inliner cooperates.
From https://github.com/rust-lang/rust/issues/85122#issuecomment-1938599162 and https://github.com/rust-lang/rust/issues/85122#issuecomment-1938688741.
After https://github.com/rust-lang/rust/pull/120594 (1.78), unsafe precondition checks always apply under `-Cdebug-assertions=y`, but there is currently no way to disable particular cases in debug builds. Cc https://github.com/rust-lang/rust/issues/120848. Cc @saethlin.