Closed Ekleog closed 6 months ago
I don’t know how much context you reviewed when discussing this matter, but it would be a good thing for any RFC here to start by reviewing the text of RFC 560, PR: https://github.com/rust-lang/rfcs/pull/560 and the discussion thread linked on its PR, and perhaps the other discussions that preceded it
Notably #146 was an important predecessor to #560 that I think tried to incorporate more allowance of undefined behavior, in a scoped manner.
Thank you for the pointers! I must say I didn't read in detail all the discussions, but looking at everything that mentions “unsafe” (assuming this would lead to the messages relevant for the current discussion, given I can't imagine such primitives not being “unsafe” and noone mentioning it) and reading about half the other posts to try not to miss any extended discussion, I think that:
mem::undefined()
without unsafe
)unsafe
(to match mem::uninitialized()
).Overall, I think the possibility of having unsafe
helper functions to have performance-optimized has not really been discussed yet, even though surrounding discussions (and especially questions about which behaviour should be the default one) have been numerous :) Then, again, I didn't read everything, so maybe I missed something.
However, these discussions do me feel better about the unchecked_*
functions being unsafe
.
That said, reading these discussions also makes me wonder which should be the behaviour for preconditions not matched in unchecked_*
:
Undefined behaviour is much more violent, and not required for LLVM's add nuw
(which only requires undefined result). However, it will likely be required for divisions (to allow reception of SIGFPE
as a valid outcome of unchecked division-by-0), or weird architectures where an overflowing addition traps.
Overall, given these functions would be defined as unsafe
, I think it'd be better to just have undefined behaviour straightaway, so that all possible optimizations could be done. What do you think about it?
BTW, I've edited the topmost post according to feedback received over IRC, basically removing the parts about a Unchecked<u*>
type and adding in a link to a benchmark in C to reduce bikeshed-likelihood.
I think Rust currently has no way to say “I know this cannot overflow, so just optimize and UB if it actually can”.
Was
let (result, overflowed) = a.overflowing_add(b);
if overflowed {
unsafe { unreachable_unchecked() }
} else {
result
}
discussed or tried?
Not yet, what has been tried is this:
pub unsafe fn foo(a: u32, b: u32) -> u32 {
a.checked_add(b).unwrap_or(std::mem::uninitialized())
}
and this
pub unsafe fn foo(a: u32, b: u32) -> u32 {
if let Some(x) = a.checked_add(b) {
x
} else {
std::hint::unreachable_unchecked()
}
}
Both of which fail to generate add nuw
in LLVM IR, and generate add
instead.
I can now confirm the same result with
pub unsafe fn foo(a: u32, b: u32) -> u32 {
let (result, overflowed) = a.overflowing_add(b);
if overflowed {
std::hint::unreachable_unchecked();
} else {
result
}
}
That said, the example you're giving does look like a reasonable “default cross-platform” implementation for unchecked_*
(though I'd likely have used the trick with checked_*
instead, given it'd also work nicely with unchecked_div
being UB on division-by-0)
BTW, rustc appears to generate most-likely optimal IR for the example using checked_div
and unreachable_unchecked
(but not the other examples, due to the jump to panic for the overflowing_add
example, and I don't really know why for the mem::uninitialized
case), apart from maybe questions about idivl
/ divl
as per the linked stackoverflow answer above, which could be investigated further during implementation (but I'd guess idivl
couldn't be used here anyway due to the semantics of unsigned division)
add nuw
Why not add nuw nsw
?
AFAIK we can't currently generate the appropriate LLVM-IR in any way, so whether these could actually make a performance difference or not in some situations is basically just speculation unless one modifies rustc to try it out, and those who want to reproduce modify and recompile rustc themselves to do so.
I think we could add these to core::intrinsics
as a step 0 first (this needs to be done anyways to implement the integer methods). That way we could all try this out on nightly in godbolt and see how much of an impact do these have.
Once we have more information we can reconsider adding the methods to the integers, the Unchecked
wrapper (I'd call it UnsafeArithmetic<T>
though), etc.
Worst case these will just hang in core::intrinsics
forever, but these might still be useful while debugging code-gen / hacking on rustc / hacking on LLVM, so I'd start there.
Indeed, it's all speculative for Rust, but in C, this StackOverflow question appears to state that integer arithmetic (ie. something similar to the unchecked
arithmetic proposed here) does perform 10% faster in at least one benchmark :)
Then, I agree with you that having intrinsics first would likely be first step. Let's wait for the follow-up of https://github.com/rust-lang/rust/pull/52205 for the time being, then, and the result of the Range
optimization could also answer the question of whether that's a useful optimization :) (“could”, because if it's a good optimization then it'll have answered the question, but if it optimizes nothing, unchecked multiplications / divisions, esp. on ARM targets, appear like much more optimize-able operations to me)
the result of the Range optimization could also answer the question of whether that's a useful optimization
These intrinsics not being useful for Range
does not imply that they are never useful. I've commented on that PR to see if it can be re-activated. I think these intrinsics are useful for experimentation, debugging performance issues, etc. even if they never make it to a stable API.
Why not add nuw nsw ?
@gnzlbg Because there's no obvious type for which to have the methods do that. My PR does nowrap_add::<u32>
as add nuw
and nowrap_add::<i32>
as add nsw
. Do you have a situation in which you were wanting the combination of both overflow flags?
I've written a note about this topic: https://users.rust-lang.org/t/high-performance-operations/19739/2
The intrinsics for unchecked_add
, unchecked_sub
and unchecked_mul
are now on nightly.
Also for unchecked_shl
and unchecked_shr
.
any plans to migrate those to stable ?
What about unchecked_neg
?
@A1-Triard There's no neg
instruction in llvm, so unchecked_neg(x)
is just unchecked_sub(0, x)
.
@A1-Triard There's no
neg
instruction in llvm, sounchecked_neg(x)
is justunchecked_sub(0, x)
.
Still, I think, it is not a reason to not having unchecked_neg
@A1-Triard It's a reason not to have it as an intrinsic, which are the only things that exist right now.
Whether there should be an unchecked_neg
in whatever gets stabilized is a different conversation, but since there's no stabilization-track APIs for this right now, that's now what's happened.
Last I heard there was potential concern with stabilizing things here, so anyone wanting to see further movement will need to do some experimentation in nightly to prove the value of these in real code.
From my perspective, the main use of this would be in code that is generated from tools like EverParse, which provide formal proofs that overflow indeed cannot occur.
There is already a.checked_add(b).unwrap_unchecked()
, etc.
If this doesn't optimize as well as unchecked_add()
, wouldn't a better solution be to add an optimization pass rather than add these functions to stable API?
Optimization can always be improved, but core API is forever.
I think the API is pretty clear cut though. If we have checked_add
, unchecked_add
looks quite natural. Especially if we get the full complement of other variants: panicking_add
, wrapping_add
, checked_add
, overflowing_add
, unchecked_add
.
Also people may not want to rely on optimizations occurring, and in e.g. a debug build I'm not sure they would. There is quite some overhead on creating an option and unwrapping it compared to a simple addition which can always be safely pessimized to any of the other add
variants but is always the fastest among available options, even at low optimization levels.
Given that unchecked_{add,sub,mul}
are stabilizing in 1.79 (https://github.com/rust-lang/rust/pull/122520#issuecomment-2007373595), I'm going to close this.
We don't need an RFC-repo issue for this. The remainder are https://github.com/rust-lang/rust/issues/85122.
(Not everything mentioned here literally exists, but for example a.unchecked_div(b)
can be spelled a / NonZero::new(b).unwrap_unchecked()
just fine, with none of the optimization issues that make unchecked_mul
currently helpful.)
On IRC, we recently discussed the speed of addition. I think Rust currently has no way to say “I know this cannot overflow, so just optimize and UB if it actually can”.
Hence the idea of adding
unchecked_add
, etc., that would beunsafe
functions. (addition of anUnchecked<u*>
type is left to later assessment)Specific list of functions that'd be available as
unchecked_*
is the same set of functions that are currently available aschecked_*
,overflowing_*
andwrapping_*
forms:unchecked_add
unchecked_sub
unchecked_mul
unchecked_div
unchecked_div_euc
unchecked_rem
unchecked_mod_euc
unchecked_neg
unchecked_shl
unchecked_shr
unchecked_pow
Is there any major roadblock/risks that you think I should be aware of before starting to write this RFC? Do you like this idea or not?
For the potential for improvement, even though it's hard to pinpoint the exact reason why, using signed arithmetic instead of unsigned arithmetic in C here made the code run 10% faster -- this is likely an equivalent gain we could have using
unchecked
variants instead ofoverflowing
(or equivalent) ones. :) (not even counting debug builds, where most of the huge gain could already be had by using eg.overflowing_*
, even though it wouldn't really be semantically correct)Note: this is the same idea as https://github.com/rust-lang/rust/pull/52205 , except it'd be available to user code