rust-lang / rfcs

RFCs for changes to Rust
https://rust-lang.github.io/rfcs/
Apache License 2.0
5.95k stars 1.57k forks source link

Pre-RFC: Unchecked arithmetic #2508

Closed Ekleog closed 6 months ago

Ekleog commented 6 years ago

On IRC, we recently discussed the speed of addition. I think Rust currently has no way to say “I know this cannot overflow, so just optimize and UB if it actually can”.

Hence the idea of adding unchecked_add, etc., that would be unsafe functions. (addition of an Unchecked<u*> type is left to later assessment)

Specific list of functions that'd be available as unchecked_* is the same set of functions that are currently available as checked_*, overflowing_* and wrapping_* forms:

Is there any major roadblock/risks that you think I should be aware of before starting to write this RFC? Do you like this idea or not?

For the potential for improvement, even though it's hard to pinpoint the exact reason why, using signed arithmetic instead of unsigned arithmetic in C here made the code run 10% faster -- this is likely an equivalent gain we could have using unchecked variants instead of overflowing (or equivalent) ones. :) (not even counting debug builds, where most of the huge gain could already be had by using eg. overflowing_*, even though it wouldn't really be semantically correct)

Note: this is the same idea as https://github.com/rust-lang/rust/pull/52205 , except it'd be available to user code

pnkfelix commented 6 years ago

I don’t know how much context you reviewed when discussing this matter, but it would be a good thing for any RFC here to start by reviewing the text of RFC 560, PR: https://github.com/rust-lang/rfcs/pull/560 and the discussion thread linked on its PR, and perhaps the other discussions that preceded it

pnkfelix commented 6 years ago

Notably #146 was an important predecessor to #560 that I think tried to incorporate more allowance of undefined behavior, in a scoped manner.

Ekleog commented 6 years ago

Thank you for the pointers! I must say I didn't read in detail all the discussions, but looking at everything that mentions “unsafe” (assuming this would lead to the messages relevant for the current discussion, given I can't imagine such primitives not being “unsafe” and noone mentioning it) and reading about half the other posts to try not to miss any extended discussion, I think that:

Overall, I think the possibility of having unsafe helper functions to have performance-optimized has not really been discussed yet, even though surrounding discussions (and especially questions about which behaviour should be the default one) have been numerous :) Then, again, I didn't read everything, so maybe I missed something.

However, these discussions do me feel better about the unchecked_* functions being unsafe.


That said, reading these discussions also makes me wonder which should be the behaviour for preconditions not matched in unchecked_*:

  1. Undefined result
  2. Undefined behaviour

Undefined behaviour is much more violent, and not required for LLVM's add nuw (which only requires undefined result). However, it will likely be required for divisions (to allow reception of SIGFPE as a valid outcome of unchecked division-by-0), or weird architectures where an overflowing addition traps.

Overall, given these functions would be defined as unsafe, I think it'd be better to just have undefined behaviour straightaway, so that all possible optimizations could be done. What do you think about it?

BTW, I've edited the topmost post according to feedback received over IRC, basically removing the parts about a Unchecked<u*> type and adding in a link to a benchmark in C to reduce bikeshed-likelihood.

glaebhoerl commented 6 years ago

I think Rust currently has no way to say “I know this cannot overflow, so just optimize and UB if it actually can”.

Was

let (result, overflowed) = a.overflowing_add(b);
if overflowed { 
    unsafe { unreachable_unchecked() }
} else {
    result
}

discussed or tried?

Ekleog commented 6 years ago

Not yet, what has been tried is this:

pub unsafe fn foo(a: u32, b: u32) -> u32 {
    a.checked_add(b).unwrap_or(std::mem::uninitialized())
}

(playground)

and this

pub unsafe fn foo(a: u32, b: u32) -> u32 {
    if let Some(x) = a.checked_add(b) {
        x
    } else {
        std::hint::unreachable_unchecked()
    }
}

(playground)

Both of which fail to generate add nuw in LLVM IR, and generate add instead.

I can now confirm the same result with

pub unsafe fn foo(a: u32, b: u32) -> u32 {
    let (result, overflowed) = a.overflowing_add(b);
    if overflowed {
        std::hint::unreachable_unchecked();
    } else {
        result
    }
}

(playground)

That said, the example you're giving does look like a reasonable “default cross-platform” implementation for unchecked_* (though I'd likely have used the trick with checked_* instead, given it'd also work nicely with unchecked_div being UB on division-by-0)

BTW, rustc appears to generate most-likely optimal IR for the example using checked_div and unreachable_unchecked (but not the other examples, due to the jump to panic for the overflowing_add example, and I don't really know why for the mem::uninitialized case), apart from maybe questions about idivl / divl as per the linked stackoverflow answer above, which could be investigated further during implementation (but I'd guess idivl couldn't be used here anyway due to the semantics of unsigned division)

gnzlbg commented 6 years ago

add nuw

Why not add nuw nsw ?


AFAIK we can't currently generate the appropriate LLVM-IR in any way, so whether these could actually make a performance difference or not in some situations is basically just speculation unless one modifies rustc to try it out, and those who want to reproduce modify and recompile rustc themselves to do so.

I think we could add these to core::intrinsics as a step 0 first (this needs to be done anyways to implement the integer methods). That way we could all try this out on nightly in godbolt and see how much of an impact do these have.

Once we have more information we can reconsider adding the methods to the integers, the Unchecked wrapper (I'd call it UnsafeArithmetic<T> though), etc.

Worst case these will just hang in core::intrinsics forever, but these might still be useful while debugging code-gen / hacking on rustc / hacking on LLVM, so I'd start there.

Ekleog commented 6 years ago

Indeed, it's all speculative for Rust, but in C, this StackOverflow question appears to state that integer arithmetic (ie. something similar to the unchecked arithmetic proposed here) does perform 10% faster in at least one benchmark :)

Then, I agree with you that having intrinsics first would likely be first step. Let's wait for the follow-up of https://github.com/rust-lang/rust/pull/52205 for the time being, then, and the result of the Range optimization could also answer the question of whether that's a useful optimization :) (“could”, because if it's a good optimization then it'll have answered the question, but if it optimizes nothing, unchecked multiplications / divisions, esp. on ARM targets, appear like much more optimize-able operations to me)

gnzlbg commented 6 years ago

the result of the Range optimization could also answer the question of whether that's a useful optimization

These intrinsics not being useful for Range does not imply that they are never useful. I've commented on that PR to see if it can be re-activated. I think these intrinsics are useful for experimentation, debugging performance issues, etc. even if they never make it to a stable API.

scottmcm commented 6 years ago

Why not add nuw nsw ?

@gnzlbg Because there's no obvious type for which to have the methods do that. My PR does nowrap_add::<u32> as add nuw and nowrap_add::<i32> as add nsw. Do you have a situation in which you were wanting the combination of both overflow flags?

leonardo-m commented 6 years ago

I've written a note about this topic: https://users.rust-lang.org/t/high-performance-operations/19739/2

lcnr commented 5 years ago

The intrinsics for unchecked_add, unchecked_sub and unchecked_mul are now on nightly.

RustyYato commented 5 years ago

Also for unchecked_shl and unchecked_shr.

eloraiby commented 4 years ago

any plans to migrate those to stable ?

A1-Triard commented 4 years ago

What about unchecked_neg?

scottmcm commented 4 years ago

@A1-Triard There's no neg instruction in llvm, so unchecked_neg(x) is just unchecked_sub(0, x).

A1-Triard commented 4 years ago

@A1-Triard There's no neg instruction in llvm, so unchecked_neg(x) is just unchecked_sub(0, x).

Still, I think, it is not a reason to not having unchecked_neg

scottmcm commented 4 years ago

@A1-Triard It's a reason not to have it as an intrinsic, which are the only things that exist right now.

Whether there should be an unchecked_neg in whatever gets stabilized is a different conversation, but since there's no stabilization-track APIs for this right now, that's now what's happened.

Last I heard there was potential concern with stabilizing things here, so anyone wanting to see further movement will need to do some experimentation in nightly to prove the value of these in real code.

DemiMarie commented 3 years ago

From my perspective, the main use of this would be in code that is generated from tools like EverParse, which provide formal proofs that overflow indeed cannot occur.

tczajka commented 6 months ago

There is already a.checked_add(b).unwrap_unchecked(), etc.

If this doesn't optimize as well as unchecked_add(), wouldn't a better solution be to add an optimization pass rather than add these functions to stable API?

Optimization can always be improved, but core API is forever.

digama0 commented 6 months ago

I think the API is pretty clear cut though. If we have checked_add, unchecked_add looks quite natural. Especially if we get the full complement of other variants: panicking_add, wrapping_add, checked_add, overflowing_add, unchecked_add.

Also people may not want to rely on optimizations occurring, and in e.g. a debug build I'm not sure they would. There is quite some overhead on creating an option and unwrapping it compared to a simple addition which can always be safely pessimized to any of the other add variants but is always the fastest among available options, even at low optimization levels.

scottmcm commented 6 months ago

Given that unchecked_{add,sub,mul} are stabilizing in 1.79 (https://github.com/rust-lang/rust/pull/122520#issuecomment-2007373595), I'm going to close this.

We don't need an RFC-repo issue for this. The remainder are https://github.com/rust-lang/rust/issues/85122.

(Not everything mentioned here literally exists, but for example a.unchecked_div(b) can be spelled a / NonZero::new(b).unwrap_unchecked() just fine, with none of the optimization issues that make unchecked_mul currently helpful.)