Open clarfonthey opened 3 years ago
Thanks for this! We are seeing some noticeably faster code in on several embedded DSP applications in cortex-m7 with unchecked_sh[lr]
. The additional masking has always bothered me as unnecessary in our case.
One question on consistency and API choice though:
unchecked_sh[rl](self, rhs: Self)
while all the other take a u32
as rhs
: {overflowing,wrapping,checked}_sh[rl](self, rhs: u32)
. Why is that?I vote for adding exact_div
into the library! It could be heavily used in integer factorization and other applications. I'm also developing a checked version of exact division in my crate. I think unchecked_exact_div
or unchecked_div_exact
will be good names. (I personally prefer div_exact
over exact_div
as it follows the pattern of div_euclid
)
Besides, I'm also thinking about a shorter name for unchecked_
, something like raw_add
, raw_shl
may be easier to type?
I don't think that "raw" is a good naming scheme, since in general, you should not be using these primitives directly. In general, they should be hidden behind other methods that are safe to call. So, I would say that the "unchecked" naming is, at best, not descriptive enough, and that "raw" would be further from the goal of being descriptive.
@jordens Agreed the different RHS type is incorrect. I've sent PR #103456.
I was looking for an unchecked_div
the other day, as I was trying to get rid of a superfluous zero test, and was surprised that one doesn't exist even under this feature. Any particular reason it's not included (as in, why is it an unresolved question whether there should be one?) Sure, one can use NonZero*
as a divisor even on stable to get a safe unchecked div, but still.
I think that if there's a reason, it's that there two kinds of unchecked-ness in signed unchecked_div, and it'd be very easy to trigger the INT_MIN / -1
UB when you only remembered to prove that the denominator is non-zero.
I definitely prefer an unsafe type invariant to an unsafe operation where possible, so I think I like the Div<NonZeroTTT>
better. After all, with #100428 I think it's x / NonZero::new_unchecked(y)
, which is very competitive with x.unchecked_div(y)
, especially if the producer of y
can just change to reflect the non-zeroness of its result.
But it might fine to add unchecked_div
under this gate and leave it a stabilization question to figure it out if it's worth keeping. We'll see what libs-api has to say.
(Notably, the "the addition overflows" kind of UB is the only kind of UB for unchecked_add
, and it's also something that can't be feasibly pre-checked and thus put into a type invariant, the way "it's not zero" can be.)
For the signed cases, I think it is good to have it for consistency with the other operations and to avoid having to play with unreachable_unchecked()
(and hope it gets optimized away) or intrinsics
(unstable). Or is there another way to do it in stable?
For the unsigned cases, I think it would still be good to offer it for consistency with the signed case and with the other operations, even if one can use NonZero*
(perhaps documenting that as an alternative).
Are there any objections to stabilizing at least unchecked_{add,sub,mul}
? For those there shouldn't be any surprises about what their safety requirements are.
A unary unchecked_neg
could probably also be added to that list.
@RalfJung That seems entirely reasonable to me. a.unchecked_add(b)
is semantically a.checked_add(b).unwrap_or_else(hint::unreachable_unchecked)
, so I don't think there's any lang issues with it -- it's not a newly expressible concept, not even in const
-- so I think a libs-api FCP and it'd be good to go.
Nominating it on behalf of Ralf to get libs-api's opinion. Would you want the FCP the whole thing, or FCP a PR that just does a subset? Or is there still skepticism about exposing these?
We discussed this in a recent libs-api meeting. We mostly felt positive about stabilizing this, but it'd be helpful to have a clear overview of the exact APIs this tracking issue is tracking.
Great; does someone want to prepare a stabilization PR? I think that can cover everything that is currently under the unchecked_math
feature gate; an unchecked_div
function does not seem to exist (it only exists as an intrinsic, and we also have the exact_div
intrinsic, but those will obviously not get stabilized).
A unary
unchecked_neg
could probably also be added to that list.
Such a function doesn't exist yet, so it would have to be added (in a separate PR) before it can be stabilized. This requires an ACP since it's a new public function. None of this is insurmountable but I think if someone wants to push for this they can do this in parallel to the effort of stabilizing the existing functions in the unchecked_math
feature gate; there's no reason they have to all be stabilized at once.
I was under the impression that since we have a tracking issue for this, adding a related function (like unchecked_neg
) would be part of the general API ironing-out process and thus not require an ACP. Effectively, I'd expect the response to the ACP to be "this discussion should go in the tracking issue," so, that's why I'm discussing this here.
If we were to propose radically different APIs it would make sense to make a new ACP, but since we're discussing the basic arithmetic operations that have already been marinating for a while, it makes sense to simply make a PR to add unchecked_neg
.
Later today I'll get to work on making a PR that adds unchecked_neg
and recategorises unchecked_add
, unchecked_sub
, unchecked_neg
, and unchecked_mul
to a separate unchecked_math_min
feature flag, so that that subset can be stabilised by itself.
Ideally, we'd have the following subsets:
unchecked_math_min
)unchecked_division
)unchecked_math
, maybe call this unchecked_shifts
)Note that unchecked_division
and unchecked_shifts
should probably have a proper ACP, since those have major API questions worth ironing out, and the result will likely involve splitting into multiple (new) methods. Whereas the existing addition & multiplication methods should probably just be stabilised as-is.
:shrug: I'm not on the libs-api team, feel free to give it a shot. :)
Hello. I found the problem in using unchecked integer methods. My program works correctly in debug mode and outputs INT3 at the end, but in release mode the program skips my check by 0, causing the program to not work correctly. If I replace unchecked_add
with wrapping_add
, the program works correctly in both release and debug modes. This problem is also repeated on rust playground.
#![feature(unchecked_math)]
use std::convert::TryInto;
#[derive(Debug)]
pub struct CPU {
ram: Vec<u8>
}
impl CPU {
pub fn new(ram: Vec<u8>) -> Self {
Self { ram }
}
pub fn exec(&mut self, mut ip: usize) {
let mut di = 0u16;
let mut zero = false;
loop {
let current = self.ram[ip];
ip += 1;
match current {
0x47 => {
println!("INC DI");
di = unsafe { di.unchecked_add(1) };
if di == 0 {
zero = true;
} else {
zero = false;
}
}
0x75 => {
let address: i8 = unsafe { core::mem::transmute(self.ram[ip]) };
let address = address as isize;
ip += 1;
println!("JNZ ({}) {address}", !zero);
if !zero {
if address < 0 {
ip -= (-address) as usize;
} else {
ip += address as usize;
}
}
}
0xBF => {
let val = u16::from_le_bytes(self.ram[ip..ip+2].try_into().unwrap());
ip += 2;
println!("MOV DI, {val}");
di = val;
}
0xCC => {
println!("INT3");
return;
}
0xC6 => {
match self.ram[ip] {
5 => {
let val = self.ram[ip];
ip += 1;
println!("MOV BYTE [{}], {val}", di);
let addr = di as usize;
self.ram[addr] = val;
}
_ => todo!()
}
ip += 1;
}
_ => todo!()
}
}
}
}
fn main() {
let mut ram = vec![0; 64 * 1024];
/*
mov di, 8000h
loop:
mov byte [di], 13h
inc di
jnz loop
int3
*/
ram[0..10].copy_from_slice(&[0xbf, 0x00, 0x80, 0xc6, 0x05, 0x13, 0x47, 0x75, 0xfa, 0xcc]);
CPU::new(ram).exec(0);
}
@unia909 thanks for the report! However, this is a tracking issue, which are for discussing the feature in general, not any particular issues with it. Please open a new issue for your problem, and link it from here.
That said, there's a bug in your code. If you replace di.unchecked_add(1)
by di.checked_add(1).unwrap()
, you get an assertion failure. Using unchecked_add
you are promising that this will never overflow, and yet it does overflow, so your code has UB and getting the wrong result is expected.
If I want the code to be able to overflow a number, then what should I use? Wrapping methods is the only correct way?
Yes, that's exactly what wrapping_add
is for. Please carefully read the docs for unchecked_add
. They explicitly say that you must not call unchecked_add
when things might overflow! That's literally the only reason these methods even exist.
Why did you want to use the unchecked method when you actually want overflow to be allowed?
The PR fixing the feature flags is merged and we could probably do an FCP for the unchecked_math
methods, assuming that the recent addition of unchecked_neg
isn't too controversial. It's only implemented for signed integers because I didn't think that an "eat my laundry if this integer isn't zero" operator was a very good addition.
If I want the code to be able to overflow a number
Well, since you want both the wrapping result and to know whether it overflowed, then the method with "overflow" in the name looks like exactly what you want: https://doc.rust-lang.org/std/primitive.u16.html#method.overflowing_add
Rather than doing stuff manually with
di = di.wrapping_add(1);
if di == 0 {
zero = true;
} else {
zero = false;
}
it looks like you could just do
(di, zero) = di.overflowing_add(1);
But to re-iterate what Ralf said, please read the # Safety
sections for anything you're using in unsafe { … }
blocks. One of the reasons these weren't added years ago was concerns that they'd be misused like this 😞
I seem to recall the naming of these methods being bikeshedded long ago, but this does seem like a good example of the naming being confusing.
After all, the semantics of the normal + operator are 'wrapping addition plus an optional overflow check'. It's easy to guess that a method called unchecked_add
would have the semantics 'same as + but without the check'. (Compare to the similarly named command-line option -C overflow-checks=off
, which does have that effect.)
There's nothing wrong with the name of the correct method, wrapping_add
. But some users may not be familiar with the term 'wrapping'. Or even if they do know what it means, they may be tunnel-visioned in on the term 'overflow' (since that's what the error message said when they tried to use regular addition). Or they may just not notice wrapping_add
in the method list.
If nothing else, the documentation could be improved:
The documentation for unchecked
methods should have a more explicit caution and should point you to the wrapped
methods.
After all, these methods should be used extremely rarely in idiomatic Rust. They may be confusing to new Rust users since there are almost no other programming languages with 'UB on overflow' operations – with the obvious exception of C and C++, but in their case only for signed integers. (I do notice that unia909's example was using an unsigned integer. In the abstract it might seem obvious that signedness of integers is orthogonal to wrappingness of arithmetic, but it may not feel that way if you're used to C and/or C++.)
The documentation for wrapping
methods should perhaps mention the term 'overflow', especially to assist anyone doing a find-in-page on the docs site.
On a similar note, users who search for 'overflow' are even more likely to find the overflowing
methods, so the docs for those should perhaps explicitly mention wrapping
as the alternative to use if you don't want the boolean part of the output.
I honestly don't know how the docs could be more clear than:
Unchecked integer addition. Computes
self + rhs
, assuming overflow cannot occur.
We could also link to the wrapping
methods in the docs, but ultimately, I don't think that it's worth renaming the methods to something non-standard (we already have many similar "unchecked" methods that have invariants in the docs) since the real problem here is people using unsafe code without knowing what they're doing. If you add an unsafe
block without reading the docs, no matter what, we can't prevent you from breaking things.
One thing that hadn't occurred to me that might improve the situation is using the assume_unsafe_precondition!
macro inside these methods, like is done in similar libstd methods. Will submit a PR to see if that improves things.
One way to alleviate the concern of misuse would be to have a feature in rustdoc
to toggle a button to show "expert"/"uncommon" methods. For already unsafe
methods, it may not be that needed, but this could be quite useful for other "safe but subtle" methods. This could be used not just for things related to UB, but also other kinds of methods that may be tricky for other reasons. It could be generalized further (e.g. categories or tags) that can be useful in libraries (e.g. low-level vs. high-level API).
Personally, I really like that Rust has this kind of unsafe methods and provides choice -- it is something I have mentioned to the C committee, as an example of what it could be provided.
One thing that hadn't occurred to me that might improve the situation is using the
assume_unsafe_precondition!
macro inside these methods, like is done in similar libstd methods. Will submit a PR to see if that improves things.
I agree it's good to have these, but they wouldn't have helped here since the std we ship to users is built without debug assertions.
One thing that hadn't occurred to me that might improve the situation is using the
assume_unsafe_precondition!
macro inside these methods, like is done in similar libstd methods. Will submit a PR to see if that improves things.I agree it's good to have these, but they wouldn't have helped here since the std we ship to users is built without debug assertions.
I was under the impression that this macro was specifically meant to be inlined so that it does apply without debug assertions, although I am probably wrong about that. No idea how cfg!
interacts with inlining.
One way to alleviate the concern of misuse would be to have a feature in
rustdoc
to toggle a button to show "expert"/"uncommon" methods. For alreadyunsafe
methods, it may not be that needed, but this could be quite useful for other "safe but subtle" methods. This could be used not just for things related to UB, but also other kinds of methods that may be tricky for other reasons. It could be generalized further (e.g. categories or tags) that can be useful in libraries (e.g. low-level vs. high-level API).
I have no data for this, but I would assume that the people who are calling this method in error are doing so primarily via IDE completions, guessing the method, or misguided copy-paste, and thus this wouldn't really help. We could add a warning to the compiler that triggers whenever a user uses an unsafe block for the first time and require them to agree to an "I pinky-promise I can read the docs and will" message, but that just feels patronising and like a lot of work for little gain.
No idea how cfg! interacts with inlining.
cfg! is expanded right after parsing, waaaay before inlining. It always uses the flags of the crate it is syntactically located in.
I have no data for this, but I would assume that the people who are calling this method in error are doing so primarily via IDE completions,
Sounds like we should work with IDE people to avoid auto-completing unsafe functions outside of unsafe blocks, or at least somehow do that in a way that makes it clear to users that they need to take manual action here.
I hate to be the one who starts bikeshedding about the name, but should those be called foo_unchecked
instead? It did a quick grep over the library directory and methods introduced here are the only ones which use unchecked_
prefix. The rest of the standard library prefers _unchecked
suffix.
$ git grep 'fn.* unchecked_[a-zA-Z0-9_]*'
core/src/intrinsics.rs:2103: pub fn unchecked_div<T: Copy>(x: T, y: T) -> T;
core/src/intrinsics.rs:2112: pub fn unchecked_rem<T: Copy>(x: T, y: T) -> T;
core/src/intrinsics.rs:2122: pub fn unchecked_shl<T: Copy>(x: T, y: T) -> T;
core/src/intrinsics.rs:2131: pub fn unchecked_shr<T: Copy>(x: T, y: T) -> T;
core/src/intrinsics.rs:2139: pub fn unchecked_add<T: Copy>(x: T, y: T) -> T;
core/src/intrinsics.rs:2147: pub fn unchecked_sub<T: Copy>(x: T, y: T) -> T;
core/src/intrinsics.rs:2155: pub fn unchecked_mul<T: Copy>(x: T, y: T) -> T;
core/src/num/int_macros.rs:477: pub const unsafe fn unchecked_add(self, rhs: Self) -> Self {
core/src/num/int_macros.rs:545: pub const unsafe fn unchecked_sub(self, rhs: Self) -> Self {
core/src/num/int_macros.rs:613: pub const unsafe fn unchecked_mul(self, rhs: Self) -> Self {
core/src/num/int_macros.rs:762: pub const unsafe fn unchecked_neg(self) -> Self {
core/src/num/int_macros.rs:809: pub const unsafe fn unchecked_shl(self, rhs: u32) -> Self {
core/src/num/int_macros.rs:857: pub const unsafe fn unchecked_shr(self, rhs: u32) -> Self {
core/src/num/nonzero.rs:430: pub const unsafe fn unchecked_add(self, other: $Int) -> $Ty {
core/src/num/nonzero.rs:1097: pub const unsafe fn unchecked_mul(self, other: $Ty) -> $Ty {
core/src/num/uint_macros.rs:485: pub const unsafe fn unchecked_add(self, rhs: Self) -> Self {
core/src/num/uint_macros.rs:554: pub const unsafe fn unchecked_sub(self, rhs: Self) -> Self {
core/src/num/uint_macros.rs:601: pub const unsafe fn unchecked_mul(self, rhs: Self) -> Self {
core/src/num/uint_macros.rs:938: pub const unsafe fn unchecked_shl(self, rhs: u32) -> Self {
core/src/num/uint_macros.rs:986: pub const unsafe fn unchecked_shr(self, rhs: u32) -> Self {
$ git grep 'fn.* [a-zA-Z0-9_]*_unchecked' |wc -l
226
The main reason is that the other arithmetic methods use prefixes (e.g. wrapping_add
), although, fair point.
It's another weird inconsistency we'll probably have to live with.
As was mentioned, I agree that the name is confusing for the exact reason @comex explained. I think I've seen an experienced developer getting confused about the meaning (not sure if just communication issue).
Proposed alternatives:
add_assume_no_overflow
add_no_overflow
non_overflowing_add
They are bit longer but I guess the method is not used a lot.
inbounds_{add,mul,sub}
.
I mean, it seems pretty fine to have unchecked_*
methods that are just the checked_*
methods except that the None
case is UB. IMHO, the confusing bits are where these don't exactly overlap with the way the checked_*
methods work, like for division where you're not sure if you're talking about divide-by-zero, overflow, or a remainder.
This is also why unchecked_shifts
are under a separate feature gate, in addition to their behaviour just being weird, since it's pretty clear-cut for addition, subtraction, and multiplication.
I personally don't mind the unchecked_*
names, but I do love to go through my Pantone book!
IMHO, the confusing bits are where these don't exactly overlap with the way the
checked_*
methods work
Could we document every single a.unchecked_foo(b)
as being identical to a.checked_foo(b).unwrap_unchecked()
?
Are there any exceptions to that rule?
I don't think there are for add
/sub
/mul
. And those are the important ones, really -- they're the ones with special LLVM flags. https://rust.godbolt.org/z/hMK1e7cf8
So I'd be fine even if we just end up stabilizing those and none of the others. checked_div
is just control flow written out; once we're on LLVM19 the checked_div(d).unwrap_unchecked()
phrasing will optimize fine.
EDIT: Oh, I have another post above avoid maybe just skipping unchecked_div
too https://github.com/rust-lang/rust/issues/85122#issuecomment-1324355551
Note that unchecked_div
is also not under this feature flag, if it does exist. (I don't think it does beyond the intrinsic?)
I'd be fine with making the documentation update before stabilisation, or as part of it, though. I think it's worth pointing out.
One thing that hadn't occurred to me that might improve the situation is using the
assume_unsafe_precondition!
macro inside these methods, like is done in similar libstd methods. Will submit a PR to see if that improves things.I agree it's good to have these, but they wouldn't have helped here since the std we ship to users is built without debug assertions.
With https://github.com/rust-lang/rust/pull/120594 merged now everyone would benefit so...
Could we document every single
a.unchecked_foo(b)
as being identical toa.checked_foo(b).unwrap_unchecked()
?Are there any exceptions to that rule?
...-Cdebug-assertions=y
makes them differ now (for everybody, not just users that build core
), even though I don't know if those checks are documented as a guarantee (e.g. unwrap_unchecked()
did not document its former debug_assert!
): https://godbolt.org/z/W4ob6eorE.
Should unchecked_add
etc. get the unsafe debug checks too? Cc @saethlin. I imagine that it would make sense, since even unreachable_unchecked
has it now.
But then how does one write a "unchecked even in debug" one? That is useful in projects where debug builds require some performance to be useful, and certain checks may need to be disabled for that reason. checked_foo()
+ match
+ intrinsics works in this instance, but it requires intrinsics. And what if there is no "checked" version of an API?
Is there another way I am missing?
The unsafe preconditions are designed to be as slim as possible for performance reasons, so, hopefully shouldn't make things too slow, but I also am not sure either way.
They will make things too slow for somebody sooner or later -- it is not uncommon in projects to have a "fast debug" build where a few checks need to be explicitly bypassed in hot paths. And it is not just the branch/call itself, it can also break other optimizations like vectorization: https://godbolt.org/z/MMTordabn.
I really like the unsafe precondition checks, but I would like to make sure there is an escape hatch for -Cdebug-assertions=y
builds. Perhaps we need a core::hint::unreachable_unchecked_even_in_debug
-like (even if it does not solve the "there is no checked API" case, but I don't know if that is an issue). For me, core::hint::unreachable_unchecked
was that in the past, and things like unwrap_unchecked()
were the "checked in debug" ones.
I also just noticed that using the intrinsic for slice indexing is not enough to remove the check in a debug build and get vectorization back (included in the link).
(I guess it is best if I move this into its own issue).
(I guess it is best if I move this into its own issue).
Indeed, this is largely unrelated to this issue and mostly about https://github.com/rust-lang/rust/pull/120594, so please open a new issue.
I started typing out a long reply, but I'll wait for a new issue. For now, please check https://github.com/rust-lang/rust/issues/120848, I've already thought of quite a few follow-up tasks on these checks and wouldn't mind adding to the list.
As an aside to specifically mention the status of adding unsafe checks to these methods, #117494 was stalled because of compiler errors I couldn't resolve, which lead to me focusing on #117571 to attempt to provide more info to said errors so I could debug them.
I haven't done any work on these PRs recently, but I'll try to rectify that this week, and if anyone else is interested in taking a look to maybe help out, I would appreciate it. I'm very unfamiliar with compiler internals, so, it's been a bit of a learning curve for me.
I'll also take a look about editing the docs while I'm at it.
More up-to-date update: I decided to get around the compiler lint errors for now, and while it did require a bit of excursion for the shift operations in particular, #117494 should be ready to merge and will panic in debug mode when the unsafe assertions are ignored. That should hopefully help substantially with the concerns that people are using these methods incorrectly, since users without any additional setup to disable these assertions (who should not be doing that unless they know what they're doing) will run into errors in their tests when using these incorrectly.
I also don't think that we should care about users who are running into issues but not testing their code, because… well, yeah, that's how you verify your code is working!
Unchecked variants of the {signed_integer}::checked_{add,sub}_unsigned
functions and vice-versa could be useful if those properties can be conveyed to LLVM.
See #122420 and #122461
@the8472 If we get https://discourse.llvm.org/t/rfc-add-nowrap-flags-to-trunc/77453?u=scottmcm then I think it'd be possible to communicate it -- we could do sext + zext + add nuw nsw + trunc nsw -- but I suspect that it would end up getting optimized back down to a normal wrapping add.
Might help for smaller types, though, if it encourages LLVM to use a larger IV where the flags could work.
Though for usize
+isize
, there's actually a way: ptr::invalid
+wrapping_offset
+bare_addr
.
(Dunno if that's anywhere close to being a good idea, though.)
FYI: I put up a PR (like Ralf mentioned in https://github.com/rust-lang/rust/issues/85122#issuecomment-1707805887) to propose stabilizing just unchecked_{add,sub,mul}
: #122520
(Along with some extra documentation text in them.)
EDIT: ...and it's in FCP! Thanks, Amanieu.
Cleaned up this issue a bit, separating out the various feature flags better and including more PRs that were omitted from the list. #121571 should hopefully be merged soon to ensure that all of these have unsafe preconditions checked in debug mode.
For now, we can keep this issue as an aggregate of all the unchecked methods, but if someone feels it'd be more useful separated into separate issues per feature flag, feel free to do so.
Worth an extra update: the unsafe preconditions PR is finally merged! Won't ping everyone who helped get this over the line here since there were quite a few, but this means that in the next nightly all unchecked arithmetic using the methods (not the intrinsics) should be checked in debug mode.
This is a tracking issue for the
unchecked_*
methods on integers.unchecked_math
(stable as of #122520)Steps / History
unchecked_{add,sub,mul}
unchecked_neg
Steps / History
unchecked_neg
unchecked_shifts
Steps / History
unchecked_*
(aggregate)Steps / History
unchecked_{add,sub,mul}
unchecked_neg
Unresolved Questions
exact_div
also get inherent versions?MIN/-1
orMAX+1
, LLVM'sn[us]w
; UB from input range likex/0
orx << -1
; UB from lossy like2/4
or3 >> 1
, LLVM'sexact
)Resolved unresolved questions:
IsWe stabilisedunchecked_*
the best naming for these?unchecked_{add,sub,mul}
already, so, yes.