Closed chorman0773 closed 2 months ago
I think this is similar to #230.
(edit: I said "the same as" first, but you want different semantics for oversized shifts)
Yes, in particular I want the over-width left shift to be 0
(rather than !0
). I commonly use this operation to, among other things, compute a bitmask for a dynamically provided width, doing a .wrapping_sub(1)
after the shift.
Why not change the behaviour of the <<
and >>
operators themselves? (Just have 1u32 << 100
evaluate to 0
rather than the overflow-panic/wrapping behaviour.)
Leaving aside whether Rust would have designed these operators like that pre-1.0 (which I don't think Rust should have), I don't think we should change these operators at this point, because that would make them not match the behavior of any existing CPUs, and require multiple instructions rather than one. (In some cases we could optimize away the additional instructions, but I don't think our default operator semantics should have a mismatch with CPUs.)
Not sure if you consider nVidia GPUs an "existing CPU" but the PTX instruction set for shl
/shr
did specify
Shift amounts greater than the register width N are clamped to N.
The wrapping behavior is true for basically every other target though (x86, arm, wasm, sparc, riscv, powerpc, mips, m68k, loongarch64, hexagon, bpf)
BTW I think NVPTX's shf
terminology (.wrap, .clamp) is more familiar than the term "rangeless".
impl u64 {
pub fn wrapping_shl(self, rhs: u32) -> Self {
self << (rhs & 63)
}
pub fn clamping_shl(self, rhs: u32) -> Self { // or `clamped_shl`
if rhs < 64 { self << rhs } else { 0 }
}
}
We discussed this ACP in today's standard library API meeting. The team members present agreed there is a compelling motivation for the standard library to provide the proposed rangeless shift behavior. Before we accept the new methods, we'd like to ask for some other naming options to be brainstormed.
We were split on whether changing <<
/>>
to perform rangeless shift would be reasonable, but either way this one is a decision for the language team, not the standard library team. If someone is interested in pursuing changing the operators, we would encourage submitting a lang RFC, and we would make sure to consider the state of that RFC as part of the process of stabilizing named methods for rangeless shift.
Commentary opposed to changing the operators:
op | Rust expression | X86 asm | AArch64 asm |
---|---|---|---|
shift left | `n << amount` | ```asm shlx rax, rdi, rsi ``` | ```asm lsl x0, x0, x1 ``` |
rangeless shift left |
`n.checked_shl(amount)` `.unwrap_or(0)` | ```asm xor eax, eax cmp esi, 64 shlx rcx, rdi, rsi cmovb rax, rcx ``` | ```asm lsl x8, x0, x1 cmp w1, #64 csel x0, x8, xzr, lo ``` |
shift right | `n >> amount` | ```asm shrx rax, rdi, rsi ``` | ```asm lsr x0, x0, x1 ``` |
rangeless shift right unsigned |
`n.checked_shr(amount)` `.unwrap_or(0)` | ```asm xor eax, eax cmp esi, 64 shrx rcx, rdi, rsi cmovb rax, rcx ``` | ```asm lsr x8, x0, x1 cmp w1, #64 csel x0, x8, xzr, lo ``` |
rangeless shift right signed |
`n.checked_shr(amount)` `.unwrap_or(` `if n >= 0 { 0 } else { -1 }` `)` | ```asm cmp esi, 63 mov eax, 63 cmovb eax, esi sarx rax, rdi, rax ``` | ```asm mov w8, #63 cmp w1, #63 csel w8, w1, w8, lo asr x0, x0, x8 ``` |
Commentary in favor of changing the operators:
cmov
would matter whatsoever, and of the remainder, the vast majority are expected to be proven to be in-bounds by the compiler; the 0.1% left can use a named method for the legacy shift behaviorchanging the shift operators is not backwards compatible, it changes the release-mode behavior of 1u8 << 8
.
If we changed the operator, we would also need to decide how to handle negative shift values. e.g. I don't think we want rangeless x << -2
to result in 0
, so should it now act like x >> 2
? Or keep treating that like overflow?
(This isn't a problem for explicit methods with rhs: u32
.)
Before we accept the new methods, we'd like to ask for some other naming options to be brainstormed.
How about "unbounded"?
If you are changing the behavior of <<
you need to make sure portable_simd
(std::simd::Simd
) is using the same definition.
Interestingly the x86 SSE2 PSLL[W,D,Q]
and AVX VPSLLV[W,D,Q]
instructions do clamping natively.
If the value specified by the count operand is greater than 15 (for words), 31 (for doublewords), or 63 (for a quadword), then the destination operand is set to all 0s.
OTOH ARM's vector-variant USHL
instruction is still wrapping, consistent with the scalar behavior.
shift = SInt(Elem[operand2, e, esize]<7:0>);
x86-64 will also offer this behaviour (specifically as to left-shift and unsigned right-shift) when the range is known to be <64, as you can just do the shift on a 64-bit register, then use the lower 32-bit, 16-bit, or 8-bit overlaps (though rightshift requires first zero-extending, which is either a movzx
for 16-bit and 8-bit, or a plain mov
for 32-bit).
Given a time machine, I'd absolutely change what Rust 1.0 did here. Good integer behaviour follows the rule from https://github.com/rust-lang/rust/issues/100422#issuecomment-1245864700 -- as though it was done in infinite precision, and you just got the low bits.
That's why uN::ilog2
is better, in my opinion, than uN::leading_zeroes
. Similarly why uN::count_ones
is better than uN::count_zeroes
.
I think it's bad that
let x = 3;
foo(x);
dbg!(x << 16);
does something so drastically different depending on what the argument type of foo
happens to be.
Needing a conditional instruction -- notably one that doesn't add any dependencies -- is 100% not an issue. Rust isn't an assembly language; the defaults should be the things that work properly even if it's another instruction, because the fast-but-wrong ones are better as functions, not operators.
x >>= n
and for _ in 0..n { x >>= 1 }
should be the same, just like x += n
and for _ in 0..n { x += 1 }
are the same. Anything else is bad.
But I agree that it's probably too late to change it now. (At least, it'd need a very long transition period, probably over multiple editions.)
Spitballing names:
proper_sh[lr]
accurate_sh[lr]
correct_sh[lr]
consistent_sh[lr]
Some more options
truncating_
infinite_
winding_
modular_
We discussed this in today's @rust-lang/libs-api meeting. We ended up settling on @cuviper's suggestion of unbounded_shl
/unbounded_shr
as the least confusing option. The semantic is that the base shl/shr have a bounded shift count, and the unbounded_
variants have no bound on the shift count.
Marking this as accepted, because we're accepting the functionality proposed by ACP, with the names changed to unbounded_*
.
Proposal
Problem statement
Rust presently has a few options for shifts:
val << bits
/val >> bits
does a shift bybits
with normally checked behaviour for out-of-range shifts,val.wrapping_shl(bits)
/val.wrapping_shr(bits)
likewise does a shift bybits
, but wraps bits to the width of the type.(There are also
rotates_
which likewise wrap at width boundary).This has a problem because the wrapping behaviour here is not the intuitive one (as it wraps the shift quantity, not the shift result). Operations may perform a shift with an undetermined quantity that may exceed the width of the type, e.g. to produce a bitmask of possible values.
Motivating examples or use cases
Producing an
n-bit
mask of an integer type up ton
bits:Determining which bit to check for a carry-out of a wide operation
Both of these functions are ones I'm currently using in an architecture emulator which does native integer operations of varying sizes up to 64-bit.
Solution sketch
The names of the methods are subject to bikeshed, I don't have a good name for them.
Formally, the behaviour of both of these methods involves evaluating the shift on a type with infinite width (or concretely, a width of
u32::MAX + 1
), then wrapping the result modulo(2 ^ Self::WIDTH)
.Practically this means that:
a.rangeless_shl(bits)
, ifbits < Self::WIDTH
, the result is exactlya << bits
. Otherwise the result is0
a.rangeless_shr(bits)
, ifbits < Self::WIDTH
, the result is exactlya >> bits
. Otherwise the result depends on the signedness:unsigned
types, the result is0
like forwrapping_shl
, which corresponds with the logical right shift of the value by the full width of the type,signed
types, the result is either0
or-1
depending on whether the value is signed or unsigned, which corresponds to an arithmetic right shift of the value by the full width of the type.The result does not differ when the shift quantity is exactly
Self::WIDTH
or is any value that exceeds it.Alternatives
There are a few ways to implement the rangeless shifts concretely:
self.checked_{shl,shr}(bits).unwrap_or(exceeds_width_result)
, or, verbosely:if bits < Self::WIDTH { self (op) bits} else { exceeds_width_result }
(where
exceeds_width_result
is the appropriate value for the type and op)Both versions are verbose and somewhat error prone (particularily for signed
rangeless_shr
) to write inline.This could be an external crate, but has one of two major limitations due to current language features
const fn
.The ergonomics of having a method call, and the ability to do this as
const
makes me of the opinion that it should be an inherent method of the integer types in the standard library. A standard library function could also potentially benefit from optimizations on architectures where this behaviour is the behaviour of an out-of-range shift. I do not know whether or not the versions written above will do this, on x86 the unsigned versions will optimize to acmov
however.Links and related work
What happens now?
This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
Second, if there's a concrete solution: