chorman0773 commented 3 months ago

Proposal

Problem statement

Rust presently has a few options for shifts:

val << bits/val >> bits does a shift by bits with normally checked behaviour for out-of-range shifts,
val.wrapping_shl(bits)/val.wrapping_shr(bits) likewise does a shift by bits, but wraps bits to the width of the type.

(There are also rotates_ which likewise wrap at width boundary).

This has a problem because the wrapping behaviour here is not the intuitive one (as it wraps the shift quantity, not the shift result). Operations may perform a shift with an undetermined quantity that may exceed the width of the type, e.g. to produce a bitmask of possible values.

Motivating examples or use cases

Producing an n-bit mask of an integer type up to n bits:

pub const fn mk_mask64(bits: u32) -> u64{
    (1 << bits)
}

Determining which bit to check for a carry-out of a wide operation

pub fn test_carry_bit(val: u64, bits: u32) -> u64{
    val & (1 << bits)
}

Both of these functions are ones I'm currently using in an architecture emulator which does native integer operations of varying sizes up to 64-bit.

Solution sketch

The names of the methods are subject to bikeshed, I don't have a good name for them.

impl Int { // `Int` is any signed or unsigned integer type, including `u128`/`i128` and `usize/`isize`/
    pub const fn rangeless_shl(self, width: u32) -> Self;
    pub const fn rangeless_shr(self, width: u32) -> Self;
}

Formally, the behaviour of both of these methods involves evaluating the shift on a type with infinite width (or concretely, a width of u32::MAX + 1), then wrapping the result modulo (2 ^ Self::WIDTH).

Practically this means that:

For a.rangeless_shl(bits), if bits < Self::WIDTH, the result is exactly a << bits. Otherwise the result is 0
For a.rangeless_shr(bits), if bits < Self::WIDTH, the result is exactly a >> bits. Otherwise the result depends on the signedness:
- For unsigned types, the result is 0 like for wrapping_shl, which corresponds with the logical right shift of the value by the full width of the type,
- For signed types, the result is either 0 or -1 depending on whether the value is signed or unsigned, which corresponds to an arithmetic right shift of the value by the full width of the type.

The result does not differ when the shift quantity is exactly Self::WIDTH or is any value that exceeds it.

Alternatives

There are a few ways to implement the rangeless shifts concretely:

self.checked_{shl,shr}(bits).unwrap_or(exceeds_width_result) , or, verbosely:
if bits < Self::WIDTH { self (op) bits} else { exceeds_width_result }

(where exceeds_width_result is the appropriate value for the type and op)

Both versions are verbose and somewhat error prone (particularily for signed rangeless_shr) to write inline.

This could be an external crate, but has one of two major limitations due to current language features

It must be a free function, or
It cannot be made a const fn.

The ergonomics of having a method call, and the ability to do this as const makes me of the opinion that it should be an inherent method of the integer types in the standard library. A standard library function could also potentially benefit from optimizations on architectures where this behaviour is the behaviour of an out-of-range shift. I do not know whether or not the versions written above will do this, on x86 the unsigned versions will optimize to a cmov however.

Links and related work

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

We think this problem seems worth solving, and the standard library might be the right place to solve it.
We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.

cuviper commented 3 months ago

I think this is similar to #230.

(edit: I said "the same as" first, but you want different semantics for oversized shifts)

chorman0773 commented 3 months ago

Yes, in particular I want the over-width left shift to be 0 (rather than !0). I commonly use this operation to, among other things, compute a bitmask for a dynamically provided width, doing a .wrapping_sub(1) after the shift.

m-ou-se commented 3 months ago

Why not change the behaviour of the << and >> operators themselves? (Just have 1u32 << 100 evaluate to 0 rather than the overflow-panic/wrapping behaviour.)

joshtriplett commented 3 months ago

Leaving aside whether Rust would have designed these operators like that pre-1.0 (which I don't think Rust should have), I don't think we should change these operators at this point, because that would make them not match the behavior of any existing CPUs, and require multiple instructions rather than one. (In some cases we could optimize away the additional instructions, but I don't think our default operator semantics should have a mismatch with CPUs.)

kennytm commented 3 months ago

Not sure if you consider nVidia GPUs an "existing CPU" but the PTX instruction set for shl/shr did specify

Shift amounts greater than the register width N are clamped to N.

The wrapping behavior is true for basically every other target though (x86, arm, wasm, sparc, riscv, powerpc, mips, m68k, loongarch64, hexagon, bpf)

BTW I think NVPTX's shf terminology (.wrap, .clamp) is more familiar than the term "rangeless".

impl u64 {
    pub fn wrapping_shl(self, rhs: u32) -> Self {
        self << (rhs & 63)
    }

    pub fn clamping_shl(self, rhs: u32) -> Self { // or `clamped_shl`
        if rhs < 64 { self << rhs } else { 0 }
    }
}

dtolnay commented 3 months ago

We discussed this ACP in today's standard library API meeting. The team members present agreed there is a compelling motivation for the standard library to provide the proposed rangeless shift behavior. Before we accept the new methods, we'd like to ask for some other naming options to be brainstormed.

We were split on whether changing <</>> to perform rangeless shift would be reasonable, but either way this one is a decision for the language team, not the standard library team. If someone is interested in pursuing changing the operators, we would encourage submitting a lang RFC, and we would make sure to consider the state of that RFC as part of the process of stabilizing named methods for rangeless shift.

Commentary opposed to changing the operators:

On all platforms we know of, there is no efficient (single machine instruction) way to implement the new behavior
In practice, the vast majority of shifts are known by the programmer to be in-range, so applying a performance overhead to all shifts is not a good cost/benefit tradeoff
Arithmetic operators should only correspond to things that common architectures natively support

op	Rust expression	X86 asm	AArch64 asm
shift left	`n << amount`	```asm shlx rax, rdi, rsi ```	```asm lsl x0, x0, x1 ```
rangeless shift left	`n.checked_shl(amount)` `.unwrap_or(0)`	```asm xor eax, eax cmp esi, 64 shlx rcx, rdi, rsi cmovb rax, rcx ```	```asm lsl x8, x0, x1 cmp w1, #64 csel x0, x8, xzr, lo ```
shift right	`n >> amount`	```asm shrx rax, rdi, rsi ```	```asm lsr x0, x0, x1 ```
rangeless shift right unsigned	`n.checked_shr(amount)` `.unwrap_or(0)`	```asm xor eax, eax cmp esi, 64 shrx rcx, rdi, rsi cmovb rax, rcx ```	```asm lsr x8, x0, x1 cmp w1, #64 csel x0, x8, xzr, lo ```
rangeless shift right signed	`n.checked_shr(amount)` `.unwrap_or(` `if n >= 0 { 0 } else { -1 }` `)`	```asm cmp esi, 63 mov eax, 63 cmovb eax, esi sarx rax, rdi, rax ```	```asm mov w8, #63 cmp w1, #63 csel w8, w1, w8, lo asr x0, x0, x8 ```

Commentary in favor of changing the operators:

Arithmetic operators should implement whichever variation of the operation is most widely applicable, i.e. use the short symbolic name (as opposed to more verbose named method syntax) for the behavior that is to occur the most frequently in real-world programs
Compilers are very competent at telling when a shift amount is definitely in bounds, and turn rangeless shift into a single machine instruction
The vast majority of shifts that people write are not performance sensitive at the level that a cmov would matter whatsoever, and of the remainder, the vast majority are expected to be proven to be in-bounds by the compiler; the 0.1% left can use a named method for the legacy shift behavior

programmerjake commented 3 months ago

changing the shift operators is not backwards compatible, it changes the release-mode behavior of 1u8 << 8.

cuviper commented 3 months ago

If we changed the operator, we would also need to decide how to handle negative shift values. e.g. I don't think we want rangeless x << -2 to result in 0, so should it now act like x >> 2? Or keep treating that like overflow?

(This isn't a problem for explicit methods with rhs: u32.)

Before we accept the new methods, we'd like to ask for some other naming options to be brainstormed.

How about "unbounded"?

kennytm commented 3 months ago

If you are changing the behavior of << you need to make sure portable_simd (std::simd::Simd) is using the same definition.

Interestingly the x86 SSE2 PSLL[W,D,Q] and AVX VPSLLV[W,D,Q] instructions do clamping natively.

If the value specified by the count operand is greater than 15 (for words), 31 (for doublewords), or 63 (for a quadword), then the destination operand is set to all 0s.

OTOH ARM's vector-variant USHL instruction is still wrapping, consistent with the scalar behavior.

shift = SInt(Elem[operand2, e, esize]<7:0>);

chorman0773 commented 3 months ago

x86-64 will also offer this behaviour (specifically as to left-shift and unsigned right-shift) when the range is known to be <64, as you can just do the shift on a 64-bit register, then use the lower 32-bit, 16-bit, or 8-bit overlaps (though rightshift requires first zero-extending, which is either a movzx for 16-bit and 8-bit, or a plain mov for 32-bit).

scottmcm commented 3 months ago

Given a time machine, I'd absolutely change what Rust 1.0 did here. Good integer behaviour follows the rule from https://github.com/rust-lang/rust/issues/100422#issuecomment-1245864700 -- as though it was done in infinite precision, and you just got the low bits.

That's why uN::ilog2 is better, in my opinion, than uN::leading_zeroes. Similarly why uN::count_ones is better than uN::count_zeroes.

I think it's bad that

let x = 3;
foo(x);
dbg!(x << 16);

does something so drastically different depending on what the argument type of foo happens to be.

Needing a conditional instruction -- notably one that doesn't add any dependencies -- is 100% not an issue. Rust isn't an assembly language; the defaults should be the things that work properly even if it's another instruction, because the fast-but-wrong ones are better as functions, not operators.

x >>= n and for _ in 0..n { x >>= 1 } should be the same, just like x += n and for _ in 0..n { x += 1 } are the same. Anything else is bad.

But I agree that it's probably too late to change it now. (At least, it'd need a very long transition period, probably over multiple editions.)

Spitballing names:

proper_sh[lr]
accurate_sh[lr]
correct_sh[lr]
consistent_sh[lr]

pitaj commented 3 months ago

Some more options

truncating_
infinite_
winding_
modular_

joshtriplett commented 2 months ago

We discussed this in today's @rust-lang/libs-api meeting. We ended up settling on @cuviper's suggestion of unbounded_shl/unbounded_shr as the least confusing option. The semantic is that the base shl/shr have a bounded shift count, and the unbounded_ variants have no bound on the shift count.

joshtriplett commented 2 months ago

Marking this as accepted, because we're accepting the functionality proposed by ACP, with the names changed to unbounded_*.

rust-lang / libs-team

Rangeless Shifts #428

Proposal

Problem statement

Motivating examples or use cases

Solution sketch

Alternatives

Links and related work

What happens now?

Possible responses