rust-lang / libs-team

The home of the library team
Apache License 2.0
116 stars 18 forks source link

Rangeless Shifts #428

Closed chorman0773 closed 3 weeks ago

chorman0773 commented 1 month ago

Proposal

Problem statement

Rust presently has a few options for shifts:

(There are also rotates_ which likewise wrap at width boundary).

This has a problem because the wrapping behaviour here is not the intuitive one (as it wraps the shift quantity, not the shift result). Operations may perform a shift with an undetermined quantity that may exceed the width of the type, e.g. to produce a bitmask of possible values.

Motivating examples or use cases

Producing an n-bit mask of an integer type up to n bits:

pub const fn mk_mask64(bits: u32) -> u64{
    (1 << bits)
}

Determining which bit to check for a carry-out of a wide operation

pub fn test_carry_bit(val: u64, bits: u32) -> u64{
    val & (1 << bits)
}

Both of these functions are ones I'm currently using in an architecture emulator which does native integer operations of varying sizes up to 64-bit.

Solution sketch

The names of the methods are subject to bikeshed, I don't have a good name for them.

impl Int { // `Int` is any signed or unsigned integer type, including `u128`/`i128` and `usize/`isize`/
    pub const fn rangeless_shl(self, width: u32) -> Self;
    pub const fn rangeless_shr(self, width: u32) -> Self;
}

Formally, the behaviour of both of these methods involves evaluating the shift on a type with infinite width (or concretely, a width of u32::MAX + 1), then wrapping the result modulo (2 ^ Self::WIDTH).

Practically this means that:

The result does not differ when the shift quantity is exactly Self::WIDTH or is any value that exceeds it.

Alternatives

There are a few ways to implement the rangeless shifts concretely:

(where exceeds_width_result is the appropriate value for the type and op)

Both versions are verbose and somewhat error prone (particularily for signed rangeless_shr) to write inline.

This could be an external crate, but has one of two major limitations due to current language features

  1. It must be a free function, or
  2. It cannot be made a const fn.

The ergonomics of having a method call, and the ability to do this as const makes me of the opinion that it should be an inherent method of the integer types in the standard library. A standard library function could also potentially benefit from optimizations on architectures where this behaviour is the behaviour of an out-of-range shift. I do not know whether or not the versions written above will do this, on x86 the unsigned versions will optimize to a cmov however.

Links and related work

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

Second, if there's a concrete solution:

cuviper commented 1 month ago

I think this is similar to #230.

(edit: I said "the same as" first, but you want different semantics for oversized shifts)

chorman0773 commented 1 month ago

Yes, in particular I want the over-width left shift to be 0 (rather than !0). I commonly use this operation to, among other things, compute a bitmask for a dynamically provided width, doing a .wrapping_sub(1) after the shift.

m-ou-se commented 1 month ago

Why not change the behaviour of the << and >> operators themselves? (Just have 1u32 << 100 evaluate to 0 rather than the overflow-panic/wrapping behaviour.)

joshtriplett commented 1 month ago

Leaving aside whether Rust would have designed these operators like that pre-1.0 (which I don't think Rust should have), I don't think we should change these operators at this point, because that would make them not match the behavior of any existing CPUs, and require multiple instructions rather than one. (In some cases we could optimize away the additional instructions, but I don't think our default operator semantics should have a mismatch with CPUs.)

kennytm commented 1 month ago

Not sure if you consider nVidia GPUs an "existing CPU" but the PTX instruction set for shl/shr did specify

Shift amounts greater than the register width N are clamped to N.

The wrapping behavior is true for basically every other target though (x86, arm, wasm, sparc, riscv, powerpc, mips, m68k, loongarch64, hexagon, bpf)

BTW I think NVPTX's shf terminology (.wrap, .clamp) is more familiar than the term "rangeless".

impl u64 {
    pub fn wrapping_shl(self, rhs: u32) -> Self {
        self << (rhs & 63)
    }

    pub fn clamping_shl(self, rhs: u32) -> Self { // or `clamped_shl`
        if rhs < 64 { self << rhs } else { 0 }
    }
}
dtolnay commented 1 month ago

We discussed this ACP in today's standard library API meeting. The team members present agreed there is a compelling motivation for the standard library to provide the proposed rangeless shift behavior. Before we accept the new methods, we'd like to ask for some other naming options to be brainstormed.

We were split on whether changing <</>> to perform rangeless shift would be reasonable, but either way this one is a decision for the language team, not the standard library team. If someone is interested in pursuing changing the operators, we would encourage submitting a lang RFC, and we would make sure to consider the state of that RFC as part of the process of stabilizing named methods for rangeless shift.

Commentary opposed to changing the operators:

opRust expressionX86 asmAArch64 asm
shift
left
`n << amount` ```asm shlx rax, rdi, rsi ``` ```asm lsl x0, x0, x1 ```
rangeless
shift
left
`n.checked_shl(amount)`
`.unwrap_or(0)`
```asm xor eax, eax cmp esi, 64 shlx rcx, rdi, rsi cmovb rax, rcx ``` ```asm lsl x8, x0, x1 cmp w1, #64 csel x0, x8, xzr, lo ```
shift
right
`n >> amount` ```asm shrx rax, rdi, rsi ``` ```asm lsr x0, x0, x1 ```
rangeless
shift
right
unsigned
`n.checked_shr(amount)`
`.unwrap_or(0)`
```asm xor eax, eax cmp esi, 64 shrx rcx, rdi, rsi cmovb rax, rcx ``` ```asm lsr x8, x0, x1 cmp w1, #64 csel x0, x8, xzr, lo ```
rangeless
shift
right
signed
`n.checked_shr(amount)`
`.unwrap_or(`
`if n >= 0 { 0 } else { -1 }`
`)`
```asm cmp esi, 63 mov eax, 63 cmovb eax, esi sarx rax, rdi, rax ``` ```asm mov w8, #63 cmp w1, #63 csel w8, w1, w8, lo asr x0, x0, x8 ```

Commentary in favor of changing the operators:

programmerjake commented 1 month ago

changing the shift operators is not backwards compatible, it changes the release-mode behavior of 1u8 << 8.

cuviper commented 1 month ago

If we changed the operator, we would also need to decide how to handle negative shift values. e.g. I don't think we want rangeless x << -2 to result in 0, so should it now act like x >> 2? Or keep treating that like overflow?

(This isn't a problem for explicit methods with rhs: u32.)

Before we accept the new methods, we'd like to ask for some other naming options to be brainstormed.

How about "unbounded"?

kennytm commented 1 month ago

If you are changing the behavior of << you need to make sure portable_simd (std::simd::Simd) is using the same definition.

Interestingly the x86 SSE2 PSLL[W,D,Q] and AVX VPSLLV[W,D,Q] instructions do clamping natively.

If the value specified by the count operand is greater than 15 (for words), 31 (for doublewords), or 63 (for a quadword), then the destination operand is set to all 0s.

OTOH ARM's vector-variant USHL instruction is still wrapping, consistent with the scalar behavior.

shift = SInt(Elem[operand2, e, esize]<7:0>);
chorman0773 commented 1 month ago

x86-64 will also offer this behaviour (specifically as to left-shift and unsigned right-shift) when the range is known to be <64, as you can just do the shift on a 64-bit register, then use the lower 32-bit, 16-bit, or 8-bit overlaps (though rightshift requires first zero-extending, which is either a movzx for 16-bit and 8-bit, or a plain mov for 32-bit).

scottmcm commented 1 month ago

Given a time machine, I'd absolutely change what Rust 1.0 did here. Good integer behaviour follows the rule from https://github.com/rust-lang/rust/issues/100422#issuecomment-1245864700 -- as though it was done in infinite precision, and you just got the low bits.

That's why uN::ilog2 is better, in my opinion, than uN::leading_zeroes. Similarly why uN::count_ones is better than uN::count_zeroes.

I think it's bad that

let x = 3;
foo(x);
dbg!(x << 16);

does something so drastically different depending on what the argument type of foo happens to be.

Needing a conditional instruction -- notably one that doesn't add any dependencies -- is 100% not an issue. Rust isn't an assembly language; the defaults should be the things that work properly even if it's another instruction, because the fast-but-wrong ones are better as functions, not operators.

x >>= n and for _ in 0..n { x >>= 1 } should be the same, just like x += n and for _ in 0..n { x += 1 } are the same. Anything else is bad.


But I agree that it's probably too late to change it now. (At least, it'd need a very long transition period, probably over multiple editions.)

Spitballing names:

pitaj commented 1 month ago

Some more options

joshtriplett commented 3 weeks ago

We discussed this in today's @rust-lang/libs-api meeting. We ended up settling on @cuviper's suggestion of unbounded_shl/unbounded_shr as the least confusing option. The semantic is that the base shl/shr have a bounded shift count, and the unbounded_ variants have no bound on the shift count.

joshtriplett commented 3 weeks ago

Marking this as accepted, because we're accepting the functionality proposed by ACP, with the names changed to unbounded_*.