
ACP: Bigint helper methods #228

Closed clarfonthey closed 1 year ago

clarfonthey commented 1 year ago

Proposal

Note: this is a retroactive ACP for a feature that already exists as unstable and is tracked at rust-lang/rust#85532. The feature is not 100% complete, but this ACP exists to start discussing it through the new ACP process rather than in the tracking issue.

For the sake of brevity, the term "big integer" will be shortened to "bigint"; it refers to aggregate integer types, usually represented as slices of primitive integer types stored in some heap-allocated vector. This includes the standard "big integer" types that represent unbounded-size integers, but also the aggregate types used to implement larger fixed-size integers, such as those needed for many cryptographic operations.

Problem statement

The standard library does not offer its own big-integer types; that is left as a task for the larger ecosystem. However, the primitive operations required to implement big-integer types are supported by hardware instructions on many architectures, and they can be added to the primitive integer types without committing to any specific API for the big integers themselves.

These operations include things like carrying addition (add-with-carry), borrowing subtraction (subtract-with-borrow), and widening/carrying multiplication, as laid out in the solution sketch below.

Motivating examples or use cases

Although it's understandable that the standard library does not offer a big-integer type due to the complexity and variety of the potential APIs, the underlying primitive operations needed for them are a good fit for the standard library:

  1. They sometimes require inline assembly to ensure efficient code generation, which could instead be accomplished with compiler intrinsics.
  2. Due to the nature of these architecture-specific optimisations, many bigint crates will have limited portability, in some cases forcing users to choose between an API they like and the support they need.
  3. Most of these operations are conceptually simple extensions of existing arithmetic, and fit nicely alongside operations like overflowing_add.
  4. While many of these operations can be written in a way that the compiler already optimises correctly, the standard library can also add codegen tests to avoid regressions due to compiler changes. Downstream crates cannot easily do this without substantial work, whereas the Rust project already has this infrastructure in place.

The main benefit is that ecosystem crates like num-bigint can focus on offering a specific API, rather than also having to juggle the optimisations needed for an efficient implementation. While many of these crates will inevitably still resort to inline assembly, they'll be able to fall back to efficient operations in libstd without worrying about sacrificing too much performance on other platforms.

Solution sketch

This is the existing implementation:

// On unsigned integers:

/// `self + rhs + carry` (full adder)
fn carrying_add(self, rhs: Self, carry: bool) -> (Self, bool);

/// `self - rhs - carry` (full "subtractor")
fn borrowing_sub(self, rhs: Self, carry: bool) -> (Self, bool);

/// `self * rhs + carry` (multiply-accumulate)
fn carrying_mul(self, rhs: Self, carry: Self) -> (Self, Self);

/// `self * rhs` (wide multiplication, same as `self.carrying_mul(rhs, 0)`)
fn widening_mul(self, rhs: Self) -> (Self, Self);

// On signed integers:

/// `self + rhs + carry` (full adder)
fn carrying_add(self, rhs: Self, carry: bool) -> (Self, bool);

/// `self - rhs - carry` (full "subtractor")
fn borrowing_sub(self, rhs: Self, carry: bool) -> (Self, bool);

Note that this will likely be augmented to include more operations (like shifts, division, and remainder), and potentially revised to resolve open questions (whether to use the term "borrowing" or "carrying" throughout, the return type of the widening operations, etc.).
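
As a rough illustration of how these primitives compose (this sketch is not part of the original proposal), multi-limb addition and scalar multiplication over little-endian limbs might look like the following. The helper names add_assign_limbs and mul_limbs_by_scalar are made up for this example, the code assumes the unstable feature tracked above (bigint_helper_methods), and it assumes the carry/high half is the second tuple element, as discussed further down the thread.

#![feature(bigint_helper_methods)]

/// `a += b` over little-endian limbs of equal length, returning the final carry.
fn add_assign_limbs(a: &mut [u64], b: &[u64]) -> bool {
    let mut carry = false;
    for (a_limb, b_limb) in a.iter_mut().zip(b) {
        // Each step consumes the carry produced by the previous limb,
        // exactly like a chain of add-with-carry instructions.
        let (sum, next_carry) = a_limb.carrying_add(*b_limb, carry);
        *a_limb = sum;
        carry = next_carry;
    }
    carry
}

/// Schoolbook multiplication of little-endian limbs by a single scalar.
fn mul_limbs_by_scalar(a: &[u64], scalar: u64) -> Vec<u64> {
    let mut out = Vec::with_capacity(a.len() + 1);
    let mut carry = 0u64;
    for limb in a {
        // `carrying_mul` computes `limb * scalar + carry` and returns the
        // (low, high) halves; the high half carries into the next limb.
        let (low, high) = limb.carrying_mul(scalar, carry);
        out.push(low);
        carry = high;
    }
    out.push(carry);
    out
}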

Alternatives

The primary alternative to offering this API is to not offer it. There are already some architecture-specific intrinsics available, like core::arch::x86_64::_addcarry_u64, but these only avoid inline assembly rather than avoiding architecture-specific code altogether.
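
For comparison (again not part of the proposal text), the limb-wise addition sketch above could be written against that intrinsic, but only for one architecture, which is the portability problem described earlier. The helper name add_assign_limbs_x86_64 is hypothetical.

// Only compiles for x86_64; other targets would need their own intrinsic
// or a portable fallback.
#[cfg(target_arch = "x86_64")]
fn add_assign_limbs_x86_64(a: &mut [u64], b: &[u64]) -> bool {
    use core::arch::x86_64::_addcarry_u64;
    let mut carry = 0u8;
    for (a_limb, &b_limb) in a.iter_mut().zip(b) {
        let a_val = *a_limb;
        // SAFETY: `_addcarry_u64` has no requirements beyond targeting x86_64.
        carry = unsafe { _addcarry_u64(carry, a_val, b_limb, a_limb) };
    }
    carry != 0
}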

Links and related work

You can find most of this in the tracking issue: rust-lang/rust#85532.

What happens now?

This issue is part of the libs-api team API change proposal process. Once this issue is filed, the libs-api team will review open proposals as capacity becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

Second, if there's a concrete solution:

pitaj commented 1 year ago

What is the motivation for putting the carry as the second field in the tuple? My intuition is that it should be in the first position, as that's the most significant position in numbers.

Also, it seems odd to return bool when most use cases are just going to convert it back to a number anyways. Some extra > 0 conditions are preferable to needing usize::from in a bunch of places.

tarcieri commented 1 year ago

> My intuition is that it should be in the first position, as that's the most significant position in numbers.

When working with a bigint whose limbs are stored in little-endian order (i.e. arranged least to most significant), having the carry in the second position of the tuple feels more natural and translates better to having similar carrying arithmetic on the bigints themselves.

programmerjake commented 1 year ago

Other operations we'll likely want that I don't think were mentioned are unsigned-signed widening/carrying multiplies.

clarfonthey commented 1 year ago

> What is the motivation for putting the carry as the second field in the tuple? My intuition is that it should be in the first position, as that's the most significant position in numbers.

A couple of reasons, actually. First, this mirrors existing APIs like overflowing_add, where the carry flag is returned as the second element of the tuple. Secondly, as @tarcieri mentions, the order the limbs are processed in is more natural to work with than the order of significance: when working with these values, you constantly change what is referenced for the least-significant part, but track the carry bit throughout the entire operation.

> Also, it seems odd to return bool when most use cases are just going to convert it back to a number anyways. Some extra > 0 conditions are preferable to needing usize::from in a bunch of places.

This is also incorrect for a few reasons. Firstly, the carry bit is passed between operations, so you should never have to do usize::from; you start with the carry set to 0 (here, false), and then reuse the new carry bit throughout each intermediate step. Additionally, since these exist primarily for performance (they optimise to a single instruction in nearly all cases), adding extra checks would (IMHO) defeat their purpose. Plus, most architectures (at least that I'm familiar with) will usually use some sort of flag to reflect the carry bit instead of a dedicated register, and a bool reinforces that in the method signature.


Also addressing this one, since it got posted while I was writing the above:

> Other operations we'll likely want that I don't think were mentioned are unsigned-signed widening/carrying multiplies.

I'm not 100% sure of the motivation for this, unless we actually change the representation of the least-significant part of a signed multiplication to be unsigned, which is still undecided. I personally am a fan of keeping all of the parts of a signed operation signed, since values that don't overflow are meant to be interpreted as signed. Actually, in a varint format I've been working on (code not linkable atm due to unrelated issues), I personally use signed integers throughout all values inside a two's complement slice, since it just makes more sense to use a slice instead of tuples.

programmerjake commented 1 year ago

> Other operations we'll likely want that I don't think were mentioned are unsigned-signed widening/carrying multiplies.

> I'm not 100% sure of the motivation for this,

The motivation is that when multiplying two's complement bigints, the most significant limb is signed and the rest are unsigned, so we need unsigned-signed multiplication.

Carrying signed-unsigned mul would look like the following (I'm not sure which operand would be self, so I'm leaving that out). It's based on maddedus, a draft instruction for the Power ISA which I designed as part of Libre-SOC: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=76bba313e12c29dfa340d8953e231674

/// 8-bit version for ease of exposition
pub fn mul_carrying_unsigned_signed(a: u8, b: i8, carry: i8) -> (u8, i8) {
    let prod = a as i16 * b as i16; // can't overflow -- in -0x7F80..=0x7E81
    let sum = prod + carry as i16; // can't overflow -- in -0x8000..=0x7F00

    // equivalent to carry = sum.div_floor(0x100)
    // equivalent to result = sum.mod_floor(0x100)
    let carry = sum >> 8; // in -0x80..=0x7F
    let result = sum & 0xFF; // in 0x0..=0xFF

    // result is unsigned because twos complement integers are defined
    // as modified unsigned integers by reassigning the MSB's value as
    // -2^(width-1) rather than 2^(width-1), all lower bits still are
    // positive and *not* signed.
    (result as u8, carry as i8)
}

// example usage (again 8-bit for exposition):

/// twos complement bigint
pub struct BigInt {
    pub limbs: Vec<u8>, // little-endian
    pub msb_limb: i8,   // only the msb is a sign bit so only msb_limb is signed
}

pub fn mul_biguint_by_int(biguint_limbs: &[u8], int: i8, carry_in: i8) -> BigInt {
    let mut carry = carry_in;
    let mut limbs = vec![];
    for limb in biguint_limbs {
        let (result, carry_out) = mul_carrying_unsigned_signed(*limb, int, carry);
        limbs.push(result);
        carry = carry_out;
    }
    BigInt {
        limbs,
        msb_limb: carry,
    }
}

#[test]
fn test_it() {
    let a = 0x1234_5678_9ABC_DEF0_i64.to_le_bytes();
    let b = -0x10i8;
    let carry_in = -0x42i8;
    let result = mul_biguint_by_int(&a, b, carry_in);
    let result_u64 = u64::from_le_bytes(result.limbs.try_into().unwrap());
    let expected = -0x1_2345_6789_ABCD_EF42_i128;
    assert_eq!(result.msb_limb, (expected >> 64) as i8);
    assert_eq!(result_u64, expected as u64);
}

programmerjake commented 1 year ago

Note that RISC-V has a mulhsu instruction (the high half of a signed-by-unsigned multiply) specifically intended for bigints.

clarfonthey commented 1 year ago

I had no idea that was a common implementation, although I guess that I haven't really had too much exposure to the wider variety of implementations. My assumption was that everything either had a separate sign bit or was two's complement throughout, not that there would be these sorts of hybrid implementations.

I still question whether we want to make the signed-signed multiplication return (signed, unsigned) instead of (signed, signed), but these sorts of implementations definitely call out the need to offer signed-unsigned multiplication regardless.

programmerjake commented 1 year ago

An example of why signed outputs should be represented as a signed high half and an unsigned low half: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=d976dae1f7c25f0c8d993ef58e3235bf

#[test]
fn test_fn() {
    let wide: i64 = (-1 << 36) + (1 << 31);

    let hi_s = (wide >> 32) as i32;

    // correct since `wide`'s sign bit is not in the low half
    let lo_u = wide as u32;
    let lo_s = wide as i32; // incorrect

    let hi_value = i64::from(hi_s) << 32;

    // from is value-preserving
    let lo_u_value = i64::from(lo_u);
    let lo_s_value = i64::from(lo_s);

    // lo_u correctly represents the value of the low half
    assert_eq!(hi_value + lo_u_value, wide);

    // lo_s *doesn't* correctly represent the value of the low half
    assert_ne!(hi_value + lo_s_value, wide);
}

programmerjake commented 1 year ago

> I had no idea that was a common implementation, although I guess that I haven't really had too much exposure to the wider variety of implementations. My assumption was that everything either had a separate sign bit or was two's complement throughout, not that there would be these sorts of hybrid implementations.

Multiplying an unsigned bigint by a signed scalar can be used as part of a two's complement bigint-by-bigint multiply.

m-ou-se commented 1 year ago

Closing this, since there is already a tracking issue for discussing the (unstable) feature: https://github.com/rust-lang/rust/issues/85532

If there are any thoughts on this thread that are not reflected on the tracking issue, please make sure they're discussed there. Thank you.