Tracking Issue for bigint helper methods

clarfonthey commented 3 years ago

Feature gate: #![feature(bigint_helper_methods)]

This is a tracking issue for the following methods on integers:

carrying_add
borrowing_sub
carrying_mul
widening_mul

These methods are intended to help centralise the effort required for creating efficient big integer implementations, by offering a few methods which would otherwise require special compiler intrinsics or custom assembly code in order to do efficiently. They do not alone constitute big integer implementations themselves, but are necessary building blocks for a larger implementation.

Public API

// On unsigned integers:

/// `self + rhs + carry` (full adder)
const fn carrying_add(self, rhs: Self, carry: bool) -> (Self, bool);

/// `self - rhs - carry` (full "subtractor")
const fn borrowing_sub(self, rhs: Self, carry: bool) -> (Self, bool);

/// `self * rhs + carry` (multiply-accumulate)
const fn carrying_mul(self, rhs: Self, carry: Self) -> (Self, Self);

/// `self * rhs` (wide multiplication, same as `self.carrying_mul(rhs, 0)`)
const fn widening_mul(self, rhs: Self) -> (Self, Self);

// On signed integers:

/// `self + rhs + carry` (full adder)
const fn carrying_add(self, rhs: Self, carry: bool) -> (Self, bool);

/// `self - rhs - carry` (full "subtractor")
const fn borrowing_sub(self, rhs: Self, carry: bool) -> (Self, bool);

Steps / History

[x] widening_mul RFC: rust-lang/rfcs#2417
[x] Initial implementation of carrying_add, borrowing_sub, carrying_mul, and widening_mul #85017
[x] Remove implementations on signed types #90848 per discussion in #90541
[x] Add new carrying_add and borrowing_sub on signed types: https://github.com/rust-lang/rust/pull/93873
[ ] Add assembly tests to ensure this keeps doing the right thing
[ ] Clarify documentation ("without the ability to overflow" can be confusing.)
[ ] Final commenting period (FCP)
[ ] Stabilization PR

Unresolved Questions

Should these be implemented using compiler intrinsics? LLVM currently has no equivalents, so, we'd have to custom-build some.
Should an alternative API be provided for widening_mul that simply returns the next-larger type? What would we do for u128/i128?
What should the behaviour be for signed integers? Should there be implementations for signed integers at all?
Is the "borrowing" terminology worth it for subtraction, or should we simply call that "carrying" as well for consistency?
Are there other methods that should be added in addition to the existing ones?

leonardo-m commented 3 years ago

Are there other methods that should be added in addition to the existing ones?

I'd like a mul_mod, as shown in #85017, because I think you can't implement it efficiently without asm and it's a basic block for power_mod and other things.

clarfonthey commented 3 years ago

Another set of methods that could be useful that I'll probably offer implementations for at some point:

/// `(self << rhs) | carry`
fn carrying_shl(self, rhs: u32, carry: Self) -> (Self, Self); 

/// `(self >> rhs) | carry`
fn borrowing_shr(self, rhs: u32, carry: Self) -> (Self, Self);

/// `self << rhs`
fn widening_shl(self, rhs: u32) -> (Self, Self);

/// `self >> rhs`
fn widening_shr(self, rhs: u32) -> (Self, Self);

Essentially, return the two halves of a rotation, i.e. x.widening_shl(y) is the same as (x << y, x >> (BITS - y)) and similarly for widening_shr. Not sure whether they should allow rhs == BITS or not, but presumably they wouldn't for consistency with existing shift methods.

clarfonthey commented 3 years ago

From @scottmcm in the original PR:

Some prior art I happened upon: https://docs.rs/cranelift-codegen/0.74.0/cranelift_codegen/ir/trait.InstBuilder.html#method.isub_bin
Same as isub with an additional borrow flag input. Computes:
   a = x - (y + b_{in}) \pmod 2^B

photino commented 3 years ago

Why don't we add carrying_mul and widening_mul for i128/u128 as well?

clarfonthey commented 3 years ago

Mostly effort implementing them efficiently. In the meantime, you can do it with four calls to the u64 version. Or three if you want to be fancy.

RalfJung commented 3 years ago

fn borrowing_sub(self, rhs: Self, carry: bool) -> (Self, bool);

I was very confused by this function name at first, since borrowing in Rust usually refers to references. I am not a native speaker, but I do formal mathematical work in English professionally, and yet I never before heard the term "borrowing" in the context of subtraction. So I think this, at least, needs some explanation in the docs. (I would have expected something like carrying_sub, but maybe that is nonsense for a native speaker.)

The current docs for some of the other methods could probably also be improved: they talk about not having the "ability to overflow", which makes it sound like not overflowing is a bad thing.

AaronKutch commented 3 years ago

The word borrow here comes from the terminology for a full subtractor. I am thinking that maybe the borrowing_sub function could be removed altogether. The same effect that borrowing_sub has can be obtained from carrying_add by making the first carrying_add in the chain have a set carry bit, and then bitnot every rhs. This fact could be put in the documentation of carrying_add.

clarfonthey commented 3 years ago

The word borrow here comes from the terminology for a full subtractor. I am thinking that maybe the borrowing_sub function could be removed altogether. The same effect that borrowing_sub has can be obtained from carrying_add by making the first carrying_add in the chain have a set carry bit, and then bitnot every rhs. This fact could be put in the documentation of carrying_add.

Considering how the primary goal of these methods is to be as efficient as possible, usually optimising down to a single instruction, I don't think it'd be reasonable to just get rid of subtraction in favour of telling everyone to use addition instead. Definitely open to changing the name, though.

AaronKutch commented 3 years ago

These helper methods will not be very useful to me unless they are implemented for every kind of integer. Here is an implementation for a widening multiplication-addition for u128:

/// Extended multiply-addition of `(lhs * rhs) + add`. The result is returned as a tuple of the wrapping part and the
/// overflow part. No numerical overflow is possible even if all three arguments are set to their max values.
pub const fn widen_mul_add(lhs: u128, rhs: u128, add: u128) -> (u128, u128) {
    //                       [rhs_hi]  [rhs_lo]
    //                       [lhs_hi]  [lhs_lo]
    //                     X___________________
    //                       [------tmp0------]
    //             [------tmp1------]
    //             [------tmp2------]
    //     [------tmp3------]
    //                       [-------add------]
    // +_______________________________________
    //                       [------sum0------]
    //     [------sum1------]

    let lhs_lo = lhs as u64;
    let rhs_lo = rhs as u64;
    let lhs_hi = (lhs.wrapping_shr(64)) as u64;
    let rhs_hi = (rhs.wrapping_shr(64)) as u64;
    let tmp0 = (lhs_lo as u128).wrapping_mul(rhs_lo as u128);
    let tmp1 = (lhs_lo as u128).wrapping_mul(rhs_hi as u128);
    let tmp2 = (lhs_hi as u128).wrapping_mul(rhs_lo as u128);
    let tmp3 = (lhs_hi as u128).wrapping_mul(rhs_hi as u128);
    // tmp1 and tmp2 straddle the boundary. We have to handle three carries
    let (sum0, carry0) = tmp0.overflowing_add(tmp1.wrapping_shl(64));
    let (sum0, carry1) = sum0.overflowing_add(tmp2.wrapping_shl(64));
    let (sum0, carry2) = sum0.overflowing_add(add as u128);
    let sum1 = tmp3
        .wrapping_add(tmp1.wrapping_shr(64))
        .wrapping_add(tmp2.wrapping_shr(64))
        .wrapping_add(carry0 as u128)
        .wrapping_add(carry1 as u128)
        .wrapping_add(carry2 as u128);
    (sum0, sum1)
}

I have tested this with my crate awint.

edit: There is a version of this that uses the Karatsuba trick to use 3 multiplications instead of 4, but it incurs extra summations, branches, and is not as parallel. For typical desktop processors the above should be the fastest.

clarfonthey commented 3 years ago

I would make a PR for that.

AaronKutch commented 3 years ago

Some alternative signatures include u128::widen_mul_add(lhs, rhs, add), lhs.widen_mul_add(rhs, add), or add.widen_mul_add(lhs, rhs). In awint my general purpose mul-add function is mul_add_triop which uses the third signature but takes self mutably and add-assigns lhs * rhs. I'm not sure which is best.

AaronKutch commented 3 years ago

I would also change up the documentation headers for the carrying_add function to say

Extended addition of `self + rhs + carry`. The booleans are interpreted as a single bit
integer of value 0 or 1. If unsigned overflow occurs, then the boolean in the tuple
returns 1. The output carry can be chained into the input carry of another carrying add,
which allows for arbitrarily large additions to be calculated.

I specifically note unsigned overflow, because that happens for both signed and unsigned integers because of how two's complement works.

AaronKutch commented 3 years ago

borrowing_sub should be left in with its naming, but its documentation could be

Extended subtraction of `self - rhs - borrow`. The "borrowing" here refers to borrowing in the full subtractor sense.
The booleans are interpreted as a single bit integer of value 0 or 1. If unsigned overflow occurs, then the boolean
in the tuple returns 1. The output carry can be chained into the input carry of another borrowing subtract,
which allows for arbitrarily large subtraction to be calculated.

tspiteri commented 3 years ago

I specifically note unsigned overflow, because that happens for both signed and unsigned integers because of how two's complement works.

But unsigned overflow and signed overflow are different. For example, on x86_64, while unsigned and signed integers share addition and subtraction instructions, unsigned overflow is detected using the carry flag while signed overflow is detected using the overflow flag.

As a concrete example: 127i8 + 1 causes signed overflow but not unsigned overflow. So the carry flag should be false/0.

Edit: I think I had misread your comment and thought the middle part of your comment was the current doc, not your suggestion, so it looks like I completely misinterpreted your final comment.

AaronKutch commented 3 years ago

Yes signed and unsigned overflow are different, but the carrying_add as implemented for unsigned and signed integers both use unsigned overflow because of how two's complement carrying works. Someone using i64::carrying_add might think that the carry out bit follows the bit of i64::overflowing_add when in actuality it is following u64::overflowing_add. So in the documentation I would put emphasis on _unsigned_ overflow.

clarfonthey commented 3 years ago

I think all of these are good suggestions, and like mentioned earlier, these changes definitely should go in a PR if you have the time. I think one important thing to note is that so far the APIs here seem good, but the documentation definitely could use some work. Although if there's a bigger case for changing the subtraction behaviour to be more in line with what's expected (the existing behaviour is mostly modelled after the x86 instructions adc and sbb), then I'm for that.

That said, the main goal is to make it relatively painless to write correct code that compiles down to the right instructions in release mode, so, I would say we should make sure that happens regardless of what's done. I would have added an explicit test for that but I honestly don't know how.

Stovent commented 3 years ago

I've been using these functions along with the overflowing_add/overflowing_sub functions to implement a virtual machine (for the Motorola 68000 architecture). The overflowing functions are great for this kind of application, but I have a few comments to add on carrying_add and borrowing_sub so that they can fit my needs.

First, I agree that it would be better if carrying_add and borrowing_sub had a more similar name. Based on my experience with the M68000 arch that has a dedicated extend bit for this application, I would suggest the names extended_add and extended_sub or extending_add and extending_sub.

Also based on my experience with implementing M68000 instructions, I think it would be better if carrying_add and borrowing_sub had the same overflowing behaviour as their overflowing equivalents. This would also be useful to correctly detect signed overflow in big int contexts.

For instance, here is a possible i16 implementation using only 8-bits values with carrying_add, where signed carrying_add overflow has to be detected (the principle applies to any big int type) :

#![feature(bigint_helper_methods)]
struct I16 {
    /// Low-order bytes has to be unsigned.
    pub low: u8,
    /// Most Significant Byte has to be of the same signedness as the desired type.
    /// So u8 if I implement a U16, i8 if I implement I16.
    pub high: i8,
}

impl I16 {
    /// Adds `rhs` to `self` and returns true if signed overflow occurs, false otherwise.
    pub fn overflowing_add(&mut self, rhs: Self) -> bool {
        // For the low-order bytes, unsigned overflow has to be used.
        let (low_res, low_carry) = self.low.carrying_add(rhs.low, false);

        // To  correctly detect signed overflow, carrying_add should also detect signed-overflow with signed data,
        // Which is currently not the case. This returns (0x80, false) whereas a signed overflow occured.
        let (high_res, high_carry) = self.high.carrying_add(rhs.high, low_carry);

        self.low = low_res;
        self.high = high_res;
        high_carry
    }
}

fn main() {
    let (desired_res, desired_overflow) = 0x7FFFi16.overflowing_add(1); // returns (i16::MIN, true)
    let mut my_i16 = I16 {
        low: 0xFF,
        high: 0x7F,
    };
    let one = I16 {
        low: 1,
        high: 0,
    };
    let my_i16_overflow = my_i16.overflowing_add(one);

    let res = (my_i16.high as i16) << 8 | my_i16.low as i16;
    assert_eq!(desired_res, res); // Works
    assert_eq!(desired_overflow, my_i16_overflow); // Fails with left == true and right == false
}

Obviously, this example should also applies to borrowing_sub. It is also noted as a bug in a recent issue : #90541

Stovent commented 3 years ago

I'd like to further explain what the functions behaviors should be according to my experience, and to provide another use case.

First, there is the question of intermediate overflow that may occur in some operations. For instance, (-128i8).carrying_add(-1, true) has an intermediate overflow, because the first computation does -128 + (-1) = -129, which overflows to 127, but then adding the carry bit gives 127 + 1that which overflows back to -128, or -129 + 1 = -128. In this case there are intermediate overflows, and my experience with M68000 makes me think they should not be returned by the functions, so (-128i8).carrying_add(-1, true) should return (-128, false). The reasoning behind that is because signed overflow occurs when the result of a computation cannot be represented inside it's size. Here -128 perfectly fit inside an 8-bits number so the result did not overflowed.

These functions are also convenient to emulate big integers in virtual machines. Let's take an example with the ADDX instruction of the M68000, which is designed to implement big integers on the CPU (the example simplifies the architecture details to only keep what is important in our case): The instruction is defined as

Description: Adds the source operand and the extend bit to the destination operand and stores the result in the destination location.

Condition Codes:
- eXtend: Set the same as the carry bit.
- oVerflow: Set if an overflow occurs; cleared otherwise.
- Carry: Set if a carry is generated; cleared otherwise.
- [...]

The operation adds the destination operand to the source operand, and then add the eXtend bit from the condition codes to the previous result. The condition codes are individual bits used to indicate what happened during the last operation. The eXtend bit is what allows the CPU to implement big integers. oVerflow is set when a signed overflow occurs. Carry is set when an unsigned overflow occurs.

Using the big int helper functions makes the implementation of this instruction easy and less error-prone because I don't have to calculate the condition codes myself, the function does it for me. A typical implementation would look like that :

// Here `memory.get_byte(some_address)` is used to abstract the architecture details.
let src: u8 = memory.get_byte(some_address);
let dst: u8 = memory.get_byte(some_address);

// Here is where the signed functions are useful, they computes the (signed) overflow bit.
// `codes` is the variable containing the conditions codes described upper.
let (result, v) = (src as i8).carrying_add(dst, codes.x);
// Use the unsigned big int function to compute the carry (unsigned overflow) bit.
let (_, c) = src.carrying_add(dst, codes.x);

// Now computes the condition codes using the result value and the 
codes.x = c;
codes.v = v;
codes.c = c;
// Other codes [...]

// Write back the result
memory.set_byte(some_address, result);

Stovent commented 2 years ago

This post provides a concrete example on why intermediate overflow should not occur. It reuses the code in my first post here. TL;DR; doing -32513 + -255 using my implementation example would give the correct answer -32768, but having intermediate would indicate that the operation overflowed while it doesn't.

#![feature(bigint_helper_methods)]
struct I16 {
    /// Low-order bytes has to be unsigned.
    pub low: u8,
    /// Most Significant Byte has to be of the same signedness as the desired type.
    /// So u8 if I implement a U16, i8 if I implement I16.
    pub high: i8,
}

impl I16 {
    /// Adds `rhs` to `self` and returns true if signed overflow occurs, false otherwise.
    pub fn overflowing_add(&mut self, rhs: Self) -> bool {
        let (low_res, low_carry) = self.low.carrying_add(rhs.low, false); // Returns (0, true)

        // In the example from the main function, the operation below performes -128 + -1 + 1.
        // Intermediate overflow occurs as -128 + -1 gives -129, which overflow to 127. Then 127 + 1 gives 128, which overflows back to -128.
        // If carrying_add on signed numbers returns true on intermediate overflow, then this function returns true in the example.
        // However, from a i16 point of view, -32_513 + -255 gives -32768, which does not overflow. So intermediate overflow must not be returned.

        // Using overflowing_add as signed bit ints are not implemented anymore.
        let (high_res, high_carry) = self.high.overflowing_add(rhs.high + low_carry as i8); // Returns (-128, false), which is correct.

        self.low = low_res;
        self.high = high_res;
        high_carry
    }
}

fn main() {
    let (desired_res, desired_overflow) = (-32_513i16).overflowing_add(-255); // Returns (i16::MIN, false)
    let mut my_i16 = I16 { // -32_513
        low: 0xFF,
        high: -128,
    };
    let one = I16 { // -255
        low: 1,
        high: -1,
    };
    let my_i16_overflow = my_i16.overflowing_add(one); // If intermediate overflow was returned, the assert below would fail.

    let res = (my_i16.high as i16) << 8 | my_i16.low as i16;
    assert_eq!(desired_res, res); // Works
    assert_eq!(desired_overflow, my_i16_overflow);
}

To resume my ideas:

Rename carrying/borrowing functions with extending (or extended).
Implement these functions on signed types, as I explained why they are useful.
Make the implementations not return true on intermediate overflow (which is currently the case with the unsigned implementations).

I can provide implementations of the add/sub functions with the behaviours I described if necessary.

Also pinging the author of the issue @clarfonthey for visibility (sorry if it is inappropriate).

Stovent commented 2 years ago

I also support the widening_shl/r functions described in https://github.com/rust-lang/rust/issues/85532#issuecomment-846279618 , as they are useful to implement rotate_left/_right on big ints. Maybe they can be added to the public API ?

I don't know how changes like these are accepted to be implemented in the std and implementations can be provided. Should we do a complete RFC resuming all those functions ?

clarfonthey commented 2 years ago

Since we've gotten the go-ahead to start adding these, IMHO, starting with code is probably the best to go anyway. If an RFC is needed that can be done after these functions are usable on nightly, since that'll be when more people are using them and we have a more solid idea of how useful they are.

scottmcm commented 2 years ago

@Stovent Small method additions, especially ones that follow existing patterns, you can generally just send a PR. As a rough heuristic, if it's easier to write the PR than an RFC, then do that -- the exception being things with broad impact, like traits or generally anything that would change conventions for how things are done in the ecosystem.

But if you have any questions, you can always ask on zulip.

cmpute commented 2 years ago

It's very useful to add a mul_mod function to the integer types (the name can be better designed tho). Here's the assembly comparison with a naive implementation in Godbolt%3B%0A++++r%0A%7D%0A%0Apub+unsafe+fn+mul_mod2(this:+u64,+b:+u64,+m:+u64)+-%3E+u64+%7B%0A++++//+there+is+no+unchecked_rem+yet%0A++++((this+as+u128).unchecked_mul(b+as+u128)+%25+m+as+u128)+as+u64%0A%7D%0A'),l:'5',n:'0',o:'Rust+source+%231',t:'0')),k:50,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((g:!((h:compiler,i:(compiler:nightly,filters:(b:'0',binary:'1',commentOnly:'0',demangle:'0',directives:'0',execute:'1',intel:'0',libraryCode:'0',trim:'1'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:1,lang:rust,libs:!(),options:'',selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1,tree:'1'),l:'5',n:'0',o:'rustc+nightly+(Rust,+Editor+%231,+Compiler+%231)',t:'0')),k:50,l:'4',m:68.83561643835617,n:'0',o:'',s:0,t:'0'),(g:!((h:output,i:(compiler:1,editor:1,fontScale:14,fontUsePx:'0',tree:'1',wrap:'1'),l:'5',n:'0',o:'Output+of+rustc+nightly+(Compiler+%231)',t:'0')),header:(),l:'4',m:31.164383561643838,n:'0',o:'',s:0,t:'0')),k:50,l:'3',n:'0',o:'',t:'0')),l:'2',n:'0',o:'',t:'0')),version:4)

And it's also very helpful for me if widening_mul or carrying_mul is implemented for u128, which otherwise can be implemented inefficiently like here. It will be nice if we have the support for the largest primitive integer.

elichai commented 2 years ago

FWIW I just found a bug in the following code:

pub fn overflowing_add(mut self, other: Uint) -> (Uint, bool) {
    let mut carry = false;
    let mut carry_out;
    for i in 0..Self::LIMBS {
        (self.0[i], carry_out) = self.0[i].overflowing_add(other.0[i]);
        self.0[i] += carry as u64; 
        carry = carry_out;
    }
    (self, carry)
}

Fixed it (maybe even could've been prevented) by using these provided function:

pub fn overflowing_add(mut self, other: Uint) -> (Uint, bool) {
    let mut carry = false;
    let mut carry_out;
    for i in 0..Self::LIMBS {
        (self.0[i], carry_out) = self.0[i].carrying_add(other.0[i], carry);
        carry = carry_out;
    }
    (self, carry)
}

tspiteri commented 2 years ago

For the signed carrying_add and borrowing_sub that have been added, I was misled by the names. I expect carrying_add-> (i, bool) to return carry as the second tuple field, just like overflowing_add -> (i, bool) returns overflow. Maybe a better name would be overflowing_add_carry/overflowing_sub_borrow, since the returned bool is an overflow flag, not a carry flag.

Mart-Bogdan commented 2 years ago

I'm wondering how it would be implemented on RISC-V? AFAIK this architecture don't have carry flag :-(

tarcieri commented 2 years ago

RISC-V’s add/stlu provide carryout:

https://godbolt.org/z/Y7f5dzj1P

However they have data-dependencies on each other, which means no ILP, unfortunately

scottmcm commented 2 years ago

@Mart-Bogdan I think that's in a way a good example of why this is a good thing to have. I don't want to have to know the magic incantation that produces the best code for this; I just want to write the loop like in the example for https://doc.rust-lang.org/nightly/std/primitive.u64.html#method.carrying_mul and have rustc+LLVM do the best thing available.

(For example, in LLVM the correct incantation for wide multiplication is to just use wider integer types, but that's the wrong incantation for wide addition. People using rust shouldn't need to know that; these should do the best code pattern for the backend in use.)

Stovent commented 2 years ago

For the signed carrying_add and borrowing_sub that have been added, I was misled by the names. I expect carrying_add-> (i, bool) to return carry as the second tuple field, just like overflowing_add -> (i, bool) returns overflow. Maybe a better name would be overflowing_add_carry/overflowing_sub_borrow, since the returned bool is an overflow flag, not a carry flag.

There is also the same problem on unsigned integers, because on the unsigned integers overflowing_add returns a carry. And the first time I used them I thought that overflowing_add on unsigned integers was returning a signed overflow flag as if it was converting the arguments to signed integers.

I like your suggestion and it can be extended to respectively rename overflowing_add and carrying_add to:

overflowing_add and overflowing_add_carry for signed integers.
carrying_add carrying_add_carry for unsigned integers.

tspiteri commented 2 years ago

There is also the same problem on unsigned integers, because on the unsigned integers overflowing_add returns a carry.

But for unsigned, overflowing_add returns true if the operation overflows, so there is no confusion. It's true that for x86 assembly unsigned overflow is the carry flag and signed overflow is the overflow flag, but for Rust overflowing_ is used for all operations, not just signed addition.

overflowing_add and overflowing_add_carry for signed integers.

carrying_add carrying_add_carry for unsigned integers.

For unsigned I'd stick with overflowing_add and carrying_add. The difference between unsigned::carrying_add and signed::overflowing_add_carry would be a good thing because if you're adding a 5-word signed, you'll have to use carrying_add 4 times and overflowing_add_carry just once. So I really don't see the point of trying to make the names of the two methods follow the same pattern. It's true that unsigned::carrying_add can be used also as the final equivalent of unsigned::overflowing_add_carry, but I don't think that renaming the carrying addition itself helps, and I don't think adding an alias overflowing_add_carry is required.

Stovent commented 2 years ago

The confusion arises because I guess I have a different meaning of overflow and carry. My definitions are the ones used by processors (https://en.wikipedia.org/wiki/Overflow_flag). So when you say

But for unsigned, overflowing_add returns true if the operation overflows, so there is no confusion.

I am confused, because overflow is only meaningful in signed contexts so unsigned operations must never overflow, they can only carry out.

So I think we first need to have proper definitions on what carry and overflow means and what are their respective behaviors. My suggestion is to keep the same definition and behavior for carry and overflow as CPUs, so having only carrying methods on unsigned and only overflowing methods on signed.

For the big int methods specifically, on the Motorola 68000 architecture the big int operations are achieved through a dedicated flag called Extend, so I'm bringing back my suggestion of naming these functions extended_{add|sub} or extending_{add|sub}. This would allow the functions to have the same name in both signed and unsigned (even though there is not real advantage to have the same name for both contexts apart from macros, where a single macro could take either an unsigned or signed input and call the same method).

scottmcm commented 2 years ago

We have types, though. So we don't have one ADD instruction with OF/CF/SF/ZF/etc flags, and thus trying to match those doesn't necessarily seem like a good fit.

"Overflow" is a totally acceptable term even for unsigned, see LLVM's uadd.with.overflow for example https://llvm.org/docs/LangRef.html#arithmetic-with-overflow-intrinsics. And while for flags I agree there's that distinction (not that I'd heard it before today), https://en.wikipedia.org/wiki/Integer_overflow also repeatedly mentions overflow for unsigned as well -- the (unsigned) odometer is described as overflowing, not carrying, for example.

I don't think it's at all worth renaming all the existing overflow_* methods.

(I did propose mul_with_carry originally, but I can accept carrying_mul just fine.)

clarfonthey commented 1 year ago

Was there a particularly big problem with the signed version of widening_mul, or did it just get accidentally removed alongside the other signed implementations? Since I was going to add it back in but wasn't sure how it should be changed.

schuelermine commented 1 year ago

What is the reason that widening_mul doesn’t have a signed version?

AaronKutch commented 1 year ago

Edit: I forgot that the signature of widening mul returns two integers, ambiguities might prevent this

This should be efficient, it works additionally for iX::MIN cases.

let lhs_neg = lhs < 0;
let rhs_neg = rhs < 0;
if lhs_neg {
    lhs = lhs.wrapping_neg();
}
if rhs_neg {
    rhs = rhs.wrapping_neg();
}
let result = (lhs as $uX).widening_mul(rhs as $uX) as $iX as $iD;
if lhs_neg != rhs_neg {
    result = result.wrapping_neg();
}

Just providing some reference code, I see no reason why a signed version could not be implemented from this.

clarfonthey commented 1 year ago

What is the reason that widening_mul doesn’t have a signed version?

None that I'm aware. The issue with i128 and u128 still stands, but that doesn't stop us from implementing for smaller integers.

clarfonthey commented 1 year ago

Additionally worth adding, since I didn't know of it at the time I created the original PR, but I feel that all these methods should definitely have codegen tests, at least for a single size of integer. They rely on optimisation actually producing the correct code, and are not as good otherwise.

tspiteri commented 1 year ago

What is the reason that widening_mul doesn’t have a signed version?

None that I'm aware.

Implementing widening_mul for signed would require a decision about the return type. While u8::widening_mul(u8) -> (u8, u8), would i8::widening_mul(i8) return (u8, i8) or (i8, i8)? Consider (-128i8).widening_mul(1i8).

Returning (u8, i8): (-128i8).widening_mul(1i8) -> (128u8, -1i8), which has the same bit representation as i16 so that combining to i16 can be done using bit shifting and logic operations without addition.
Returning (i8, i8): (-128i8).widening_mul(1i8) -> (-128i8, 0i8), which has a different bit representation from i16 so that combining to i16 has to be done using bit shifting and addition, though overflow will never occur. Also, the different bit representation means that the high word is zero even though this is a negative result.

If widening_mul is implemented for signed numbers, returning the low word as unsigned is to me the better option. Returning values that do not have the same underlying bit representation as a wider type (i16 in the example above) would be very surprising to me, ruling out the return of two signed words.

clarfonthey commented 1 year ago

For those following in on the discussion, since this was created before the ACP process was set in place, I decided to retroactively create an ACP for these methods: rust-lang/libs-team#228

Please feel free to redirect design feedback there, although since the feature is implemented already, this is probably the best place to keep any issues regarding the current implementation.

safinaskar commented 1 year ago

I'm wondering how it would be implemented on RISC-V?

We have an answer embedded in RISC-V critique. :) https://gmplib.org/list-archives/gmp-devel/2021-September/006013.html

AaronKutch commented 1 year ago

That critique is missing one of the very important things RISC-V was designed around: not having any flag state, only depends on values in registers, and only one register write out. You have to read the specification for all the reasons why they did this. They have widening multiplication through mullo and mulhi which is the most important thing. I have a concept called autounrolling that I will be working on that will fix the performance problem with simple loops like that addition loop in high performance RISC-V cores.

programmerjake commented 1 year ago

as mentioned on the ACP: I think we should have a signed/unsigned widening multiply -- it's in RISC-V and Libre-SOC is in the process of proposing that maddedus be added to PowerISA.

also, all widened signed outputs should have a signed upper half and unsigned lower half: https://github.com/rust-lang/libs-team/issues/228#issuecomment-1558448584

Lohann commented 11 months ago

Btw the borrowing_sub is not generating optimal code for aarch64 targets, example subtracting two 256bit unsigned integers:

#![feature(const_bigint_helper_methods)]
#![feature(bigint_helper_methods)]

pub const LIMBS: usize = 256 / (usize::BITS as usize);

#[no_mangle]
#[inline]
pub fn assign_sub(a: &mut [usize; LIMBS], b: &[usize; LIMBS]) {
    let mut borrow = false;
    for i in 0..LIMBS {
        (a[i], borrow) = a[i].borrowing_sub(b[i], borrow);
    }
}

This code outputs:

assign_sub:
        ldp     x8, x9, [x0]
        ldp     x10, x11, [x1]
        ldp     x12, x13, [x0, #16]
        subs    x8, x8, x10
        cset    w10, lo
        subs    x9, x9, x11
        ldp     x11, x15, [x1, #16]
        ccmp    x9, x10, #0, hs
        sub     x9, x9, x10
        cset    w14, lo
        subs    x11, x12, x11
        ccmp    x11, x14, #0, hs
        sub     x10, x11, x14
        sub     x11, x13, x15
        stp     x8, x9, [x0]
        cset    w12, lo
        sub     x11, x11, x12
        stp     x10, x11, [x0, #16]
        ret

When the optimal code actually is:

assign_sub:
        ldp     x9, x8, [x0]
        ldp     x11, x10, [x1]
        ldp     x13, x12, [x0, #16]
        ldp     x14, x15, [x1, #16]
        subs    x9, x9, x11
        sbcs    x8, x8, x10
        sbcs    x10, x13, x14
        sbc     x11, x12, x15
        stp     x9, x8, [x0]
        stp     x10, x11, [x0, #16]
        ret

kennytm commented 9 months ago

Since x.widening_mul(y) is equivalent to x.carrying_mul(y, 0), and the compiler can certainly optimize away the constant 0, what is the advantage of keeping both methods?

clarfonthey commented 9 months ago

Since x.wrapping_neg() is equivalent to 0.wrapping_sub(x) and the compiler can certainly optimize away the constant 0, what is the advantage of keeping both methods?

tarcieri commented 9 months ago

x.carrying_mul(y, 0)

In the context of an arithmetic expression, the extra 0 is visually distracting and potentially even a bit confusing for someone who is not familiar with the specific semantics of the carrying_mul function.

kennytm commented 9 months ago

@clarfonthey bruh, seriously?

because the wrapping functions aren't alone, they follow a consistent pattern with their saturating / overflowing / checked / strict_ siblings and the signed/unsigned counterparts (while widening_mul is entirely on its own).
because x is moved from the self position to the argument position in neg vs sub

@tarcieri

the extra 0 is visually distracting and potentially even a bit confusing for someone who is not familiar with the specific semantics of the carrying_mul function.

I don't see there are practical use of the widening multiplication as a BigInt primitive without carry. Consider implementing a 128-bit widening_mul using the u64 bigint helper functions (unlike https://github.com/rust-lang/rust/issues/85532#issuecomment-916309635). Notice how u64::carrying_mul() is still used everywhere, while there is only a single place u64::widening_mul() is used? There is no way someone will be using .widening_mul() but can't familiar themselves with .carrying_mul(), because sooner or later they will need to compute x*y + z.

pub fn u128_widening_mul(x: u128, y: u128) -> (u128, u128) {
    let a = (x >> 64) as u64;
    let b = x as u64;
    let c = (y >> 64) as u64;
    let d = y as u64;
    /* (10a + b) * (10c + d) = 100ac + 10(ad+bc) + bd */
    let (p1, p2) = b.widening_mul(d);
//  let (p1, p2) = b.carrying_mul(d, 0);
    let (p2, p31) = b.carrying_mul(c, p2);
    let (p2, p32) = a.carrying_mul(d, p2);
    let (p3, p4o) = p31.overflowing_add(p32);
    let (p3, p4) = a.carrying_mul(c, p3);
    let p4 = p4.wrapping_add(p4o.into());

    (u128::from(p1) | u128::from(p2) << 64, u128::from(p3) | u128::from(p4) << 64)
}

tarcieri commented 9 months ago

I don't see there are practical use of the widening multiplication as a BigInt primitive without carry

The first one that comes to mind for me is Montgomery modular multiplication, which performs a widening multiplication then reduces the double-width result.

programmerjake commented 9 months ago

another common use of widening_mul is to compute a good approximation of multiplying by a rational number, though this requires a narrowing_div which we don't have. u64 equivalent using u128 arithmetic: ((value as u128 * numerator as u128) / denominator as u128) as u64

this is like the Win32 MulDiv function

clarfonthey commented 9 months ago

@clarfonthey bruh, seriously?

1. because the wrapping_ functions aren't alone, they follow a consistent pattern with their saturating_ / overflowing_ / checked_ / strict_ siblings and the signed/unsigned counterparts (while `widening_mul` is entirely on its own).

2. because `x` is moved from the `self` position to the argument position in `neg` vs `sub`

I'll admit that my comment was a bit rude despite the intent, and I should have waited until I was at a proper keyboard to write a full response instead of cheekily copying your argument from my phone.

A few folks have added additional algorithms that benefit from this already, but I think that it's worth having these methods regardless because the discussion on what to name these operations is important, and because even if something can be optimised properly, it's useful to have a method that does the right thing without optimisations anyway.

rust-lang / rust