Open clarfonthey opened 3 years ago
Are there other methods that should be added in addition to the existing ones?
I'd like a mul_mod, as shown in #85017, because I think you can't implement it efficiently without asm and it's a basic block for power_mod and other things.
Another set of methods that could be useful that I'll probably offer implementations for at some point:
/// `(self << rhs) | carry`
fn carrying_shl(self, rhs: u32, carry: Self) -> (Self, Self);
/// `(self >> rhs) | carry`
fn borrowing_shr(self, rhs: u32, carry: Self) -> (Self, Self);
/// `self << rhs`
fn widening_shl(self, rhs: u32) -> (Self, Self);
/// `self >> rhs`
fn widening_shr(self, rhs: u32) -> (Self, Self);
Essentially, return the two halves of a rotation, i.e. x.widening_shl(y)
is the same as (x << y, x >> (BITS - y))
and similarly for widening_shr
. Not sure whether they should allow rhs == BITS
or not, but presumably they wouldn't for consistency with existing shift methods.
From @scottmcm in the original PR:
Some prior art I happened upon: https://docs.rs/cranelift-codegen/0.74.0/cranelift_codegen/ir/trait.InstBuilder.html#method.isub_bin
Same as isub with an additional borrow flag input. Computes:
a = x - (y + b_{in}) \pmod 2^B
Why don't we add carrying_mul
and widening_mul
for i128/u128
as well?
Mostly effort implementing them efficiently. In the meantime, you can do it with four calls to the u64
version. Or three if you want to be fancy.
fn borrowing_sub(self, rhs: Self, carry: bool) -> (Self, bool);
I was very confused by this function name at first, since borrowing in Rust usually refers to references. I am not a native speaker, but I do formal mathematical work in English professionally, and yet I never before heard the term "borrowing" in the context of subtraction. So I think this, at least, needs some explanation in the docs. (I would have expected something like carrying_sub
, but maybe that is nonsense for a native speaker.)
The current docs for some of the other methods could probably also be improved: they talk about not having the "ability to overflow", which makes it sound like not overflowing is a bad thing.
The word borrow here comes from the terminology for a full subtractor. I am thinking that maybe the borrowing_sub function could be removed altogether. The same effect that borrowing_sub has can be obtained from carrying_add by making the first carrying_add in the chain have a set carry bit, and then bitnot every rhs. This fact could be put in the documentation of carrying_add.
The word borrow here comes from the terminology for a full subtractor. I am thinking that maybe the borrowing_sub function could be removed altogether. The same effect that borrowing_sub has can be obtained from carrying_add by making the first carrying_add in the chain have a set carry bit, and then bitnot every rhs. This fact could be put in the documentation of carrying_add.
Considering how the primary goal of these methods is to be as efficient as possible, usually optimising down to a single instruction, I don't think it'd be reasonable to just get rid of subtraction in favour of telling everyone to use addition instead. Definitely open to changing the name, though.
These helper methods will not be very useful to me unless they are implemented for every kind of integer. Here is an implementation for a widening multiplication-addition for u128:
/// Extended multiply-addition of `(lhs * rhs) + add`. The result is returned as a tuple of the wrapping part and the
/// overflow part. No numerical overflow is possible even if all three arguments are set to their max values.
pub const fn widen_mul_add(lhs: u128, rhs: u128, add: u128) -> (u128, u128) {
// [rhs_hi] [rhs_lo]
// [lhs_hi] [lhs_lo]
// X___________________
// [------tmp0------]
// [------tmp1------]
// [------tmp2------]
// [------tmp3------]
// [-------add------]
// +_______________________________________
// [------sum0------]
// [------sum1------]
let lhs_lo = lhs as u64;
let rhs_lo = rhs as u64;
let lhs_hi = (lhs.wrapping_shr(64)) as u64;
let rhs_hi = (rhs.wrapping_shr(64)) as u64;
let tmp0 = (lhs_lo as u128).wrapping_mul(rhs_lo as u128);
let tmp1 = (lhs_lo as u128).wrapping_mul(rhs_hi as u128);
let tmp2 = (lhs_hi as u128).wrapping_mul(rhs_lo as u128);
let tmp3 = (lhs_hi as u128).wrapping_mul(rhs_hi as u128);
// tmp1 and tmp2 straddle the boundary. We have to handle three carries
let (sum0, carry0) = tmp0.overflowing_add(tmp1.wrapping_shl(64));
let (sum0, carry1) = sum0.overflowing_add(tmp2.wrapping_shl(64));
let (sum0, carry2) = sum0.overflowing_add(add as u128);
let sum1 = tmp3
.wrapping_add(tmp1.wrapping_shr(64))
.wrapping_add(tmp2.wrapping_shr(64))
.wrapping_add(carry0 as u128)
.wrapping_add(carry1 as u128)
.wrapping_add(carry2 as u128);
(sum0, sum1)
}
I have tested this with my crate awint
.
edit: There is a version of this that uses the Karatsuba trick to use 3 multiplications instead of 4, but it incurs extra summations, branches, and is not as parallel. For typical desktop processors the above should be the fastest.
I would make a PR for that.
Some alternative signatures include u128::widen_mul_add(lhs, rhs, add)
, lhs.widen_mul_add(rhs, add)
, or add.widen_mul_add(lhs, rhs)
. In awint
my general purpose mul-add function is mul_add_triop
which uses the third signature but takes self
mutably and add-assigns lhs * rhs
. I'm not sure which is best.
I would also change up the documentation headers for the carrying_add
function to say
Extended addition of `self + rhs + carry`. The booleans are interpreted as a single bit
integer of value 0 or 1. If unsigned overflow occurs, then the boolean in the tuple
returns 1. The output carry can be chained into the input carry of another carrying add,
which allows for arbitrarily large additions to be calculated.
I specifically note unsigned overflow
, because that happens for both signed and unsigned
integers because of how two's complement works.
borrowing_sub
should be left in with its naming, but its documentation could be
Extended subtraction of `self - rhs - borrow`. The "borrowing" here refers to borrowing in the full subtractor sense.
The booleans are interpreted as a single bit integer of value 0 or 1. If unsigned overflow occurs, then the boolean
in the tuple returns 1. The output carry can be chained into the input carry of another borrowing subtract,
which allows for arbitrarily large subtraction to be calculated.
I specifically note
unsigned overflow
, because that happens for both signed and unsigned integers because of how two's complement works.
But unsigned overflow and signed overflow are different. For example, on x86_64, while unsigned and signed integers share addition and subtraction instructions, unsigned overflow is detected using the carry flag while signed overflow is detected using the overflow flag.
As a concrete example: 127i8 + 1
causes signed overflow but not unsigned overflow. So the carry flag should be false/0.
Edit: I think I had misread your comment and thought the middle part of your comment was the current doc, not your suggestion, so it looks like I completely misinterpreted your final comment.
Yes signed and unsigned overflow are different, but the carrying_add
as implemented for unsigned and signed integers both use unsigned overflow because of how two's complement carrying works. Someone using i64::carrying_add
might think that the carry out bit follows the bit of i64::overflowing_add
when in actuality it is following u64::overflowing_add
. So in the documentation I would put emphasis on _unsigned_ overflow
.
I think all of these are good suggestions, and like mentioned earlier, these changes definitely should go in a PR if you have the time. I think one important thing to note is that so far the APIs here seem good, but the documentation definitely could use some work. Although if there's a bigger case for changing the subtraction behaviour to be more in line with what's expected (the existing behaviour is mostly modelled after the x86 instructions adc and sbb), then I'm for that.
That said, the main goal is to make it relatively painless to write correct code that compiles down to the right instructions in release mode, so, I would say we should make sure that happens regardless of what's done. I would have added an explicit test for that but I honestly don't know how.
I've been using these functions along with the overflowing_add
/overflowing_sub
functions to implement a virtual machine (for the Motorola 68000 architecture). The overflowing functions are great for this kind of application, but I have a few comments to add on carrying_add
and borrowing_sub
so that they can fit my needs.
First, I agree that it would be better if carrying_add and borrowing_sub had a more similar name. Based on my experience with the M68000 arch that has a dedicated extend bit for this application, I would suggest the names extended_add
and extended_sub
or extending_add
and extending_sub
.
Also based on my experience with implementing M68000 instructions, I think it would be better if carrying_add
and borrowing_sub
had the same overflowing behaviour as their overflowing
equivalents. This would also be useful to correctly detect signed overflow in big int contexts.
For instance, here is a possible i16 implementation using only 8-bits values with carrying_add, where signed carrying_add
overflow has to be detected (the principle applies to any big int type) :
#![feature(bigint_helper_methods)]
struct I16 {
/// Low-order bytes has to be unsigned.
pub low: u8,
/// Most Significant Byte has to be of the same signedness as the desired type.
/// So u8 if I implement a U16, i8 if I implement I16.
pub high: i8,
}
impl I16 {
/// Adds `rhs` to `self` and returns true if signed overflow occurs, false otherwise.
pub fn overflowing_add(&mut self, rhs: Self) -> bool {
// For the low-order bytes, unsigned overflow has to be used.
let (low_res, low_carry) = self.low.carrying_add(rhs.low, false);
// To correctly detect signed overflow, carrying_add should also detect signed-overflow with signed data,
// Which is currently not the case. This returns (0x80, false) whereas a signed overflow occured.
let (high_res, high_carry) = self.high.carrying_add(rhs.high, low_carry);
self.low = low_res;
self.high = high_res;
high_carry
}
}
fn main() {
let (desired_res, desired_overflow) = 0x7FFFi16.overflowing_add(1); // returns (i16::MIN, true)
let mut my_i16 = I16 {
low: 0xFF,
high: 0x7F,
};
let one = I16 {
low: 1,
high: 0,
};
let my_i16_overflow = my_i16.overflowing_add(one);
let res = (my_i16.high as i16) << 8 | my_i16.low as i16;
assert_eq!(desired_res, res); // Works
assert_eq!(desired_overflow, my_i16_overflow); // Fails with left == true and right == false
}
Obviously, this example should also applies to borrowing_sub
.
It is also noted as a bug in a recent issue : #90541
I'd like to further explain what the functions behaviors should be according to my experience, and to provide another use case.
First, there is the question of intermediate overflow that may occur in some operations. For instance, (-128i8).carrying_add(-1, true)
has an intermediate overflow, because the first computation does -128 + (-1) = -129
, which overflows to 127, but then adding the carry bit gives 127 + 1
that which overflows back to -128, or -129 + 1 = -128
. In this case there are intermediate overflows, and my experience with M68000 makes me think they should not be returned by the functions, so (-128i8).carrying_add(-1, true)
should return (-128, false)
. The reasoning behind that is because signed overflow occurs when the result of a computation cannot be represented inside it's size. Here -128 perfectly fit inside an 8-bits number so the result did not overflowed.
These functions are also convenient to emulate big integers in virtual machines. Let's take an example with the ADDX instruction of the M68000, which is designed to implement big integers on the CPU (the example simplifies the architecture details to only keep what is important in our case): The instruction is defined as
Description: Adds the source operand and the extend bit to the destination operand and stores the result in the destination location.
Condition Codes:
- eXtend: Set the same as the carry bit.
- oVerflow: Set if an overflow occurs; cleared otherwise.
- Carry: Set if a carry is generated; cleared otherwise.
- [...]
The operation adds the destination operand to the source operand, and then add the eXtend bit from the condition codes to the previous result. The condition codes are individual bits used to indicate what happened during the last operation. The eXtend bit is what allows the CPU to implement big integers. oVerflow is set when a signed overflow occurs. Carry is set when an unsigned overflow occurs.
Using the big int helper functions makes the implementation of this instruction easy and less error-prone because I don't have to calculate the condition codes myself, the function does it for me. A typical implementation would look like that :
// Here `memory.get_byte(some_address)` is used to abstract the architecture details.
let src: u8 = memory.get_byte(some_address);
let dst: u8 = memory.get_byte(some_address);
// Here is where the signed functions are useful, they computes the (signed) overflow bit.
// `codes` is the variable containing the conditions codes described upper.
let (result, v) = (src as i8).carrying_add(dst, codes.x);
// Use the unsigned big int function to compute the carry (unsigned overflow) bit.
let (_, c) = src.carrying_add(dst, codes.x);
// Now computes the condition codes using the result value and the
codes.x = c;
codes.v = v;
codes.c = c;
// Other codes [...]
// Write back the result
memory.set_byte(some_address, result);
This post provides a concrete example on why intermediate overflow should not occur. It reuses the code in my first post here.
TL;DR; doing -32513 + -255
using my implementation example would give the correct answer -32768
, but having intermediate would indicate that the operation overflowed while it doesn't.
#![feature(bigint_helper_methods)]
struct I16 {
/// Low-order bytes has to be unsigned.
pub low: u8,
/// Most Significant Byte has to be of the same signedness as the desired type.
/// So u8 if I implement a U16, i8 if I implement I16.
pub high: i8,
}
impl I16 {
/// Adds `rhs` to `self` and returns true if signed overflow occurs, false otherwise.
pub fn overflowing_add(&mut self, rhs: Self) -> bool {
let (low_res, low_carry) = self.low.carrying_add(rhs.low, false); // Returns (0, true)
// In the example from the main function, the operation below performes -128 + -1 + 1.
// Intermediate overflow occurs as -128 + -1 gives -129, which overflow to 127. Then 127 + 1 gives 128, which overflows back to -128.
// If carrying_add on signed numbers returns true on intermediate overflow, then this function returns true in the example.
// However, from a i16 point of view, -32_513 + -255 gives -32768, which does not overflow. So intermediate overflow must not be returned.
// Using overflowing_add as signed bit ints are not implemented anymore.
let (high_res, high_carry) = self.high.overflowing_add(rhs.high + low_carry as i8); // Returns (-128, false), which is correct.
self.low = low_res;
self.high = high_res;
high_carry
}
}
fn main() {
let (desired_res, desired_overflow) = (-32_513i16).overflowing_add(-255); // Returns (i16::MIN, false)
let mut my_i16 = I16 { // -32_513
low: 0xFF,
high: -128,
};
let one = I16 { // -255
low: 1,
high: -1,
};
let my_i16_overflow = my_i16.overflowing_add(one); // If intermediate overflow was returned, the assert below would fail.
let res = (my_i16.high as i16) << 8 | my_i16.low as i16;
assert_eq!(desired_res, res); // Works
assert_eq!(desired_overflow, my_i16_overflow);
}
To resume my ideas:
I can provide implementations of the add/sub functions with the behaviours I described if necessary.
Also pinging the author of the issue @clarfonthey for visibility (sorry if it is inappropriate).
I also support the widening_shl/r
functions described in https://github.com/rust-lang/rust/issues/85532#issuecomment-846279618 , as they are useful to implement rotate_left/_right
on big ints. Maybe they can be added to the public API ?
I don't know how changes like these are accepted to be implemented in the std and implementations can be provided. Should we do a complete RFC resuming all those functions ?
Since we've gotten the go-ahead to start adding these, IMHO, starting with code is probably the best to go anyway. If an RFC is needed that can be done after these functions are usable on nightly, since that'll be when more people are using them and we have a more solid idea of how useful they are.
@Stovent Small method additions, especially ones that follow existing patterns, you can generally just send a PR. As a rough heuristic, if it's easier to write the PR than an RFC, then do that -- the exception being things with broad impact, like traits or generally anything that would change conventions for how things are done in the ecosystem.
But if you have any questions, you can always ask on zulip.
It's very useful to add a mul_mod
function to the integer types (the name can be better designed tho). Here's the assembly comparison with a naive implementation in Godbolt%3B%0A++++r%0A%7D%0A%0Apub+unsafe+fn+mul_mod2(this:+u64,+b:+u64,+m:+u64)+-%3E+u64+%7B%0A++++//+there+is+no+unchecked_rem+yet%0A++++((this+as+u128).unchecked_mul(b+as+u128)+%25+m+as+u128)+as+u64%0A%7D%0A'),l:'5',n:'0',o:'Rust+source+%231',t:'0')),k:50,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((g:!((h:compiler,i:(compiler:nightly,filters:(b:'0',binary:'1',commentOnly:'0',demangle:'0',directives:'0',execute:'1',intel:'0',libraryCode:'0',trim:'1'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:1,lang:rust,libs:!(),options:'',selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1,tree:'1'),l:'5',n:'0',o:'rustc+nightly+(Rust,+Editor+%231,+Compiler+%231)',t:'0')),k:50,l:'4',m:68.83561643835617,n:'0',o:'',s:0,t:'0'),(g:!((h:output,i:(compiler:1,editor:1,fontScale:14,fontUsePx:'0',tree:'1',wrap:'1'),l:'5',n:'0',o:'Output+of+rustc+nightly+(Compiler+%231)',t:'0')),header:(),l:'4',m:31.164383561643838,n:'0',o:'',s:0,t:'0')),k:50,l:'3',n:'0',o:'',t:'0')),l:'2',n:'0',o:'',t:'0')),version:4)
And it's also very helpful for me if widening_mul
or carrying_mul
is implemented for u128
, which otherwise can be implemented inefficiently like here. It will be nice if we have the support for the largest primitive integer.
FWIW I just found a bug in the following code:
pub fn overflowing_add(mut self, other: Uint) -> (Uint, bool) {
let mut carry = false;
let mut carry_out;
for i in 0..Self::LIMBS {
(self.0[i], carry_out) = self.0[i].overflowing_add(other.0[i]);
self.0[i] += carry as u64;
carry = carry_out;
}
(self, carry)
}
Fixed it (maybe even could've been prevented) by using these provided function:
pub fn overflowing_add(mut self, other: Uint) -> (Uint, bool) {
let mut carry = false;
let mut carry_out;
for i in 0..Self::LIMBS {
(self.0[i], carry_out) = self.0[i].carrying_add(other.0[i], carry);
carry = carry_out;
}
(self, carry)
}
For the signed carrying_add
and borrowing_sub
that have been added, I was misled by the names. I expect carrying_add-> (i, bool)
to return carry as the second tuple field, just like overflowing_add -> (i, bool)
returns overflow. Maybe a better name would be overflowing_add_carry
/overflowing_sub_borrow
, since the returned bool is an overflow flag, not a carry flag.
I'm wondering how it would be implemented on RISC-V? AFAIK this architecture don't have carry flag :-(
RISC-V’s add
/stlu
provide carryout:
https://godbolt.org/z/Y7f5dzj1P
However they have data-dependencies on each other, which means no ILP, unfortunately
@Mart-Bogdan I think that's in a way a good example of why this is a good thing to have. I don't want to have to know the magic incantation that produces the best code for this; I just want to write the loop like in the example for https://doc.rust-lang.org/nightly/std/primitive.u64.html#method.carrying_mul and have rustc+LLVM do the best thing available.
(For example, in LLVM the correct incantation for wide multiplication is to just use wider integer types, but that's the wrong incantation for wide addition. People using rust shouldn't need to know that; these should do the best code pattern for the backend in use.)
For the signed
carrying_add
andborrowing_sub
that have been added, I was misled by the names. I expectcarrying_add-> (i, bool)
to return carry as the second tuple field, just likeoverflowing_add -> (i, bool)
returns overflow. Maybe a better name would beoverflowing_add_carry
/overflowing_sub_borrow
, since the returned bool is an overflow flag, not a carry flag.
There is also the same problem on unsigned integers, because on the unsigned integers overflowing_add
returns a carry.
And the first time I used them I thought that overflowing_add
on unsigned integers was returning a signed overflow flag as if it was converting the arguments to signed integers.
I like your suggestion and it can be extended to respectively rename overflowing_add
and carrying_add
to:
overflowing_add
and overflowing_add_carry
for signed integers.carrying_add
carrying_add_carry
for unsigned integers.There is also the same problem on unsigned integers, because on the unsigned integers
overflowing_add
returns a carry.
But for unsigned, overflowing_add
returns true if the operation overflows, so there is no confusion. It's true that for x86 assembly unsigned overflow is the carry flag and signed overflow is the overflow flag, but for Rust overflowing_
is used for all operations, not just signed addition.
overflowing_add
andoverflowing_add_carry
for signed integers.
carrying_add
carrying_add_carry
for unsigned integers.
For unsigned I'd stick with overflowing_add
and carrying_add
. The difference between unsigned::carrying_add
and signed::overflowing_add_carry
would be a good thing because if you're adding a 5-word signed, you'll have to use carrying_add
4 times and overflowing_add_carry
just once. So I really don't see the point of trying to make the names of the two methods follow the same pattern. It's true that unsigned::carrying_add
can be used also as the final equivalent of unsigned::overflowing_add_carry
, but I don't think that renaming the carrying addition itself helps, and I don't think adding an alias overflowing_add_carry
is required.
The confusion arises because I guess I have a different meaning of overflow and carry. My definitions are the ones used by processors (https://en.wikipedia.org/wiki/Overflow_flag). So when you say
But for unsigned, overflowing_add returns true if the operation overflows, so there is no confusion.
I am confused, because overflow is only meaningful in signed contexts so unsigned operations must never overflow, they can only carry out.
So I think we first need to have proper definitions on what carry and overflow means and what are their respective behaviors. My suggestion is to keep the same definition and behavior for carry and overflow as CPUs, so having only carrying methods on unsigned and only overflowing methods on signed.
For the big int methods specifically, on the Motorola 68000 architecture the big int operations are achieved through a dedicated flag called Extend, so I'm bringing back my suggestion of naming these functions extended_{add|sub}
or extending_{add|sub}
. This would allow the functions to have the same name in both signed and unsigned (even though there is not real advantage to have the same name for both contexts apart from macros, where a single macro could take either an unsigned or signed input and call the same method).
We have types, though. So we don't have one ADD instruction with OF/CF/SF/ZF/etc flags, and thus trying to match those doesn't necessarily seem like a good fit.
"Overflow" is a totally acceptable term even for unsigned, see LLVM's uadd.with.overflow
for example https://llvm.org/docs/LangRef.html#arithmetic-with-overflow-intrinsics. And while for flags I agree there's that distinction (not that I'd heard it before today), https://en.wikipedia.org/wiki/Integer_overflow also repeatedly mentions overflow for unsigned as well -- the (unsigned) odometer is described as overflowing, not carrying, for example.
I don't think it's at all worth renaming all the existing overflow_*
methods.
(I did propose mul_with_carry
originally, but I can accept carrying_mul
just fine.)
Was there a particularly big problem with the signed version of widening_mul, or did it just get accidentally removed alongside the other signed implementations? Since I was going to add it back in but wasn't sure how it should be changed.
What is the reason that widening_mul
doesn’t have a signed version?
Edit: I forgot that the signature of widening mul returns two integers, ambiguities might prevent this
This should be efficient, it works additionally for iX::MIN
cases.
let lhs_neg = lhs < 0;
let rhs_neg = rhs < 0;
if lhs_neg {
lhs = lhs.wrapping_neg();
}
if rhs_neg {
rhs = rhs.wrapping_neg();
}
let result = (lhs as $uX).widening_mul(rhs as $uX) as $iX as $iD;
if lhs_neg != rhs_neg {
result = result.wrapping_neg();
}
Just providing some reference code, I see no reason why a signed version could not be implemented from this.
What is the reason that
widening_mul
doesn’t have a signed version?
None that I'm aware. The issue with i128 and u128 still stands, but that doesn't stop us from implementing for smaller integers.
Additionally worth adding, since I didn't know of it at the time I created the original PR, but I feel that all these methods should definitely have codegen tests, at least for a single size of integer. They rely on optimisation actually producing the correct code, and are not as good otherwise.
What is the reason that
widening_mul
doesn’t have a signed version?None that I'm aware.
Implementing widening_mul
for signed would require a decision about the return type. While u8::widening_mul(u8) -> (u8, u8)
, would i8::widening_mul(i8)
return (u8, i8)
or (i8, i8)
? Consider (-128i8).widening_mul(1i8)
.
(u8, i8)
: (-128i8).widening_mul(1i8) -> (128u8, -1i8)
, which has the same bit representation as i16
so that combining to i16
can be done using bit shifting and logic operations without addition.(i8, i8)
: (-128i8).widening_mul(1i8) -> (-128i8, 0i8)
, which has a different bit representation from i16
so that combining to i16
has to be done using bit shifting and addition, though overflow will never occur. Also, the different bit representation means that the high word is zero even though this is a negative result.If widening_mul
is implemented for signed numbers, returning the low word as unsigned is to me the better option. Returning values that do not have the same underlying bit representation as a wider type (i16
in the example above) would be very surprising to me, ruling out the return of two signed words.
For those following in on the discussion, since this was created before the ACP process was set in place, I decided to retroactively create an ACP for these methods: rust-lang/libs-team#228
Please feel free to redirect design feedback there, although since the feature is implemented already, this is probably the best place to keep any issues regarding the current implementation.
I'm wondering how it would be implemented on RISC-V?
We have an answer embedded in RISC-V critique. :) https://gmplib.org/list-archives/gmp-devel/2021-September/006013.html
That critique is missing one of the very important things RISC-V was designed around: not having any flag state, only depends on values in registers, and only one register write out. You have to read the specification for all the reasons why they did this. They have widening multiplication through mullo and mulhi which is the most important thing. I have a concept called autounrolling that I will be working on that will fix the performance problem with simple loops like that addition loop in high performance RISC-V cores.
as mentioned on the ACP: I think we should have a signed/unsigned widening multiply -- it's in RISC-V and Libre-SOC is in the process of proposing that maddedus
be added to PowerISA.
also, all widened signed outputs should have a signed upper half and unsigned lower half: https://github.com/rust-lang/libs-team/issues/228#issuecomment-1558448584
Btw the borrowing_sub
is not generating optimal code for aarch64 targets, example subtracting two 256bit unsigned integers:
#![feature(const_bigint_helper_methods)]
#![feature(bigint_helper_methods)]
pub const LIMBS: usize = 256 / (usize::BITS as usize);
#[no_mangle]
#[inline]
pub fn assign_sub(a: &mut [usize; LIMBS], b: &[usize; LIMBS]) {
let mut borrow = false;
for i in 0..LIMBS {
(a[i], borrow) = a[i].borrowing_sub(b[i], borrow);
}
}
This code outputs:
assign_sub:
ldp x8, x9, [x0]
ldp x10, x11, [x1]
ldp x12, x13, [x0, #16]
subs x8, x8, x10
cset w10, lo
subs x9, x9, x11
ldp x11, x15, [x1, #16]
ccmp x9, x10, #0, hs
sub x9, x9, x10
cset w14, lo
subs x11, x12, x11
ccmp x11, x14, #0, hs
sub x10, x11, x14
sub x11, x13, x15
stp x8, x9, [x0]
cset w12, lo
sub x11, x11, x12
stp x10, x11, [x0, #16]
ret
When the optimal code actually is:
assign_sub:
ldp x9, x8, [x0]
ldp x11, x10, [x1]
ldp x13, x12, [x0, #16]
ldp x14, x15, [x1, #16]
subs x9, x9, x11
sbcs x8, x8, x10
sbcs x10, x13, x14
sbc x11, x12, x15
stp x9, x8, [x0]
stp x10, x11, [x0, #16]
ret
Since x.widening_mul(y)
is equivalent to x.carrying_mul(y, 0)
, and the compiler can certainly optimize away the constant 0, what is the advantage of keeping both methods?
Since x.wrapping_neg()
is equivalent to 0.wrapping_sub(x)
and the compiler can certainly optimize away the constant 0, what is the advantage of keeping both methods?
x.carrying_mul(y, 0)
In the context of an arithmetic expression, the extra 0
is visually distracting and potentially even a bit confusing for someone who is not familiar with the specific semantics of the carrying_mul
function.
@clarfonthey bruh, seriously?
widening_mul
is entirely on its own).x
is moved from the self
position to the argument position in neg
vs sub
@tarcieri
the extra
0
is visually distracting and potentially even a bit confusing for someone who is not familiar with the specific semantics of thecarrying_mul
function.
I don't see there are practical use of the widening multiplication as a BigInt primitive without carry. Consider implementing a 128-bit widening_mul
using the u64
bigint helper functions (unlike https://github.com/rust-lang/rust/issues/85532#issuecomment-916309635). Notice how u64::carrying_mul()
is still used everywhere, while there is only a single place u64::widening_mul()
is used? There is no way someone will be using .widening_mul()
but can't familiar themselves with .carrying_mul()
, because sooner or later they will need to compute x*y + z
.
pub fn u128_widening_mul(x: u128, y: u128) -> (u128, u128) {
let a = (x >> 64) as u64;
let b = x as u64;
let c = (y >> 64) as u64;
let d = y as u64;
/* (10a + b) * (10c + d) = 100ac + 10(ad+bc) + bd */
let (p1, p2) = b.widening_mul(d);
// let (p1, p2) = b.carrying_mul(d, 0);
let (p2, p31) = b.carrying_mul(c, p2);
let (p2, p32) = a.carrying_mul(d, p2);
let (p3, p4o) = p31.overflowing_add(p32);
let (p3, p4) = a.carrying_mul(c, p3);
let p4 = p4.wrapping_add(p4o.into());
(u128::from(p1) | u128::from(p2) << 64, u128::from(p3) | u128::from(p4) << 64)
}
I don't see there are practical use of the widening multiplication as a BigInt primitive without carry
The first one that comes to mind for me is Montgomery modular multiplication, which performs a widening multiplication then reduces the double-width result.
another common use of widening_mul
is to compute a good approximation of multiplying by a rational number, though this requires a narrowing_div
which we don't have. u64
equivalent using u128
arithmetic:
((value as u128 * numerator as u128) / denominator as u128) as u64
this is like the Win32 MulDiv
function
@clarfonthey bruh, seriously?
1. because the wrapping_ functions aren't alone, they follow a consistent pattern with their saturating_ / overflowing_ / checked_ / strict_ siblings and the signed/unsigned counterparts (while `widening_mul` is entirely on its own). 2. because `x` is moved from the `self` position to the argument position in `neg` vs `sub`
I'll admit that my comment was a bit rude despite the intent, and I should have waited until I was at a proper keyboard to write a full response instead of cheekily copying your argument from my phone.
A few folks have added additional algorithms that benefit from this already, but I think that it's worth having these methods regardless because the discussion on what to name these operations is important, and because even if something can be optimised properly, it's useful to have a method that does the right thing without optimisations anyway.
Feature gate:
#![feature(bigint_helper_methods)]
This is a tracking issue for the following methods on integers:
carrying_add
borrowing_sub
carrying_mul
widening_mul
These methods are intended to help centralise the effort required for creating efficient big integer implementations, by offering a few methods which would otherwise require special compiler intrinsics or custom assembly code in order to do efficiently. They do not alone constitute big integer implementations themselves, but are necessary building blocks for a larger implementation.
Public API
Steps / History
widening_mul
RFC: rust-lang/rfcs#2417carrying_add
,borrowing_sub
,carrying_mul
, andwidening_mul
#85017carrying_add
andborrowing_sub
on signed types: https://github.com/rust-lang/rust/pull/93873Unresolved Questions
widening_mul
that simply returns the next-larger type? What would we do foru128
/i128
?