rust-lang / rfcs

RFCs for changes to Rust
https://rust-lang.github.io/rfcs/
Apache License 2.0

Suggestions for additional floating-point types #2629

Open aaronfranke opened 5 years ago

aaronfranke commented 5 years ago

I noticed that, as in many other languages, the only built-in floating-point types are `f32` and `f64`. However, having only these can be limiting. I propose adding `f128`, and as mentioned in this thread, `f16` would likely be very useful for some workloads.

`f128` would not be needed in most programs, but there are use cases for it, and it'd be nice to have it as a built-in language type. RISC-V can hardware-accelerate it using the Q extension.

`f16` is a more efficient type for workloads that need tons of floats at low precision, such as machine learning. Hardware support is already widespread in Apple's Neural Engine and in mobile graphics.

Also, if covering IEEE 754 fully is desired, there is also `f256`.

Original text:

> I noticed that, like other languages, the only floating-point types built-in are `f32` and `f64`. However, I often have limitations with just these. I propose the following: ~~`fsize`, `freal`,~~ and `f128`.
>
> ~~`fsize` would be like `isize` but for floats. Basically, use the version that's most efficient for your processor. On modern 64-bit processors with wide FPUs and/or 256-bit SIMD this would become `f64`.~~
>
> ~~Sometimes I want to be able to have a variable for real numbers, or I don't know what precision I want yet. In C++ I can do the following to have an abstract precision that I control via compiler flags:~~
>
> ~~`#ifdef REAL_T_IS_DOUBLE`~~
> ~~`typedef double real_t;`~~
> ~~`#else`~~
> ~~`typedef float real_t;`~~
> ~~`#endif`~~
>
> ~~I propose something similar in Rust, where you can just write `freal` or something and be able to change the precision later with compiler flags. The default would probably be `f32`.~~
>
> Finally, it would be nice to have 128-bit floats (`f128`) in the language. These are not normally needed, but there are use cases for them, and it'd be nice to have it as a language built-in type. Some newer processors have 512-bit SIMD chipsets that can process these efficiently, though most don't.
>
> If you only implement some of these proposals, that's fine too.

Originally posted at https://github.com/rust-lang/rust/issues/57928
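To make the `f16` trade-off concrete, here is a minimal sketch of a binary16 round trip. This is an illustration, not a proposed implementation: it truncates the mantissa (no rounding, NaN, infinity, or subnormal handling) and only covers the normal binary16 range.

```rust
// Minimal sketch: convert an f32 to IEEE binary16 bits and back, truncating
// the mantissa, to illustrate the precision loss of a hypothetical `f16`.
fn f32_to_f16_bits(x: f32) -> u16 {
    let bits = x.to_bits();
    let sign = ((bits >> 16) & 0x8000) as u16;
    let exp = ((bits >> 23) & 0xff) as i32 - 127; // unbiased exponent
    let frac = bits & 0x007f_ffff;
    // This sketch only handles the normal binary16 exponent range.
    assert!((-14..=15).contains(&exp));
    sign | (((exp + 15) as u16) << 10) | (frac >> 13) as u16
}

fn f16_bits_to_f32(h: u16) -> f32 {
    let sign = ((h as u32) & 0x8000) << 16;
    let exp = ((h >> 10) & 0x1f) as i32 - 15;
    let frac = ((h as u32) & 0x03ff) << 13;
    f32::from_bits(sign | (((exp + 127) as u32) << 23) | frac)
}

fn main() {
    let x = 3.14159_f32;
    let y = f16_bits_to_f32(f32_to_f16_bits(x));
    // binary16 keeps 11 significant bits, so only ~3 decimal digits survive.
    println!("{x} -> {y}");
    assert!((x - y).abs() < 0.01);
}
```

The point of the sketch: a `f16` value costs half the memory of `f32`, but delivers far fewer significant digits, which is exactly the trade-off machine-learning and graphics workloads accept.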
sfackler commented 5 years ago

> fsize would be like isize but for floats. Basically, use the version that's most efficient for your processor.

isize is not the integer type that's most efficient for your processor; it's the integer type that's the same size as a pointer. It's like ptrdiff_t, not int.

> I propose something similar in Rust, where you can just write freal or something and be able to change the precision later with compiler flags. The default would probably be f32.

#[cfg(feature = "real_t_is_double")]
type real_t = f64;
#[cfg(not(feature = "real_t_is_double"))]
type real_t = f32;
moonheart08 commented 5 years ago

A better suggestion would be f16 support, as it is common in graphics.

shingtaklam1324 commented 5 years ago

@moonheart08

Is f16 used much in intermediate calculations? I know it is commonly used as a storage format, but the last time I checked (I wrote a Pre-RFC on this on internals a while back, though I'm a bit fuzzy on the details), a lot of the calculations involving f16 on most platforms are done by casting to f32, performing the op, then casting back to f16. If that is the case, then having native f16 support may not be that important.

Adding the ability to use the F16C instructions may be useful to have in core::arch though, perhaps something like __m128h which has 8 "f16"s.

Coder-256 commented 5 years ago

How about long double and 128-bit floats? ~I could be wrong, but I'm 99% sure that we currently unavoidably lose precision when using long doubles from C. On my computer (macOS), bindgen outputs f64, but sizeof(long double) in C outputs 16 bytes. (128 bits; for alignment I guess?).~

~(On a side note, is that even safe behavior? What about C functions that take long double *?)~

aaronfranke commented 5 years ago

@Coder-256 In C++, long double is 64-bit on Windows, 80-bit in MinGW, and 128-bit on Mac and Linux (probably indeed for alignment, as I don't think anyone implements it as quadruple precision).

Coder-256 commented 5 years ago

@aaronfranke Could you please clarify what you mean? What I was trying to say is that Rust currently does not have any support for floats larger than 64 bits (8 bytes), for example, long double on certain platforms. I was also trying to point out that in addition to having limited precision within Rust code, this makes it difficult to interact with native code that uses large floats, such as using FFI with C code that uses floats larger than 64 bits.

There was also a separate issue with bindgen that caused float sizes to be incorrect for large floats, but that has been fixed (in rust-lang/rust-bindgen@ed6e1bbec439e8b260e6e701379fc70d295f35fe).

aaronfranke commented 5 years ago

I wasn't disagreeing with you, I was just adding information. Sorry if I wasn't clear. f128 would be great.

Coder-256 commented 5 years ago

@aaronfranke I absolutely agree, both f128 and f80 would be very useful, especially for FFI (for example, Swift already has Float80 mainly for communicating with old C code, just an example to show how it could help)

lygstate commented 4 years ago

Old things never really go away, and I want to push this forward. Rust is a systems language, not a scripting language; it needs to stay compatible with legacy things.

lygstate commented 4 years ago

I want to push for adding support for fp80 and fp128... is any help needed?

lygstate commented 4 years ago

Like https://github.com/rust-lang/rust/pull/38482 does

thomcc commented 4 years ago

> Basically, use the version that's most efficient for your processor. On modern 64-bit processors with wide FPUs and/or 256-bit SIMD this would become f64.

Even on modern x86 which has similar or equal speed between most f32 and f64 ops, f32 is still very much the fastest for your processor because it cuts cache misses in half.
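The cache-miss point can be made concrete with a quick size check (a minimal sketch using only std::mem::size_of):

```rust
// Same element count, half the bytes: this is why f32 is still "fastest"
// on hardware where individual f32 and f64 ops take similar time.
use std::mem::size_of;

fn main() {
    assert_eq!(size_of::<[f32; 1024]>(), 4 * 1024);
    assert_eq!(size_of::<[f64; 1024]>(), 8 * 1024);
    // A typical 64-byte cache line holds 16 f32s but only 8 f64s.
    assert_eq!(64 / size_of::<f32>(), 16);
    assert_eq!(64 / size_of::<f64>(), 8);
}
```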

> Sometimes I want to be able to have a variable for real numbers, or I don't know what precision I want yet. In C++ I can do the following to have an abstract precision that I control via compiler flags:

#[cfg(real_is_f64)]
type real = f64;
#[cfg(not(real_is_f64))]
type real = f32;

then you can control via RUSTFLAGS="--cfg real_is_f64" (you can also use cargo features, but they're not a great fit for cases where enabling a feature can cause compile errors like this)

... Regarding suggestions of f80

What would f80 do on platforms that aren't x86? Nothing else has native 80-bit floats. It's not even part of IEEE 754 (even though it's largely a natural extension of it... although it has a lot of quirks). This is something that would be viable in core::arch::{x86,x86_64} but isn't portable. We don't want to have to implement these as software floats on other platforms.

I'd be in favor of a std::os::raw::c_long_double type but it would have to be carefully designed. Note that PPC's long double is exceptionally cursed, as it's a pair of doubles that are summed together...

I'd be in favor of f16, and tentatively f128 since binary128 is part of IEEE754 2019, at least.

EDIT: I hadn't noticed that sfackler said the exact same thing as my first point >_>

lygstate commented 4 years ago

>> Basically, use the version that's most efficient for your processor. On modern 64-bit processors with wide FPUs and/or 256-bit SIMD this would become f64.
>
> Even on modern x86 which has similar or equal speed between most f32 and f64 ops, f32 is still very much the fastest for your processor because it cuts cache misses in half.
>
>> Sometimes I want to be able to have a variable for real numbers, or I don't know what precision I want yet. In C++ I can do the following to have an abstract precision that I control via compiler flags:
>
> #[cfg(real_is_f64)]
> type real = f64;
> #[cfg(not(real_is_f64))]
> type real = f32;
>
> then you can control via RUSTFLAGS="--cfg real_is_f64" (you can also use cargo features, but they're not a great fit for cases where enabling a feature can cause compile errors like this)
>
> ... Regarding suggestions of f80
>
> What would f80 do on platforms that aren't x86? Nothing else has native 80-bit floats. It's not even part of IEEE 754 (even though it's largely a natural extension of it... although it has a lot of quirks). This is something that would be viable in core::arch::{x86,x86_64} but isn't portable. We don't want to have to implement these as software floats on other platforms.
>
> I'd be in favor of a std::os::raw::c_long_double type but it would have to be carefully designed. Note that PPC's long double is exceptionally cursed, as it's a pair of doubles that are summed together...
>
> I'd be in favor of f16, and tentatively f128 since binary128 is part of IEEE 754 2019, at least.

It's a fact that f80 is broadly used, and in the foreseeable future that will continue. We don't strictly need a soft f80 implementation; just making f80 work on the x86 platform would be enough. Then again, a soft f80 implementation may be the better option for cross-platform consistency.

programmerjake commented 4 years ago

several architectures have hardware support for f128: RISC-V, PowerPC, s390, and probably more.

lygstate commented 4 years ago

> several architectures have hardware support for f128: RISC-V, PowerPC, s390, and probably more.

For platforms that have f128, implementing f80 would not cause a significant performance hit.

aaronfranke commented 4 years ago

@thomcc These are all ideas; not everything in the OP is still relevant since it has been discussed. I think fsize and freal have been discussed and dismissed: fsize is a bad idea considering the information in this thread, and freal is easy to implement in a few lines of code, so it doesn't need to be in the language.

That said, f128 is still desired for sure and has some use cases and some hardware support; f80 would be neat, though I wouldn't use it personally; f16 would be useful, especially in the context of low-end graphics, though I also wouldn't use it myself; and if your goal is to cover IEEE 754, there is also f256, octuple precision, though it's rare to see.

lygstate commented 4 years ago

> These are all ideas; not everything in the OP is still relevant since it has been discussed. I think fsize and freal have been discussed and dismissed: fsize is a bad idea considering the information in this thread, and freal is easy to implement in a few lines of code, so it doesn't need to be in the language.
>
> That said, f128 is still desired for sure and has some use cases and some hardware support; f80 would be neat, though I wouldn't use it personally; f16 would be useful, especially in the context of low-end graphics, though I also wouldn't use it myself; and if your goal is to cover IEEE 754, there is also f256, octuple precision, though it's rare to see.

Maybe we can add f16, f80, and f128 in a single shot?

workingjubilee commented 4 years ago

f16 has uses in neural networks as well.

There are actually many problems with using f80, especially if we do not ship a soft float to cover it. It would not be a type defined by an abstraction, frankly; it would be a type defined by Intel's hardware quirks, and we would only be adding more on top of it. One of the nice things about Rust is that it is highly portable right now, so I do not think it makes sense to add such a non-portable type to the language and limit portability that much, though a language extension that makes it simpler to define and use such a non-portable type would make sense.

thomcc commented 4 years ago

> several architectures have hardware support for f128: RISC-V, PowerPC, s390, and probably more.

I can't say for sure about the other arches, but PowerPC's is not IEEE-754-like at all — it's double-double. It would not help for implementing a sane f128 nor would it help implement a f80.

> For platforms that have f128, implementing f80 would not cause a significant performance hit.

I don't think this is really true (we can quibble over "significant", I guess), but regardless, Rust doesn't exclusively target architectures in the sets {have native f80}, {have native f128}, so something that solves this for other architectures needs to be considered.

> if your goal is to cover IEEE 754 there is also f256 or octuple precision, though it's rare to see.

I mean, it's not mentioned in IEEE-754 2019. It's not hard to imagine what it looks like, admittedly.


Anyway, I think once inline asm is stable, someone who really wants f80 could implement it as a library on x86/x86_64. This wouldn't solve the issue of FFI (e.g. a c_long_double type), which I still think would be nice to solve, but that has a lot of different design considerations and could just be a mostly-opaque type that includes little more than implementations of From<f64>/Into<f64> (i.e. no arithmetic).
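A mostly-opaque FFI type of the kind suggested could look roughly like the sketch below. The name `CLongDouble`, the 16-byte layout, and the bits-only API are assumptions for illustration, not an accepted design:

```rust
// Hypothetical sketch: a mostly-opaque FFI carrier for C's `long double`,
// exposing only raw-bits access and no arithmetic. The 16-byte, 16-aligned
// layout matches the x86_64 System V convention and is assumed here.
#[repr(C, align(16))]
#[derive(Clone, Copy, PartialEq, Debug)]
pub struct CLongDouble {
    bytes: [u8; 16], // raw storage; the bit interpretation is target-specific
}

impl CLongDouble {
    /// Reinterpret the raw storage as an integer, like f64::to_bits.
    pub fn to_bits(self) -> u128 {
        u128::from_ne_bytes(self.bytes)
    }
    /// Rebuild the opaque value from raw bits.
    pub fn from_bits(bits: u128) -> Self {
        Self { bytes: bits.to_ne_bytes() }
    }
}

fn main() {
    // Round-tripping through bits is lossless by construction; any actual
    // conversion to f64/f128 would live elsewhere (e.g. in core::arch).
    let one = 0x3fff_8000_0000_0000_0000_u128; // x87 80-bit encoding of 1.0
    let x = CLongDouble::from_bits(one);
    assert_eq!(x.to_bits(), one);
}
```

The design choice here mirrors the comment above: Rust code can receive, store, and pass the value through FFI without ever defining arithmetic on it.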

programmerjake commented 4 years ago

@thomcc

> several architectures have hardware support for f128: RISC-V, PowerPC, s390, and probably more.
>
> I can't say for sure about the other arches, but PowerPC's is not IEEE-754-like at all — it's double-double. It would not help for implementing a sane f128 nor would it help implement a f80.

You're thinking of C's long double type; PowerPC does support IEEE-754 standard binary128 FP using new instructions added in Power ISA v3.0. Quoting GCC 6's change log:

> PowerPC64 now supports IEEE 128-bit floating-point using the `__float128` data type. In GCC 6, this is not enabled by default, but you can enable it with -mfloat128. The IEEE 128-bit floating-point support requires the use of the VSX instruction set. IEEE 128-bit floating-point values are passed and returned as a single vector value. The software emulator for IEEE 128-bit floating-point support is only built on PowerPC GNU/Linux systems where the default CPU is at least power7. On future ISA 3.0 systems (POWER 9 and later), you will be able to use the -mfloat128-hardware option to use the ISA 3.0 instructions that support IEEE 128-bit floating-point. An additional type (`__ibm128`) has been added to refer to the IBM extended double type that normally implements long double. This will allow for a future transition to implementing long double with IEEE 128-bit floating-point.

thomcc commented 4 years ago

Thanks, you're correct that I was thinking of the PPC long double (__ibm128) type. Unfortunately, I think the existence of 2 separate 128-bit "floating point" types on powerpc only complicates things, although it's nice that at least one of them is moderately sane.

eprovst commented 3 years ago

Full(er) support for IEEE 754 would indeed be very welcome, especially for numerical work.

> What would f80 do on platforms that aren't x86? Nothing else has native 80-bit floats. It's not even part of IEEE 754 (even though it's largely a natural extension of it... although it has a lot of quirks).

This is somewhat false: x86's 80-bit floats are extended-precision binary64s as specified by IEEE 754.

However, it's true that these are not very strictly defined; an extended-precision binary64 has to have a larger precision than binary64 and the exponent range of binary128. This means that both x86's 80-bit floats and binary128 are examples of valid extended-precision binary64s.

I'd suggest providing the following types:
f16 (binary16), f32 (binary32), f64 (binary64), f64e (binary64 extended) and f128 (binary128).

On x86 platforms, and others that have a native extended precision binary64, a f64e would be an 80-bit float or similar, on all others it would be the same as a f128.

[Edit: further clarified in the relation between 80-bits floats and IEEE 754.]

workingjubilee commented 3 years ago

So, on the other side of "portable" is "layout". We have a lot of ambiguous-layout types which are not primitive types. However, as far as I am aware all the primitive types have a pretty explicit layout, and many of the std composite data types like Vec etc. have most of their layout dialed in as well. Here we'd have two possible layouts for a numeric type which should be as simple as possible, and f64e is probably the wrong abstraction here, because there are a lot of cases where someone wants "type N that fulfills X, or else type M that fulfills a superset of X", especially for math libs.

eprovst commented 3 years ago

I'm not too sure what you mean by 'layout' in this case; it's true that extended-precision floats do not have to conform to a certain bit format. If you mean the memory layout of complex data types, I'm not sure there are any guarantees there anyway, as I wouldn't be surprised if optimisation passes can and do change these kinds of layouts.

I didn't give much thought to the syntax of f64e, something like ExtendedPrecision<f64> might indeed be the better choice here, which also neatly extends to the other fxx's.

Most do seem to agree on including all the common IEEE 754 types, which is, I think, the main goal of this issue. Something similar to Fortran's selected_real/integer_kind could also be looked at, but should probably be moved to another issue.

I'd have to check Rust's current support for other parts of IEEE 754 first. There are very few languages with good support for the hardware's capabilities in this area and those that do tend to be rather unsafe. Numerical analysis and other scientific computing do seem to be a great fit for Rust, so I think it's worth looking into this.

[Edit: typos and clarification]

programmerjake commented 3 years ago

I would expect f64e to be directly equivalent in bit representation, ABI, and layout to C/C++'s long double, except in cases like MSVC on x86_64 where they pick long double == double even though f80 is still usable at the hardware level. There would be another type alias c_long_double for exact equivalence to long double on all platforms with an ABI-compatible C compiler, when the long double type is supported by Rust (so probably excluding PowerPC's annoying double-double type for the MVP).

One interesting side-note: PowerPC v3.0 includes an instruction for converting float types to f80, though I think that's the only supported operation.

f128 would be directly equivalent to gcc/clang's __float128 type where supported.

programmerjake commented 3 years ago

One interesting side-note: PowerPC v3.0 includes an instruction for converting float types to f80, though I think that's the only supported operation.

Turns out that the only supported f80 operation is xsrqpxp, which rounds an f128 to f80 precision but leaves it in f128 format. That's useful for implementing f80 arithmetic operations, since, for all of add, sub, mul, div, and sqrt, if all inputs are known to be f80 values in f128 format, then you can produce the exactly rounded f80 result in f128 format by:

  1. run the add, sub, mul, div, or sqrt operation for f128 in round to odd mode
  2. run the xsrqpxp instruction in the desired rounding mode for the f80 operation

This is similar to how f32 arithmetic can be implemented in JavaScript (which only has the f64 type for arithmetic) by rounding to f32 between every operation.
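The f32-via-f64 analogue mentioned above can be written directly in Rust. It is exact for individual add/sub/mul/div operations because f64 carries more than 2×24+2 significand bits, so the intermediate double rounding never changes the result (a standard double-rounding argument):

```rust
// Emulate f32 arithmetic using f64 hardware by rounding back to f32 after
// every operation, the same trick JavaScript engines rely on (Math.fround).
fn f32_add_via_f64(a: f32, b: f32) -> f32 {
    ((a as f64) + (b as f64)) as f32
}

fn f32_mul_via_f64(a: f32, b: f32) -> f32 {
    ((a as f64) * (b as f64)) as f32
}

fn main() {
    let (a, b) = (0.1_f32, 0.2_f32);
    // For single operations the emulation matches native f32 exactly.
    assert_eq!(f32_add_via_f64(a, b), a + b);
    assert_eq!(f32_mul_via_f64(a, b), a * b);
}
```

The xsrqpxp scheme is the same idea with one extra wrinkle: because f128 is not wide enough relative to f80 to make double rounding automatically safe, the first rounding must use round-to-odd.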

eprovst commented 3 years ago

[...] that's useful for implementing f80 arithmetic operations [...]

No need to, ExtendedPrecision<f64> would simply be f128 on targets that do not have a native extended double format.

In many languages computations with floating point numbers aren't guaranteed to be identical on different targets. On x86_64, for instance, doubles were/are often stored in 80-bit registers, it's only when they are written to memory that they are truncated to 64 bits. In strict mode the JVM thus has to write every floating point value back to memory between operations to guarantee identical results on different architectures.

[Edit: formulation was ambiguous.]

moonheart08 commented 3 years ago

@elecprog x87 is no longer the normal case. Floats are stored as-is in SIMD registers; x87 has been out of use for over a decade. SIMD directly operates on 64-bit and 32-bit floats.

programmerjake commented 3 years ago

@elecprog on x86_64 both f32 and f64 are defined by the ABI to be stored in SSE registers and not in the x87 stack. on x86 32-bit they can be stored on the x87 stack.

workingjubilee commented 3 years ago

Computations are not guaranteed to be identical on different targets anyway.

This is a somewhat misleading statement: not only does it depend on what you say next (and others have discussed its incorrectness), but in actuality the vast majority of targets, and especially modern targets, do give identical computations for most inputs, such that if you know what you are doing you can in fact make exact comparisons across the vast majority of targets. Rust even allows you to do this easily, because its semantics around floats are, in spite of some issues, currently fairly predictable compared to many other languages.

eprovst commented 3 years ago

I mostly meant that ExtendedPrecision<f64> shouldn't guarantee identical results across targets. You're asking for extra precision, preferably with as little runtime cost as possible. The example I gave, albeit apparently horribly outdated, meant to illustrate that it's generally considered perfectly acceptable to have more precision than you asked for.

I see little point in guaranteeing anything more about a float than its minimum precision and exponent size outside of the interface level. If you're relying on floats having some specific precision, you probably shouldn't be using them...

Of course, if it's possible to do without incurring any hidden costs, go for it, but I'd be wary of baking the results of floating-point computations into the language's semantics.

(If you really need those strict semantics you could use an hypothetical Strict<f64>.) [Edit: typo.]

lygstate commented 3 years ago

> I'm not too sure what you mean by 'layout' in this case, it's true that extended precision floats do not have to conform to a certain bit format. If you refer to the memory layout of complex data types, I'm not sure if there are any guarantees here anyway as I wouldn't be surprised optimisation passes can and do change these kinds of layouts.
>
> I didn't give much thought to the syntax of f64e, something like ExtendedPrecision<f64> might indeed be the better choice here, which also neatly extends to the other fxx's.
>
> Most do seem to agree on including all the common IEEE 754 types, which is, I think, the main goal of this issue. Something similar to Fortran's selected_real/integer_kind could also be looked at, but should probably be moved to another issue.
>
> I'd have to check Rust's current support for other parts of IEEE 754 first. There are very few languages with good support for the hardware's capabilities in this area and those that do tend to be rather unsafe. Numerical analysis and other scientific computing do seem to be a great fit for Rust, so I think it's worth looking into this.

Maybe Narrow<f128> would be a better option: fp80 and IBM PowerPC's fp128_ppc_ibm_extended (represented as two fp64s) are both narrowed versions of IEEE fp128. Both can be computed in f128 and then narrow-converted to fp80 or fp128_ppc_ibm_extended; they would only be used for compatibility with existing code and binaries (ABI compatibility). For new code, the suggestion is always to use f128 directly; f128 would use hardware support where possible, with a pure software fallback otherwise.

In conclusion: use f128, fp80, and fp128_ppc_ibm_extended only when precision matters more than performance, so that a software implementation is acceptable.

f128 is the IEEE standard version provided across all platforms; fp80 is only provided on x86, and fp128_ppc_ibm_extended only on PowerPC. Maybe there are other variants of Narrow<f128> as well; I don't know.

eprovst commented 3 years ago

The main use case of ExtendedPrecision<T> would be when you (temporarily) need a bit more precision, but not necessarily the full-blown next-largest IEEE 754 standard float. The compiler should thus select the fastest option that corresponds to some extended-precision variant of T as specified by the IEEE 754 standard. For f64 on x86_64 this would be 80-bit; on an architecture with native support for quad precision it would probably be the same as f128, etc. On platforms with no native support for anything like that, you use some soft alternative. That's why there should be limited guarantees about the resulting type: you want the fastest option.

When you're working with FFI it's of course a different story; however, as these things are vendor-specific, I suppose they should go under core::arch, but I'm not sure about that.

I feel like scientific computing is really a field where Rust can shine; rich IEEE 754 support is rare in languages, and Rust's is already pretty good.

workingjubilee commented 3 years ago

I have done some thinking about this, and an RFC that proposes f16 and f128 will still have to address how to handle those types on platforms that Rust targets but that do not support them; that is likely to be the main blocker. I believe that any such proposal might incidentally also solve for good ways to handle "long doubles".

lygstate commented 3 years ago

> The main use case of ExtendedPrecision<T> would be when you (temporarily) need a bit more precision, but not necessarily the full blown next largest IEEE 754 standard float. The compiler thus should select the fastest option that corresponds to some extended precision variant of T as specified by the IEEE 754 standard. For f64 on x86_64 this would be 80-bit, on an architecture with native support for quad precision this would probably be the same as f128, etc. On platforms with no native support for anything like that you use some soft alternative. That's why there should be limited guarantees about the resulting type, as you want the fastest option.
>
> When you're working on FFIs it's of course a different story, however as these things are vendor specific I suppose they should go under core::arch, but I'm not sure about that.
>
> I feel like scientific computing is really a field where Rust can shine, rich support for IEEE is rare in languages and Rust's already pretty good.

This is a choice between implementing it fast and implementing it correctly. I think people using long double are looking for precision/correctness over performance, so implementing it correctly is more important. If people want performance they should always choose double over long double; people who use long double want more precision, not more performance.

eprovst commented 3 years ago

> This is a choice between implementing it fast and implementing it correctly. I think people using long double are looking for precision/correctness over performance, so implementing it correctly is more important. If people want performance they should always choose double over long double; people who use long double want more precision, not more performance.

long double is specified to have at least as much precision as double, which means it could very well be just a double (as is the case for some C compilers).

Something like ExtendedPrecision<f64> should be guaranteed to be bigger, even if that means you lose some performance, but it should try to give you the fastest option. There's no point in implementing ExtendedPrecision<f64> as soft 80-bit floats on targets that support quadruple precision: you would then lose performance in order to lose precision.

lygstate commented 3 years ago

> This is a choice between implementing it fast and implementing it correctly. I think people using long double are looking for precision/correctness over performance, so implementing it correctly is more important. If people want performance they should always choose double over long double; people who use long double want more precision, not more performance.
>
> long double is specified to have at least as much precision as double, which means it could very well be just a double (as is the case for some C compilers).
>
> Something like ExtendedPrecision<f64> should be guaranteed to be bigger, even if that means you lose some performance, but it should try to give you the fastest option. There's no point in implementing ExtendedPrecision<f64> as soft 80-bit floats on targets that support quadruple precision: you would then lose performance in order to lose precision.

Yes, it's performance vs. precision. For example, people with tests running on the x86 platform expect f80 results; if values are truncated to f64, those tests would fail, which creates a burden when migrating existing code to Rust, because people don't know up front which test cases and which code would fail and need revision. Performance, by contrast, can always be monitored with general tools.

ghost commented 3 years ago

Rust already has i128; a 128-bit float should be fine if provided.

What does the core Rust team think about it?

cesss commented 3 years ago

I'm taking my first look at Rust (proficient in C and C++, but new to Rust), and I must admit that the first thing I checked was the built-in types, and the second was whether 128-bit IEEE was a built-in type. In my case, being able to build floating-point code with 128-bit precision is very useful, even if it's emulated in software, because it helps you find out how "healthy" your floating-point code is, accuracy-wise.

moonheart08 commented 3 years ago

> Rust already has i128; a 128-bit float should be fine if provided.
>
> What does the core Rust team think about it?

f128 would be hard to put into libcore due to the fact it requires a supporting library to function, which is counter to the language core's design.

CryZe commented 3 years ago

Floats already need libm to function (in many ways), and compiler-builtins is needed too. So that statement seems wrong.

ecnelises commented 3 years ago

LLVM will translate float128 (fp128 in IR) operations into emulation calls if the target doesn't have native instructions for them, and those functions are provided by compiler-rt. The real issue seems to be libm, because not every target has math functions with the f128 suffix.

Here's a doc from GCC about the status of IEEE float128 on PowerPC, for reference: https://gcc.gnu.org/wiki/Ieee128PowerPC

Kleptine commented 3 years ago

One of the things that is great about Rust's primitive types is that they are reliably sized. An i32 is always 32 bits. If you explicitly desire a platform specific size, you can reach for it, but it's not necessarily the default.

Similarly, I'd prefer an f80 type which is always 80 bits and either doesn't exist on unsupported platforms or is emulated. An f64e type that is platform-defined is not nearly as useful if you want strong guarantees of memory packing (i.e. fitting in cache lines, etc.).

That said, I'd be perfectly happy to see this type supplied by a library. I'm not sure it's frequently useful enough to be in core.

lygstate commented 2 years ago

Full support for standard f128 (IEEE 754R) is needed; for f128e (IBM extended double) and f80e we only need add/sub/mul/div support and ABI consistency with GCC.

According to https://gcc.gnu.org/wiki/Ieee128PowerPC:

| Type | Size in bytes | Exponent | Mantissa | Rough minimum | Rough maximum | Decimal digits |
| --- | --- | --- | --- | --- | --- | --- |
| IBM extended double | 16 bytes | 11 bits | 106 bits (53 + 53) | 1.0E−323L | 1.0E+308L | 30-34 |
| IEEE 754R 128-bit | 16 bytes | 15 bits | 113 bits | 1.0E-4932Q | 1.0E+4932Q | 33-36 |
| Intel 80-bit long double | 12/16 bytes (*) | 15 bits | 63 bits + leading 1 bit | 1.0E-4932W | 1.0E+4932W | 19-21 |
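The "decimal digits" column follows from the mantissa width: p mantissa bits give roughly p * log10(2) significant decimal digits. A quick sketch checking the three formats listed by GCC:

```rust
// Approximate significant decimal digits for a binary float with
// `mantissa_bits` bits of significand: p * log10(2).
fn decimal_digits(mantissa_bits: u32) -> f64 {
    mantissa_bits as f64 * 2f64.log10()
}

fn main() {
    println!("{:.1}", decimal_digits(106)); // IBM extended double -> 31.9
    println!("{:.1}", decimal_digits(113)); // IEEE binary128      -> 34.0
    println!("{:.1}", decimal_digits(64));  // x87 (63 + leading 1) -> 19.3
}
```

Each value lands inside the corresponding "Decimal digits" range in the table above.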
workingjubilee commented 2 years ago

> Floats already need libm to function (in many ways), and compiler-builtins is needed too. So that statement seems wrong.

It's half-correct. That is why some float methods aren't really callable in core.

I have done some more thinking, and it is my opinion that Rust should not support either "improper" long double format as a proper language type, neither the 80-bit long double nor the extended double-double, but should support binary128 as f128 fully. However, we could attempt to offer FFI compatibility with long double, even if we eschew direct operations on those types in Rust.

So, my idea is that core::arch::x86::long_double and core::arch::powerpc::long_double types could be defined in core::arch and offer a conversion to one of Rust's natural types, f128 or f64, and likely also a simple {to,from}_bits operation for getting a u128 out of them. This would allow us to extract values from the arch-specific C "long double" and do operations on a well-defined-by-Rust-and-IEEE754 floating point type. This minimizes the internal support requirements of Rust while allowing us to accept it in C FFI, maximizing our compatibility.
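A rough sketch of what such an FFI-only wrapper might look like (entirely hypothetical; nothing like this exists in core::arch today), using the x87 layout of sign at bit 79, 15 exponent bits, and an explicit integer bit at bit 63:

```rust
// Hypothetical sketch of an FFI-only long-double wrapper in the spirit of
// the proposal above: it carries the raw bytes of an x87 80-bit value and
// exposes only bit access plus a lossy conversion to f64, leaving
// arithmetic to C or to a wide software float type.
#[repr(C, align(16))]
#[derive(Clone, Copy)]
struct LongDouble {
    bytes: [u8; 16], // x87 uses 10 bytes; the rest is ABI padding
}

impl LongDouble {
    fn to_bits(self) -> u128 {
        u128::from_le_bytes(self.bytes)
    }

    /// Lossy conversion: re-pack the 80-bit fields into a binary64.
    /// (Overflow, underflow, and NaN handling are omitted in this sketch.)
    fn to_f64(self) -> f64 {
        let bits = self.to_bits();
        let sign = ((bits >> 79) & 1) as u64;
        let exp = ((bits >> 64) & 0x7fff) as i64; // 15-bit biased exponent
        let frac = (bits & ((1u128 << 63) - 1)) as u64; // 63 fraction bits
        if exp == 0 && frac == 0 {
            return if sign == 1 { -0.0 } else { 0.0 };
        }
        // Re-bias the exponent from 16383 to 1023 and keep the top 52
        // fraction bits (the explicit integer bit at bit 63 is dropped,
        // matching f64's implicit leading 1 for normal values).
        let exp64 = (exp - 16383 + 1023) as u64;
        f64::from_bits((sign << 63) | (exp64 << 52) | (frac >> 11))
    }
}

fn main() {
    // 1.0 in x87 80-bit form: biased exponent 16383, explicit integer bit set.
    let one = LongDouble { bytes: ((16383u128 << 64) | (1u128 << 63)).to_le_bytes() };
    assert_eq!(one.to_f64(), 1.0);
}
```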

I think there is no real advantage gained from supporting the Intel or PowerPC long doubles as "true" Rust language types, in the sense of "lives outside of core::arch and/or the Rust libc bindings." All of the x87 FPU operations in modern processors are defined in microcode, and have gotten slower over the years (gaining more cycles to execute, though more cycles are completed in less time nowadays), because AMD and Intel have overwhelmingly focused on their support of the Streaming SIMD Extensions 2 architectural extension, due to it being included as a basic requirement in the spec for AMD64. As far as PowerPC, POWER9 now simply supports f128, that is, a true binary128, in hardware.

We can likely offer adequate, possibly even good performance for the f128 type in software emulation, simply by focusing on making that run well. And I think supporting f128 alone will be extremely challenging because of the same kinds of issues that we already encounter with i128 and u128 (that is, GCC and Clang and Rust have to all agree on what their ABI is, and it's not always obvious what we should all agree on) as well as just our support for floating point in general not being great.

If all of the above was achieved, this would also likely allow a user to simply write a library supporting direct operations on those arch-specific types with core::arch::asm! if they really want, though I imagine most would be quite happy to just convert to f128 and back. But I offer this minimizing proposal partly because I consider Rust's floating point support to be less than ideal as-is (even if I think it is surprisingly good), and even if more resources were made available for doing so, I don't think we would want to split our support resources for "big" floats between three different types unless there was an enormous grant from Intel, AMD, and IBM to do so... and I imagine they would rather fund us finishing and rounding out support for f128 and their SIMD intrinsics first.

lygstate commented 2 years ago

Good idea, that makes sense.

tgross35 commented 2 years ago

Just asking as a curious observer - has an official RFC for this seen any movement? The only pre-RFC I can find is this one, which has long been closed.

I recently stumbled into the pain of varying double/long double support in C and was wondering whether Rust does any better.

hamdav commented 2 years ago

I would just like to say that I would love to have f128 support in rust as well. It can be useful, and even necessary, for some scientific computations.

VariantXYZ commented 1 year ago

Opened an issue without realizing f16 was covered here.

I think there are plenty of reasons to support f16 as a native arithmetic type in Rust, but my primary use case is ML inference on hardware that supports fp16 arithmetic (e.g. the Cortex-A55).

I've resorted to writing simple functions (multiply-add, dot products, etc.) that operate on _Float16 values in C and calling them, because the half crate's conversion cost is really painful for anything low-latency/high-frequency (audio processing). It is... far from efficient.

My understanding is that _Float16 is a portable arithmetic type, defined in the C11 extension ISO/IEC TS 18661-3:2015, so it would be nice if Rust exposed something similar.
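For illustration, the conversion overhead described above comes from bit-twiddling like the following minimal f16-to-f32 sketch (normal, zero, infinity/NaN, and subnormal paths); this is broadly the kind of work a software half-float library has to do around every arithmetic operation when there is no native f16 type:

```rust
// Minimal sketch of a software binary16 -> binary32 conversion.
// f16 layout: 1 sign bit, 5 exponent bits (bias 15), 10 fraction bits.
fn f16_to_f32(h: u16) -> f32 {
    let sign = (h as u32 & 0x8000) << 16;
    let exp = (h as u32 >> 10) & 0x1f;
    let frac = h as u32 & 0x03ff;
    let bits = match exp {
        0 => {
            if frac == 0 {
                sign // signed zero
            } else {
                // Subnormal: its value is frac * 2^-24; let f32 arithmetic
                // renormalize it, then re-attach the sign bit.
                let val = frac as f32 * (1.0 / 16777216.0);
                return f32::from_bits(sign | val.to_bits());
            }
        }
        0x1f => sign | 0x7f80_0000 | (frac << 13), // infinity / NaN
        _ => sign | ((exp + 112) << 23) | (frac << 13), // re-bias 15 -> 127
    };
    f32::from_bits(bits)
}

fn main() {
    assert_eq!(f16_to_f32(0x3C00), 1.0); // 1.0
    assert_eq!(f16_to_f32(0xC000), -2.0); // -2.0
    assert_eq!(f16_to_f32(0x7BFF), 65504.0); // largest finite f16
}
```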

aaronfranke commented 1 year ago

On the topic of hardware support, I'll add that RISC-V's Q extension provides quadruple-precision floats, so if Rust added f128 then it could be hardware accelerated on those systems, for example rv64gqc systems.

Even without hardware acceleration, it would still be useful to have this much precision available via software emulation at the language and standard library level.
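One well-known software technique for extra precision is the "double-double" representation (the same idea as the IBM extended double in the table further up): a value stored as an unevaluated sum of two f64s, built on Knuth's error-free two-sum. Note that this yields roughly 106 significand bits, not true IEEE binary128 semantics, so it illustrates the emulation idea rather than f128 itself:

```rust
/// Error-free transformation (Knuth's two-sum): returns (s, e) where
/// s = fl(a + b) is the rounded sum and a + b = s + e holds exactly,
/// so e captures precisely what rounding discarded.
fn two_sum(a: f64, b: f64) -> (f64, f64) {
    let s = a + b;
    let bb = s - a;
    let e = (a - (s - bb)) + (b - bb);
    (s, e)
}

fn main() {
    // 1.0 + 2^-60 is not representable in one f64; the error term
    // recovers the low-order part that the rounded sum loses.
    let tiny = (2.0f64).powi(-60);
    let (s, e) = two_sum(1.0, tiny);
    assert_eq!(s, 1.0);  // rounded sum
    assert_eq!(e, tiny); // exactly the lost low-order part
    println!("s = {s}, e = {e:e}");
}
```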