openhwgroup / cvfpu

Parametric floating-point unit with support for standard RISC-V formats and operations as well as transprecision formats.
Apache License 2.0

CVFPU incorrectly suppresses overflow flag on `I2F` conversions #123

Open michael-platzer opened 5 months ago

michael-platzer commented 5 months ago

According to IEEE 754-2008, converting an integer to a floating-point value should raise an overflow exception if the rounded value exceeds the range of the floating-point type.

However, CVFPU suppresses the overflow flag when the source operand is an integer (i.e., on I2F conversions), and sets the invalid flag instead:

https://github.com/openhwgroup/cvfpu/blob/81c53c5381f7438272c05025ee3265d752f96e02/src/fpnew_cast_multi.sv#L721-L727

The comment on line 721 is incorrect: an overflow should not trigger an invalid exception on I2F conversions.

I believe this has been confused with the opposite case, converting a floating-point value to an integer, which indeed is not supposed to produce an overflow exception:

For single-precision and double-precision types, an overflow cannot happen, because their exponent range is large enough that no 32-bit or 64-bit integer value exceeds it. For half-precision values, however, whose largest finite value is 65504, an overflow can occur on I2F conversions.
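To make the range argument concrete, here is a minimal Python sketch that checks, for a given format and integer width, whether any integer operand could overflow. It is not CVFPU code: the function names are hypothetical, the formats are described only by their exponent and mantissa widths, and round-to-nearest-even is assumed (other rounding modes shift the threshold slightly).

```python
from fractions import Fraction

# (exponent bits, mantissa bits) of the IEEE 754 binary interchange formats
FORMATS = {"fp16": (5, 10), "fp32": (8, 23), "fp64": (11, 52)}


def max_finite(exp_bits, man_bits):
    """Largest finite value: 2^emax * (2 - 2^-man_bits)."""
    emax = 2 ** (exp_bits - 1) - 1
    return Fraction(2) ** emax * (2 - Fraction(1, 2 ** man_bits))


def rne_overflow_threshold(exp_bits, man_bits):
    """Smallest magnitude that overflows under round-to-nearest-even
    (assumed rounding mode): max_finite plus half a unit in the last place."""
    emax = 2 ** (exp_bits - 1) - 1
    return Fraction(2) ** emax * (2 - Fraction(1, 2 ** (man_bits + 1)))


def i2f_can_overflow(exp_bits, man_bits, int_width, signed=True):
    """True if some integer of the given width overflows the format."""
    # Largest magnitude: |INT_MIN| for signed, UINT_MAX for unsigned operands.
    largest = 2 ** (int_width - 1) if signed else 2 ** int_width - 1
    return largest >= rne_overflow_threshold(exp_bits, man_bits)


if __name__ == "__main__":
    for name, (e, m) in FORMATS.items():
        for width in (32, 64):
            print(name, width, i2f_can_overflow(e, m, width, signed=False))
```

Under these assumptions, only the half-precision format reports a possible overflow for 32-bit and 64-bit operands, which is consistent with the issue never showing up in single-precision-only configurations.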

@lucabertaccini @pascalgouedo @stmach Please let me know whether my assessment is correct. I will then move forward with a PR to fix the behavior.

pascalgouedo commented 1 month ago

Hi @michael-platzer, as long as it doesn't change the behavior when using only the single-precision format, it should be fine. But you will have to prove it 😀.

michael-platzer commented 1 month ago

Hi @pascalgouedo, I assume you are referring to the changes in PR #125. Those do change the behavior: they replace the current non-IEEE-compliant behavior with IEEE-compliant behavior.

The reason you are not seeing this issue when using only the single-precision format is that there can be no overflow on I2F conversions for this format. This is easy to prove: the largest normal value that can be represented by a single-precision floating-point number is $2^{127} \times (2 - 2^{-23})$, which is far larger than any value that can be represented by a 32-bit or 64-bit integer.
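Written out as a comparison (the decimal approximations and the half-precision line are added here for illustration, using the standard binary16 parameters of a 5-bit exponent and a 10-bit mantissa):

$$
2^{64} \approx 1.8 \times 10^{19} \;<\; 2^{127} \times (2 - 2^{-23}) \approx 3.4 \times 10^{38}
$$

$$
2^{15} \times (2 - 2^{-10}) = 65504 \;<\; 2^{31} - 1 = 2147483647
$$

The first line shows that even the largest unsigned 64-bit integer stays well within the single-precision range. The second shows that the half-precision range is exceeded by ordinary 32-bit integers.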