stencillogic / astro-float

Arbitrary precision floating point numbers library
MIT License

Proposal for a public `BigFloat::to_f64` #11

Open Alextopher opened 1 year ago

Alextopher commented 1 year ago

I have a use case where I need to convert back and forth between `BigFloat` and `f64`. A `fn to_f64(&self) -> f64` would be perfect.

For context, there is an existing `pub(crate)` implementation: https://github.com/stencillogic/astro-float/blob/00d5150dabcd578b89e83ab9f34f1605568f5a7e/astro-float-num/src/num.rs#L795

I would like to propose cleaning up this method and tweaking it to follow Rust's conventions for casting between `f32` and `f64`. According to the nomicon:

  • Casting from an f32 to an f64 is perfect and lossless
  • Casting from an f64 to an f32 will produce the closest possible f32
    • if necessary, rounding is according to roundTiesToEven mode ***
    • on overflow, infinity (of the same sign as the input) is produced

*** as defined in IEEE 754-2008 §4.3.1: pick the nearest floating point number, preferring the one with an even least significant digit if exactly halfway between two floating point numbers.

The same rules can be followed when converting `BigFloat` to `f64`.
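The nomicon rules quoted above can be checked directly with Rust's own casts. The function name `check_f32_f64_cast_rules` is only illustrative; the tie cases use half an `f32` ulp at 1.0 (2^-24), which is exactly representable in `f64`.

```rust
/// Checks the f32 <-> f64 `as` cast rules quoted from the nomicon.
fn check_f32_f64_cast_rules() {
    // f32 -> f64 is lossless: the round trip returns the same value.
    let x: f32 = 0.1;
    assert_eq!((x as f64) as f32, x);

    // 1 + 2^-24 lies exactly halfway between 1.0 and 1.0 + 2^-23;
    // ties go to the even significand, which is 1.0.
    let half_ulp = f32::EPSILON as f64 / 2.0;
    assert_eq!((1.0 + half_ulp) as f32, 1.0);

    // 1 + 3 * 2^-24 lies halfway between 1 + 2^-23 (odd) and 1 + 2^-22 (even).
    assert_eq!((1.0 + 3.0 * half_ulp) as f32, 1.0 + 2.0 * f32::EPSILON);

    // Overflow produces infinity with the sign of the input.
    assert_eq!(1e300_f64 as f32, f32::INFINITY);
    assert_eq!(-1e300_f64 as f32, f32::NEG_INFINITY);
}
```

A `BigFloat::to_f64` following the same rules would give these results for the equivalent `BigFloat` inputs.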

I am open to working on this contribution.

stencillogic commented 1 year ago

Yes, please, feel free to make this change. This function is only crate-public precisely because it still needs proper rounding and proper handling of the transition to subnormal numbers. I would, though, consider the following signature for the function: `pub fn to_f64(&self, rm: RoundingMode) -> f64`, so that the user can choose how to round.
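The rounding and subnormal handling mentioned above can be sketched with std only. This is a minimal sketch under assumptions: the value arrives as a sign, a normalized 64-bit mantissa word `m` (top bit set) and an exponent `e` with value = 0.m × 2^e, which is the layout the existing crate-private code appears to use. `Rounding` is a hypothetical local stand-in for a rounding-mode parameter (four modes only, not astro-float's `RoundingMode`), and `parts_to_f64` is likewise a hypothetical name.

```rust
/// Hypothetical local stand-in for a rounding-mode parameter.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Rounding {
    ToEven, // round to nearest, ties to even
    ToZero, // truncate the magnitude
    Up,     // toward +infinity
    Down,   // toward -infinity
}

/// Shifts the magnitude `m` right by `shift` bits, rounding per `mode`.
fn shr_round(m: u64, shift: u32, mode: Rounding, negative: bool) -> u64 {
    if shift == 0 {
        return m;
    }
    // Every shift above 64 gives the same result, so clamp to stay in u128.
    let shift = shift.min(127);
    let wide = m as u128;
    let kept = wide >> shift;
    let rem = wide & ((1u128 << shift) - 1);
    let half = 1u128 << (shift - 1);
    let round_up = match mode {
        Rounding::ToEven => rem > half || (rem == half && (kept & 1) == 1),
        Rounding::ToZero => false,
        Rounding::Up => rem != 0 && !negative,
        Rounding::Down => rem != 0 && negative,
    };
    (kept + round_up as u128) as u64
}

/// Converts value = ±(mantissa / 2^64) * 2^exponent, with the top mantissa
/// bit set, to f64 under `mode`, including subnormal results.
fn parts_to_f64(negative: bool, mantissa: u64, exponent: i32, mode: Rounding) -> f64 {
    let sign_bit = (negative as u64) << 63;
    if mantissa == 0 {
        return f64::from_bits(sign_bit); // signed zero
    }
    // Biased exponent of a normal result: (exponent - 1) + 1023.
    let biased = exponent as i64 + 1022;
    let magnitude = if biased >= 0x7FF {
        // Overflow: infinity, unless the mode rounds this sign toward zero.
        let toward_zero = match mode {
            Rounding::ToEven => false,
            Rounding::ToZero => true,
            Rounding::Up => negative,
            Rounding::Down => !negative,
        };
        if toward_zero { f64::MAX.to_bits() } else { f64::INFINITY.to_bits() }
    } else if biased >= 1 {
        // Normal: keep 53 bits. Adding them onto (biased - 1) << 52 lets a
        // rounding carry bump the exponent field, up to infinity.
        (((biased - 1) as u64) << 52) + shr_round(mantissa, 11, mode, negative)
    } else {
        // Subnormal: the result is a multiple of 2^-1074; a carry up to
        // 2^52 lands exactly on f64::MIN_POSITIVE.
        shr_round(mantissa, (12 - biased).min(127) as u32, mode, negative)
    };
    f64::from_bits(sign_bit | magnitude)
}
```

Adding the rounded significand onto the exponent field, instead of handling carries separately, makes both the subnormal-to-normal transition and the overflow to infinity fall out of plain integer addition; the rounding mode only affects the round-up decision and the overflow result.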

atrabattoni commented 8 months ago

I would also find such a feature very useful. For anyone who needs a workaround, below is a snippet slightly adapted from the code mentioned above.

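```rust
use astro_float::{BigFloat, RoundingMode, Sign};

/// Converts a finite `BigFloat` to `f64`. Assumes a 64-bit target, where one
/// mantissa word holds 64 bits. The value is first rounded to 64 bits with
/// `rounding_mode`, then truncated to f64's 53-bit significand, so the
/// result is not always correctly rounded.
fn to_f64(big_float: &BigFloat, rounding_mode: RoundingMode) -> f64 {
    let mut big_float = big_float.clone();
    big_float.set_precision(64, rounding_mode).unwrap();
    // These unwraps assume a finite, non-NaN value.
    let sign = big_float.sign().unwrap();
    let exponent = big_float.exponent().unwrap();
    let mantissa = big_float.mantissa_digits().unwrap()[0];
    let sign_bit = if sign == Sign::Neg { 1u64 << 63 } else { 0 };
    if mantissa == 0 {
        return f64::from_bits(sign_bit); // signed zero
    }
    // value = 0.mantissa * 2^exponent, so the IEEE 754 biased exponent of a
    // normal result is (exponent - 1) + 1023.
    let biased = exponent as i64 + 1022;
    if biased >= 0x7FF {
        // Too large for f64: infinity of the same sign.
        match sign {
            Sign::Pos => f64::INFINITY,
            Sign::Neg => f64::NEG_INFINITY,
        }
    } else if biased <= 0 {
        // Subnormal: the result counts multiples of 2^-1074.
        let shift = 12 - biased;
        let bits = if shift < 64 { mantissa >> shift } else { 0 };
        f64::from_bits(sign_bit | bits)
    } else {
        // Normal: drop the implicit leading 1, keep the top 52 fraction bits.
        let fraction = (mantissa << 1) >> 12;
        f64::from_bits(sign_bit | ((biased as u64) << 52) | fraction)
    }
}
```
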
atrabattoni commented 8 months ago

Actually, it would be nice to have conversions to all primitive types that behave like the `as` keyword (e.g., a negative `f64 as u64` gives 0). We could implement the `From`/`Into` traits. An example for `u64` (I haven't tested NaN and infinity yet):

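```rust
// Proposed impl. Because `From`, `u64` and `BigFloat` would all be foreign to
// a user crate (orphan rule), it would have to live inside astro-float itself.
impl From<BigFloat> for u64 {
    /// Truncates toward zero like `as`: negative values give 0 and values
    /// above `u64::MAX` saturate. NaN and infinity are not handled yet.
    fn from(mut value: BigFloat) -> u64 {
        // Reduce to a single 64-bit mantissa word; rounding toward zero
        // leaves the integer part unchanged.
        value.set_precision(64, RoundingMode::ToZero).unwrap();
        let sign = value.sign().unwrap();
        let exponent = value.exponent().unwrap();
        let mantissa = value.mantissa_digits().unwrap()[0];
        match sign {
            Sign::Neg => 0,
            // value = 0.mantissa * 2^exponent is below 1 when exponent <= 0.
            Sign::Pos if exponent <= 0 => 0,
            // Integer part: drop the (64 - exponent) fractional bits.
            Sign::Pos if exponent <= 64 => mantissa >> (64 - exponent),
            // Too large for u64: saturate.
            Sign::Pos => u64::MAX,
        }
    }
}
```
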
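For reference, these are the `as` semantics such an impl would be mirroring, checked with plain `f64` values (the function name is illustrative). Since Rust 1.45, float-to-integer casts truncate toward zero and saturate, so NaN maps to 0 and infinities clamp to the type's bounds.

```rust
/// Checks Rust's saturating float-to-integer `as` rules for u64.
fn check_f64_to_u64_as_rules() {
    assert_eq!(3.99_f64 as u64, 3);                // truncates toward zero
    assert_eq!(-1.5_f64 as u64, 0);                // negative clamps to 0
    assert_eq!(1e20_f64 as u64, u64::MAX);         // too large clamps to MAX
    assert_eq!(f64::INFINITY as u64, u64::MAX);
    assert_eq!(f64::NEG_INFINITY as u64, 0);
    assert_eq!(f64::NAN as u64, 0);                // NaN maps to 0
}
```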
tgross35 commented 8 months ago

Agreed that a generic approach could be nice here, but I'm not sure `From`/`Into` is the right fit. Those traits are meant for lossless conversions; for example, the following does not compile:

```rust
let a: f32 = 10.0f64.into();
```
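For contrast, std does provide the lossless direction through `From`, while the lossy direction needs an explicit cast. The helper names `widen` and `narrow` are only for illustration.

```rust
/// Widening is lossless, so std implements `From<f32> for f64`.
fn widen(x: f32) -> f64 {
    x.into()
}

/// Narrowing can lose information, so there is no `From<f64> for f32`;
/// an explicit `as` cast (round to nearest, ties to even) is required.
fn narrow(x: f64) -> f32 {
    x as f32
}
```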

I still think generics are a good idea since, if we are lucky, we will get `f16` and `f128` at some point. Maybe a signature using `num_traits` like:

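```rust
use num_traits::Float;

impl BigFloat {
    /// Proposed: convert to any primitive float type using `rounding_mode`.
    pub fn as_float<T: Float>(&self, rounding_mode: RoundingMode) -> T {
        todo!()
    }
}
```
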
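A generic version can be sketched without `num_traits` by describing each target format with a couple of constants. This is a sketch under assumptions: `IeeeBinary` and `parts_to_float` are hypothetical names, only round-half-to-even is modeled, and the input is a sign, a normalized 64-bit mantissa word and an exponent with value = 0.m × 2^e, the layout the workaround snippet above relies on.

```rust
/// Hypothetical trait describing an IEEE 754 binary format.
trait IeeeBinary {
    const FRACTION_BITS: u32; // stored fraction bits: 52 (f64), 23 (f32)
    const EXPONENT_BITS: u32; // exponent field width: 11 (f64), 8 (f32)
    /// Builds the value from raw bits held in the low bits of a u64.
    fn from_raw(bits: u64) -> Self;
}

impl IeeeBinary for f64 {
    const FRACTION_BITS: u32 = 52;
    const EXPONENT_BITS: u32 = 11;
    fn from_raw(bits: u64) -> Self {
        f64::from_bits(bits)
    }
}

impl IeeeBinary for f32 {
    const FRACTION_BITS: u32 = 23;
    const EXPONENT_BITS: u32 = 8;
    fn from_raw(bits: u64) -> Self {
        f32::from_bits(bits as u32)
    }
}

/// Shifts `m` right by `shift` bits, rounding half to even.
fn shr_round_half_even(m: u64, shift: u32) -> u64 {
    if shift == 0 {
        return m;
    }
    if shift > 64 {
        return 0; // m / 2^shift < 1/2, so it rounds to zero
    }
    let wide = m as u128;
    let kept = wide >> shift;
    let rem = wide & ((1u128 << shift) - 1);
    let half = 1u128 << (shift - 1);
    let round_up = rem > half || (rem == half && (kept & 1) == 1);
    (kept + round_up as u128) as u64
}

/// Rounds value = ±(mantissa / 2^64) * 2^exponent (top mantissa bit set)
/// directly to `T`, ties to even, with subnormal and overflow handling.
fn parts_to_float<T: IeeeBinary>(negative: bool, mantissa: u64, exponent: i32) -> T {
    let sign_bit = (negative as u64) << (T::FRACTION_BITS + T::EXPONENT_BITS);
    if mantissa == 0 {
        return T::from_raw(sign_bit);
    }
    let bias = (1i64 << (T::EXPONENT_BITS - 1)) - 1; // 1023 or 127
    let inf_field = (1i64 << T::EXPONENT_BITS) - 1; // 2047 or 255
    let biased = exponent as i64 - 1 + bias;
    let dropped = 63 - T::FRACTION_BITS; // mantissa bits below the significand
    let magnitude = if biased >= inf_field {
        (inf_field as u64) << T::FRACTION_BITS
    } else if biased >= 1 {
        // A rounding carry out of the significand bumps the exponent field.
        (((biased - 1) as u64) << T::FRACTION_BITS) + shr_round_half_even(mantissa, dropped)
    } else {
        // Subnormal: an integer multiple of the smallest subnormal.
        shr_round_half_even(mantissa, (dropped as i64 + 1 - biased).min(127) as u32)
    };
    T::from_raw(sign_bit | magnitude)
}
```

Rounding directly to the target type matters: going through `f64` and then casting to `f32` rounds twice. For 1 + 2^-24 + 2^-60, the first rounding lands exactly on an `f32` tie, so the second one yields 1.0 instead of the correctly rounded 1 + 2^-23.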
atrabattoni commented 7 months ago

Yes, you are right: using `From`/`Into` for a conversion that is not lossless wouldn't follow Rust idioms.

If we could have an `as_float` but also an `as_int` (and maybe an `as_uint`), that would be awesome. If this complicates things, having one function per type (i.e., `to_f64`, `to_u64`, ...) as initially proposed wouldn't bother me much either.

atrabattoni commented 7 months ago

BTW: the solution I posted above only works on 64-bit architectures; I'm sure you know better than I do how to fix this. Hopefully I won't encounter users who still run very old computers :) (though an official, well-tested function would be better).

stencillogic commented 7 months ago

Regarding implementing conversion to primitive integer types, do you have actual use cases for it?

One issue I can think of: what should happen if the value exceeds the maximum value of the primitive type? Return an error and let the client code decide, or clamp to the maximum value? With clamping, you may run into situations where you want to know whether the value was clamped, so there would need to be some way to determine that.

It seems like a good idea to have both kinds of functions: one that returns an error and one that saturates, e.g. `to_u64_saturating`, which clamps.
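A minimal sketch of the two flavours, working on assumed sign/mantissa/exponent parts (a finite value = ±0.m × 2^e with a normalized 64-bit mantissa) rather than on `BigFloat` itself. `ToU64Error`, `parts_to_u64_checked` and `parts_to_u64_saturating` are hypothetical names. Both truncate toward zero like `as`, so -0.5 becomes 0 rather than an error.

```rust
/// Hypothetical error type for a checked float-to-u64 conversion.
#[derive(Debug, PartialEq)]
enum ToU64Error {
    Negative, // truncated value is below 0
    Overflow, // truncated value is above u64::MAX
}

/// Integer part of 0.mantissa * 2^exponent, or None if it exceeds u64::MAX.
fn truncated_magnitude(mantissa: u64, exponent: i32) -> Option<u64> {
    if mantissa == 0 || exponent <= 0 {
        Some(0)
    } else if exponent <= 64 {
        Some(mantissa >> (64 - exponent))
    } else {
        None
    }
}

/// Checked variant: truncates toward zero and reports out-of-range values.
fn parts_to_u64_checked(negative: bool, mantissa: u64, exponent: i32) -> Result<u64, ToU64Error> {
    match (negative, truncated_magnitude(mantissa, exponent)) {
        (_, Some(0)) => Ok(0), // e.g. -0.5 truncates to 0, which fits
        (true, _) => Err(ToU64Error::Negative),
        (false, Some(v)) => Ok(v),
        (false, None) => Err(ToU64Error::Overflow),
    }
}

/// Saturating variant: clamps out-of-range values like `as` does.
fn parts_to_u64_saturating(negative: bool, mantissa: u64, exponent: i32) -> u64 {
    match parts_to_u64_checked(negative, mantissa, exponent) {
        Ok(v) => v,
        Err(ToU64Error::Negative) => 0,
        Err(ToU64Error::Overflow) => u64::MAX,
    }
}
```

Writing the saturating variant as a thin wrapper over the checked one means a caller who needs to know whether clamping happened can always call the checked version instead.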

stencillogic commented 7 months ago

New issue #28 has been opened. Please use it for further discussion.