IEEE-754 floating point numbers in V

Le0X8 commented 10 months ago

Describe the feature

This feature consists of two parts:

.parse_float() method on []u8 to parse a float from a byte array.

I wrote this code, which works fine:

import math { pow }

// m_len is the number of explicit mantissa bits:
// 23 when using f32 (4-byte buffer)
// 52 when using f64 (8-byte buffer)
fn parse_float(buffer []u8, is_le bool, m_len int) !f64 {
    e_len := (buffer.len * 8) - m_len - 1
    e_max := (1 << e_len) - 1
    e_bias := e_max >> 1
    mut n_bits := -7
    // Start at the most significant byte and step towards the least significant one.
    mut i := 0
    mut d := 1
    if is_le {
        i = buffer.len - 1
        d = -1
    }
    mut s := buffer[i]

    i += d

    // The low 7 bits of the first byte start the exponent; the top bit is the sign.
    mut e := int(s & ((1 << (-n_bits)) - 1))
    s >>= u8(-n_bits)
    n_bits += e_len
    for ; n_bits > 0; {
        e = (e * 256) + int(buffer[i])
        i += d
        n_bits -= 8
    }

    // i64, because the 52-bit f64 mantissa does not fit into a 32-bit int
    mut m := i64(e & ((1 << (-n_bits)) - 1))
    e >>= u8(-n_bits)
    n_bits += m_len
    for ; n_bits > 0; {
        m = (m * 256) + i64(buffer[i])
        i += d
        n_bits -= 8
    }

    if e == 0 {
        // subnormal number: no implicit leading 1 bit
        e = 1 - e_bias
    } else if e == e_max {
        // all exponent bits set: Infinity or NaN
        return error('Unparseable number')
    } else {
        // normal number: add the implicit leading 1 bit
        m += i64(1) << m_len
        e -= e_bias
    }

    value := f64(m) * pow(2.0, f64(e - m_len))
    if s != 0 {
        return -value
    }

    return value
}

.bytes() method on f32 and f64 to convert them to byte arrays.

This function cannot be implemented in V yet: V doesn't support bitwise operations on floating-point numbers, which are needed to produce the IEEE-754 encoding.

fn write(value f64, n_bytes int, is_le bool, m_len int) []u8 {
    // not implementable
}

Use Case

Implementing the IEEE-754 standard in V is required to read and write floats in binary file formats correctly. Because of the current lack of bitwise operations on floats, this cannot be done safely in V and writing results would be inaccurate.

I'm currently working on a Buffer API for V, which should make binary data handling in V much easier and hide all the math behind it. That API needs this feature.

Also, if you want to parse and write embeddings of AI models, this feature is essential, because they are often stored in a Float32Array format. Without it, model training data could end up corrupted, which could lead to AI devs staying away from V.

Proposed Solution

see feature description

Other Information

IEEE-754 standard Wikipedia article

Acknowledgements

[x] I may be able to implement this feature request
[ ] This feature might incur a breaking change

Version used

V 0.4.3 2964855

Environment details (OS name and version, etc.)

V full version: V 0.4.3 2964855
OS: linux, Debian GNU/Linux 12 (bookworm)
Processor: 8 cpus, 64bit, little endian, Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz

getwd: /home/leo/Documents/ieee754
vexe: /opt/v/v
vexe mtime: 2023-11-27 17:33:39

vroot: OK, value: /opt/v
VMODULES: OK, value: /root/.vmodules
VTMP: OK, value: /tmp/v_0

Git version: git version 2.39.2
Git vroot status: weekly.2023.45.1-154-g2964855d (169 commit(s) behind V master)
.git/config present: true

CC version: cc (Debian 12.2.0-14) 12.2.0
thirdparty/tcc status: thirdparty-linux-amd64 12f392c3

[!NOTE] You can use the 👍 reaction to increase the issue's priority for developers.

Please note that only the 👍 reaction to the issue itself counts as a vote. Other reactions and those to comments will not be taken into account.

JalonSolov commented 10 months ago

V usually has separate functions for each type, so instead of a single parse_float, it would be parse_f32 and parse_f64.

The "parsing" you're doing is super-simple if you just use a union. No extra math required... just write the float into the float field in the union, and read the bytes out of the byte array.

You'll also need to pay attention to big-endian vs little-endian.
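
A minimal sketch of the union idea, for illustration only. The names `F32Bytes` and `f32_to_host_bytes` are hypothetical and not from this thread. The bytes come out in the byte order of the machine running the code, which is why endianness needs attention.

```v
// Hypothetical union: both fields share the same 4 bytes of memory.
union F32Bytes {
	f     f32
	bytes [4]u8
}

// f32_to_host_bytes returns the raw bytes of `value` in host byte order.
fn f32_to_host_bytes(value f32) []u8 {
	u := F32Bytes{
		f: value
	}
	mut out := []u8{len: 4}
	for i in 0 .. 4 {
		// Reading a union field other than the one written requires `unsafe`.
		out[i] = unsafe { u.bytes[i] }
	}
	return out
}

fn main() {
	// 1.0 as f32 has the bit pattern 0x3F800000.
	println(f32_to_host_bytes(1.0))
}
```

On a little-endian host the least significant byte comes first, so the result would need to be reversed to get a big-endian encoding.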

Le0X8 commented 10 months ago

@JalonSolov yes, this is absolutely correct. I've implemented this in V, and it works :)

I just can't figure out how to read/write big-endian floats.

My code so far:

pub fn read_f32le(bytes [4]u8) f32 {
    f := F32A{
        value: bytes
    }

    unsafe {
        return f.f
    }
}

pub fn write_f32le(f f32) [4]u8 {
    bytes := F32A{
        F32: F32{
            f: f
        }
    }

    unsafe {
        return bytes.value
    }
}

pub fn read_f64le(bytes [8]u8) f64 {
    f := F64A{
        value: bytes
    }

    unsafe {
        return f.f
    }
}

pub fn write_f64le(f f64) [8]u8 {
    bytes := F64A{
        F64: F64{
            f: f
        }
    }

    unsafe {
        return bytes.value
    }
}

JalonSolov commented 10 months ago

Write the float, then call https://modules.vlang.io/index.html#array.reverse on the u8 array. :-)

JalonSolov commented 10 months ago

One other thing you may need to take into account... the code you have works as long as V is running on a little-endian system. Everything is reversed if you're running on a big-endian system.

You can check that with https://modules.vlang.io/runtime.html#is_little_endian or https://modules.vlang.io/runtime.html#is_big_endian whichever you prefer.
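
An alternative that avoids both the reversal and the host-endianness check is to go through the integer bit pattern. The sketch below assumes vlib's `math.f32_bits` / `math.f32_from_bits` and the `encoding.binary` put/get helpers, none of which are mentioned in this thread. The function names `write_f32` and `read_f32` are hypothetical.

```v
import encoding.binary
import math

// write_f32 encodes `value` as 4 bytes in the requested byte order,
// independent of the byte order of the host.
fn write_f32(value f32, is_le bool) []u8 {
	mut buf := []u8{len: 4}
	bits := math.f32_bits(value) // IEEE-754 bit pattern as u32
	if is_le {
		binary.little_endian_put_u32(mut buf, bits)
	} else {
		binary.big_endian_put_u32(mut buf, bits)
	}
	return buf
}

// read_f32 decodes 4 bytes in the given byte order back into an f32.
fn read_f32(buf []u8, is_le bool) f32 {
	bits := if is_le { binary.little_endian_u32(buf) } else { binary.big_endian_u32(buf) }
	return math.f32_from_bits(bits)
}

fn main() {
	be := write_f32(1.0, false)
	le := write_f32(1.0, true)
	println(be)
	println(le)
	println(read_f32(be, false))
}
```

The same pattern should carry over to f64 with `math.f64_bits`, `math.f64_from_bits` and the `u64` variants of the `encoding.binary` helpers.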

penguindark commented 10 months ago

Converting IEEE-754 floats from and to strings is not a trivial matter: see https://github.com/vlang/v/blob/master/vlib/strconv/atof.c.v and https://github.com/vlang/v/blob/master/vlib/strconv/f32_str.c.v 😸
