servo / pathfinder

A fast, practical GPU rasterizer for fonts and vector graphics
Apache License 2.0

fix f32x4_basic_ops test #551

jonaspleyer closed 4 months ago

jonaspleyer commented 4 months ago

The body of the test function is probably incorrect: its right-hand side produces values that are also incorrect in the CI run. Everything else seems to be fine.

This can be verified using only the SIMD intrinsics defined in the core library.

fn main() {
    #[cfg(target_arch = "x86")]
    use std::arch::x86::*;
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::*;

    unsafe {
        let values = _mm_set_ps(1.0, 3.0, 5.0, 7.0);
        let res = _mm_rcp_ps(values);
        println!("{:?}", values);
        println!("{:?}", res);
    }
}

produces

__m128(7.0, 5.0, 3.0, 1.0)
__m128(0.14282227, 0.19995117, 0.333313, 0.99975586)

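Notice that the output order is reversed relative to the argument order in the source. This is expected: _mm_set_ps takes its arguments from the highest lane down to the lowest, while the Debug output lists the lowest lane first. The result itself is correct.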
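The lane ordering can be made explicit by storing the register to an array. The following is a minimal sketch, assuming an x86 or x86_64 target with SSE available; the helper names `set_ps_lanes` and `setr_ps_lanes` are hypothetical and exist only for this illustration. It contrasts `_mm_set_ps` with `_mm_setr_ps`, which takes its arguments in memory order.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: builds a vector with `_mm_set_ps` and returns its
/// lanes in memory order (lowest lane first).
fn set_ps_lanes() -> [f32; 4] {
    let mut out = [0.0f32; 4];
    // SAFETY: `out` holds four f32 values and the store is unaligned-safe.
    // SSE is assumed to be available (it is part of the x86_64 baseline).
    unsafe { _mm_storeu_ps(out.as_mut_ptr(), _mm_set_ps(1.0, 3.0, 5.0, 7.0)) };
    out
}

/// Hypothetical helper: same arguments, but with `_mm_setr_ps`, which
/// assigns its arguments starting from the lowest lane.
fn setr_ps_lanes() -> [f32; 4] {
    let mut out = [0.0f32; 4];
    // SAFETY: same reasoning as in `set_ps_lanes`.
    unsafe { _mm_storeu_ps(out.as_mut_ptr(), _mm_setr_ps(1.0, 3.0, 5.0, 7.0)) };
    out
}

fn main() {
    println!("{:?}", set_ps_lanes()); // [7.0, 5.0, 3.0, 1.0]
    println!("{:?}", setr_ps_lanes()); // [1.0, 3.0, 5.0, 7.0]
}
```

Using `_mm_setr_ps` in a test keeps the order in the source identical to the order seen when the lanes are printed or stored.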

We obtain the same results when using this small C program:

#include <stdio.h>

#ifdef __SSE2__
    #include <emmintrin.h>
#else
    #warning SSE2 support is not available. Code will not compile
#endif

int main() {
    __m128 a = _mm_set_ps(1.0, 3.0, 5.0, 7.0);
    __m128 b = _mm_rcp_ps(a);

    float c[4];
    _mm_storeu_ps(c, b);

    printf("%f %f %f %f\n", c[0], c[1], c[2], c[3]);
    return 0;
}

Compile with gcc and run

$ gcc main.c -o main
$ ./main
>> 0.142822 0.199951 0.333313 0.999756
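The printed values are approximations of 1/7, 1/5, 1/3 and 1/1, not exact reciprocals, so a test that compares them against fixed constants is fragile. A tolerance-based comparison could be sketched as follows. This is not the fix applied in this pull request. The helper names `rcp_lanes` and `within_rel_tol` and the constant `RCP_MAX_REL_ERR` are hypothetical; the bound of 1.5 * 2^-12 is the maximum relative error that Intel documents for `_mm_rcp_ps`. An x86 or x86_64 target with SSE is assumed.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Documented upper bound on the relative error of `_mm_rcp_ps`.
const RCP_MAX_REL_ERR: f32 = 1.5 / 4096.0; // 1.5 * 2^-12

/// Hypothetical helper: approximate reciprocal of each lane, with input
/// and output both in memory order (lowest lane first).
fn rcp_lanes(input: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    // SAFETY: both arrays hold four f32 values and the unaligned load and
    // store have no alignment requirement. SSE is assumed to be available.
    unsafe {
        let v = _mm_loadu_ps(input.as_ptr());
        _mm_storeu_ps(out.as_mut_ptr(), _mm_rcp_ps(v));
    }
    out
}

/// Hypothetical helper: true if `approx` is within relative tolerance
/// `tol` of the non-zero reference value `exact`.
fn within_rel_tol(approx: f32, exact: f32, tol: f32) -> bool {
    ((approx - exact) / exact).abs() <= tol
}

fn main() {
    // Memory order; the same vector as _mm_set_ps(1.0, 3.0, 5.0, 7.0).
    let input = [7.0f32, 5.0, 3.0, 1.0];
    let approx = rcp_lanes(input);
    for (x, r) in input.iter().zip(approx.iter()) {
        let exact = 1.0 / x;
        println!("1/{} = {} (exact {})", x, r, exact);
        assert!(within_rel_tol(*r, exact, RCP_MAX_REL_ERR));
    }
}
```

Comparing with a relative tolerance avoids hard-coding bit patterns, which may differ between CPU vendors while still satisfying the documented error bound.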

s3bk commented 4 months ago

Floating point math is fun ...