intel / ARM_NEON_2_x86_SSE

The platform independent header allowing to compile any C/C++ code containing ARM NEON intrinsic functions for x86 target systems using SIMD up to AVX2 intrinsic functions

vtbx4_u8 invalid results #23

Closed: mikaheli closed this issue 5 years ago

mikaheli commented 5 years ago

vtbx4_u8() produces invalid results (lanes go to zero) for out-of-bounds index values above 127.

In my test I get the result vector 31 23 0 251 0 0 0 0, where the last four zeros are invalid. The correct result vector is 31 23 0 251 252 253 254 255.

Example code to reproduce:

#include <stdio.h>
#include <stdint.h>

#define NEON2SSE_DISABLE_PERFORMANCE_WARNING
#include "ARM_NEON_2_x86_SSE/NEON_2_SSE.h"

// gcc -Wall -Wno-unused-function -Wno-sequence-point -msse4 -o test test.c

void print_u8x8(const char *msg, uint8x8_t vec)
{
    printf("%s vector\t", msg);
    printf(" %3u %3u %3u %3u %3u %3u %3u %3u\n",
           vget_lane_u8(vec, 0),
           vget_lane_u8(vec, 1),
           vget_lane_u8(vec, 2),
           vget_lane_u8(vec, 3),
           vget_lane_u8(vec, 4),
           vget_lane_u8(vec, 5),
           vget_lane_u8(vec, 6),
           vget_lane_u8(vec, 7));
    printf("\n");
}

int main(void)
{   
    uint8_t lut[8*4] =
    {
        31, 30, 29, 28, 27, 26, 25, 24,
        23, 22, 21, 20, 19, 18, 17, 16,
        15, 14, 13, 12, 11, 10,  9,  8,
         7,  6,  5,  4,  3,  2,  1,  0
    };

    uint8_t values[8] =
    {
        0, 8, 31, 127, 128, 129, 130, 131
    };

    uint8_t values_when_out_of_bounds[8] =
    {
        248, 249, 250, 251, 252, 253, 254, 255
    };

    uint8x8_t a = vld1_u8(values_when_out_of_bounds);
    uint8x8x4_t b;
    uint8x8_t c = vld1_u8(values);
    uint8x8_t new_values;

    b.val[0] = vld1_u8(&lut[0]);
    b.val[1] = vld1_u8(&lut[8]);
    b.val[2] = vld1_u8(&lut[16]);
    b.val[3] = vld1_u8(&lut[24]);

    // seems like out-of-bounds values above 127 in c-vector are not replaced
    // correctly with a-vector values
    // 
    // invalid result after vtbx4_u8:
    //    31  23   0 251   0   0   0   0
    // 
    // result should be:
    //    31  23   0 251 252 253 254 255
    new_values = vtbx4_u8(a, b, c); 
    print_u8x8("new values", new_values);

    return 0;
}

Zvictoria commented 5 years ago

mikaheli, many thanks for this bug report! It affects not only vtbx4 but all vtbx versions (and probably some other function types, which I will investigate later). I've considered some solutions and will push a fix to the repo ASAP.

mikaheli commented 5 years ago

Thank you very much. Seems to work correctly now.