xmos / lib_xcore_math

XMOS optimised arithmetic and vector processing library
Other
6 stars 14 forks source link

xs3_memcpy Function Crashes with int32_t Array of Length 4 and Produces Incorrect Results #170

Closed VergilWang15 closed 10 months ago

VergilWang15 commented 10 months ago

Hello,

I've encountered an issue with the xs3_memcpy function where its behavior is inconsistent with the standard C library's memcpy. I've written a simple function to compare the differences between the two.

When testing with an int32_t array of length 4, xs3_memcpy causes a crash with the following output:

xrun: Program received signal ET_LOAD_STORE, Memory access exception.
      [Switching to tile[1] core[0]]
      xs3_memcpy () at /home/user/projects/lib_rgb_dev/lib_xcore_math/lib_xcore_math/src/arch/xs3/misc/xs3_memcpy.S:34

      34            { sub tmp, tmp, 1               ; vstd a[0]                     }
      Current language:  auto; currently asm

However, when the array length is 8, xs3_memcpy does not crash but produces completely incorrect values:

expected[0]: 0x12345678, actually[0]: 0x27EB08B5
expected[1]: 0x12345679, actually[1]: 0x17E517F5
...

Interestingly, when the array size is increased to 10, the last two elements are correct, but the rest are still wrong:

expected[0]: 0x12345678, actually[0]: 0x12345678
expected[1]: 0x12345679, actually[1]: 0x12345679
expected[2]: 0x1234567A, actually[2]: 0x7740F000  // wrong here
expected[3]: 0x1234567B, actually[3]: 0x00260821
...

Is this issue due to incorrect usage on my part, or is there another underlying cause?

Thank you for your assistance.

VergilWang15 commented 10 months ago

And here is my test code

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "xmath/xmath.h"
void test_memcpy(size_t num_elements) {
    int32_t WORD_ALIGNED *src = (int32_t *)malloc(num_elements * sizeof(int32_t));
    int32_t WORD_ALIGNED *dst = (int32_t *)malloc(num_elements * sizeof(int32_t));
    int32_t WORD_ALIGNED *expected = (int32_t *)malloc(num_elements * sizeof(int32_t));

    for (size_t i = 0; i < num_elements; ++i) {
        src[i] = (int32_t)(0x12345678 + i);
    }

    xs3_memcpy(dst, src, num_elements * sizeof(int32_t));

    memcpy(expected, src, num_elements * sizeof(int32_t));

    for (size_t i = 0; i < num_elements; ++i) {
        printf("expected[%zu]: 0x%08X, actually[%zu]: 0x%08X\n", i, expected[i], i, dst[i]);
    }

    free(src);
    free(dst);
    free(expected);
}
mbanth commented 10 months ago

@VergilWang15, thank your for this report. We will investigate it and let you know what we find.

uvvpavel commented 10 months ago

Hi @VergilWang15, I can suggest a drop-in solution while we investigate this issue.

There's a similar API which only works for the int32_t arrays, called vect_s32_copy. It does have a constraint for the number of elements of your array to be a multiple of 8, if that constraint is honoured the API will work much faster than the memcpy for the long arrays.

uvvpavel commented 10 months ago

Hi @VergilWang15, The bug in xs3_memcpy has been fixed and merged to develop, it now behaves just as a memcpy (but faster). Thank you for reporting ✨