New functions for float vector operations

lucianomartin commented 2 years ago

We would like to add some functions to lib_xcore_math. They have been implemented in lib_dsp.

The functions are all the ones present in the file https://github.com/xmos/lib_dsp/blob/feature/math-xs3/lib_dsp/api/dsp_vector_xs3.h

The functions are written in assembly and the definitions can be found in the file https://github.com/xmos/lib_dsp/blob/feature/math-xs3/lib_dsp/src/dsp_float_vect.S

astewart-xmos commented 1 year ago

Unfortunately no description of the behavior of these functions is available. I've looked at the implementations for each, and here's what they seem intended to do:

FFVectMac_xs3() - Complex element-wise multiply-accumulate (a += b * c)
FFVectMacC_xs3() - Complex element-wise conjugate multiply-accumulate (a += conjugate(b) * c)
FFVectMpy_xs3() - Complex element-wise multiply (a = b * c)
FFVectMpy_xs3() - Complex element-wise conjugate multiply (a = conjugate(b) * c)
FFVectAcc_xs3() - Complex element-wise accumulate (a += b)

NOTE: All but FFVectAcc_xs3() appear to possibly be implemented with an unobvious quirk. Specifically, they appear to do the correct thing for all except the first complex element (i.e. the real and imaginary parts at indices 0 and 1). For example, in FFVectMpy_xs3(), most elements are processed as

.Mpyloop:
    ldd r4, r5, r1[r0]
    ldd r6, r7, r2[r0]
    fmul r8, r4, r7
    fmacc r8, r8, r5, r6
    add r4, r4, r11
    fmul r9,r4, r6
    fmacc r9, r9, r5, r7
    std r8, r9, r3[r0]
    { bt  r0, .Mpyloop         ; sub r0, r0, 1 }

where a full complex multiply is done for each element. i.e.

a[k].re = b[k].re * c[k].re - b[k].im * c[k].im
a[k].im = b[k].re * c[k].im + b[k].im * c[k].re

Then the final element is processed as

ldd r4, r5, r1[r0]
ldd r6, r7, r2[r0]
fmul r8, r4, r6
fmul r9, r5, r7
std r8, r9, r3[r0]

which gives

a[0].re = b[0].re * c[0].re
a[0].im = b[0].im * c[0].im

This isn't the correct result for ordinary complex multiplication. However, I'm guessing this isn't a mistake. If we assume that we're not dealing with arbitrary complex signals, but only with the complex spectra of real signals, and further assume that the Nyquist rate component is packed into the imaginary part of the complex element at index 0, then these functions seem to do what they are supposed to.

I will move these into lib_xcore_math (the names and signatures will change to reflect the conventions used elsewhere in the library).

lucianomartin commented 1 year ago

Many many thanks @astewart-xmos for the prompt analysis. Those functions have been used for some time and the implementation should be reliable. Thanks for moving them into lib_xcore_math, would it be possible to have a new release of lIb_xcore_math when all the necessary work and checks are completed?

astewart-xmos commented 1 year ago

These changes should make it into a development release this week (tag will most likely be v2.1.0-dev). I'd prefer to hold off on a main release until we're confident that these changes are working as expected

astewart-xmos commented 1 year ago

These functions have been added and are included in the release with tag v2.1.0.

Note The release also includes implementations for "packed" versions of the complex float multiplies, which assume the real part of the Nyquist component is packed into the imaginary part of the DC component.

These more closely match the original functions, but do not follow the convention used by all other complex vector functions, so they are not a part of the 'official' API. In this case that just means that they don't have prototypes that are included with an #include "xmath/xmath.h". The function signatures are identical to the corresponding non-packed functions:

vect_complex_f32_mul() --> vect_packed_complex_f32_mul() vect_complex_f32_conj_mul() --> vect_packed_complex_f32_conj_mul() vect_complex_f32_macc() --> vect_packed_complex_f32_macc() vect_complex_f32_conj_macc() --> vect_packed_complex_f32_conj_macc()

xmos / lib_xcore_math

New functions for float vector operations #111