jniemann66 / ReSampler

High quality command-line audio sample rate converter
GNU Lesser General Public License v2.1

Microsoft m128_f32 not supported in gcc #2

Closed by jniemann66 7 years ago

jniemann66 commented 7 years ago

Enabling the SSE2 intrinsic code path when building with GCC produces these errors:

In file included from ReSampler.cpp:28:0:
FIRFilter.h: In member function ‘FloatType FIRFilter<FloatType>::get()’:
FIRFilter.h:246:25: error: request for member ‘m128_f32’ in ‘accumulator’, which is of non-class type ‘__m128 {aka __vector(4) float}’
   output += accumulator.m128_f32[0] +
                         ^
FIRFilter.h:247:16: error: request for member ‘m128_f32’ in ‘accumulator’, which is of non-class type ‘__m128 {aka __vector(4) float}’
    accumulator.m128_f32[1] +
                ^
FIRFilter.h:248:16: error: request for member ‘m128_f32’ in ‘accumulator’, which is of non-class type ‘__m128 {aka __vector(4) float}’
    accumulator.m128_f32[2] +
                ^
FIRFilter.h:249:16: error: request for member ‘m128_f32’ in ‘accumulator’, which is of non-class type ‘__m128 {aka __vector(4) float}’
    accumulator.m128_f32[3];

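m128_f32 is a member of Microsoft's __m128 type; under GCC, __m128 is a plain vector type (__vector(4) float) with no members. May need to create a union to read the individual floats. There is a relevant discussion on Stack Overflow.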
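A minimal sketch of the union idea, assuming a hypothetical helper type `Vec4f` and hypothetical function names (none of these come from ReSampler). Reading a union member other than the one last written is not guaranteed by standard C++, although GCC documents it as supported, so a second variant that stores the register to a float array is shown as a more portable alternative:

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_store_ps

// Hypothetical helper type (not part of ReSampler): overlays four floats on an __m128.
union Vec4f {
    __m128 v;
    float  f[4];
};

// Variant 1: read the lanes through the union.
// Assumption: the compiler permits type punning via unions (GCC documents that it does).
inline float sum_lanes_union(__m128 accumulator) {
    Vec4f u;
    u.v = accumulator;
    return u.f[0] + u.f[1] + u.f[2] + u.f[3];
}

// Variant 2: store the register to an aligned array, then add the four floats.
inline float sum_lanes_store(__m128 accumulator) {
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, accumulator);  // requires 16-byte alignment
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

Either function could stand in for the four `accumulator.m128_f32[i]` reads, e.g. `output += sum_lanes_store(accumulator);`.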

jniemann66 commented 7 years ago

Incorporated some custom horizontal-sum code (from the Stack Overflow answer linked in the code comment):

#ifdef SSE_CUSTOM_HSUM
        // http://stackoverflow.com/questions/6996764/fastest-way-to-do-horizontal-float-vector-sum-on-x86

        __m128 a   = _mm_shuffle_ps(                   // accumulator = [D     C     | B     A    ]
            accumulator, accumulator,
            _MM_SHUFFLE(2, 3, 0, 1));                  // [C     D     | A     B    ]
        __m128 b   = _mm_add_ps(accumulator, a);       // [D+C   C+D   | B+A   A+B  ]
        a          = _mm_movehl_ps(a, b);              // [C     D     | D+C   C+D  ]
        b          = _mm_add_ss(a, b);                 // [C     D     | D+C A+B+C+D]
        output    += _mm_cvtss_f32(b);                 // A+B+C+D
#else
        // MSVC-only path: m128_f32 is a member of Microsoft's __m128 and does not compile under GCC
        output += 
            accumulator.m128_f32[0] +
            accumulator.m128_f32[1] +
            accumulator.m128_f32[2] +
            accumulator.m128_f32[3];
#endif

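This requires SSE_CUSTOM_HSUM to be defined (for example, by passing -DSSE_CUSTOM_HSUM to GCC); without it, the m128_f32 path is still compiled.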
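For reference, the same shuffle/add sequence can be pulled out into a standalone function. This is only a sketch: `hsum_ps` and `dot_ps` are hypothetical names, not functions from ReSampler, and `dot_ps` assumes the length is a multiple of 4.

```cpp
#include <xmmintrin.h>  // SSE: _mm_shuffle_ps, _mm_add_ps, _mm_movehl_ps, ...
#include <cstddef>

// Hypothetical helper: horizontal sum of the four lanes of v = [D C | B A].
inline float hsum_ps(__m128 v) {
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1)); // [C   D   | A   B  ]
    __m128 sums = _mm_add_ps(v, shuf);                           // [D+C C+D | B+A A+B]
    shuf        = _mm_movehl_ps(shuf, sums);                     // [C   D   | D+C C+D]
    sums        = _mm_add_ss(sums, shuf);                        // low lane = A+B+C+D
    return _mm_cvtss_f32(sums);
}

// Hypothetical usage: dot product accumulated four lanes at a time,
// then reduced once at the end. Assumes n is a multiple of 4.
inline float dot_ps(const float* x, const float* y, std::size_t n) {
    __m128 accumulator = _mm_setzero_ps();
    for (std::size_t i = 0; i < n; i += 4) {
        __m128 prod = _mm_mul_ps(_mm_loadu_ps(x + i), _mm_loadu_ps(y + i));
        accumulator = _mm_add_ps(accumulator, prod);
    }
    return hsum_ps(accumulator);
}
```

Because it uses only intrinsics and never touches a member of `__m128`, this form should build with both GCC and MSVC.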

jniemann66 commented 7 years ago

closing ...