ARM-NEON intrinsics fail to build on GCC for arm64

walbourn commented 3 years ago

Building DirectXMath 3.16 for arm64-linux fails with the GCC 9.3 compiler.

DirectXPackedVector.inl: In function ‘DirectX::PackedVector::HALF* DirectX::PackedVector::XMConvertFloatToHalfStream(DirectX::PackedVector::HALF*, size_t, const float*, size_t, size_t)’:

DirectXPackedVector.inl:647:35: error: cannot convert ‘float*’ to ‘uint16_t*’ {aka ‘short unsigned int*’}
  647 |                     vst1_lane_u16(reinterpret_cast<float*>(pHalf), vHalf, 0);
      |                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                   |
      |                                   float*

/usr/lib/gcc/aarch64-linux-gnu/9/include/arm_neon.h:27780:26: note:   initializing argument 1 of ‘void vst1_lane_u16(uint16_t*, uint16x4_t, int)’
27780 | vst1_lane_u16 (uint16_t *__a, uint16x4_t __b, const int __lane)
      |                ~~~~~~~~~~^~~
In file included from /home/walbourn/vcpkg/installed/arm64-

DirectXPackedVector.inl:649:35: error: cannot convert ‘float*’ to ‘uint16_t*’ {aka ‘short unsigned int*’}
  649 |                     vst1_lane_u16(reinterpret_cast<float*>(pHalf), vHalf, 1);
      |                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                   |
      |                                   float*

DirectXPackedVector.inl:651:35: error: cannot convert ‘float*’ to ‘uint16_t*’ {aka ‘short unsigned int*’}
  651 |                     vst1_lane_u16(reinterpret_cast<float*>(pHalf), vHalf, 2);
      |                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                   |
      |                                   float*

DirectXPackedVector.inl:653:35: error: cannot convert ‘float*’ to ‘uint16_t*’ {aka ‘short unsigned int*’}
  653 |                     vst1_lane_u16(reinterpret_cast<float*>(pHalf), vHalf, 3);
      |                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                   |
      |                                   float*

DirectXPackedVector.inl:704:31: error: cannot convert ‘float*’ to ‘uint16_t*’ {aka ‘short unsigned int*’}
  704 |                 vst1_lane_u16(reinterpret_cast<float*>(pHalf), vHalf, 0);
      |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                               |
      |                               float*

DirectXPackedVector.inl:706:31: error: cannot convert ‘float*’ to ‘uint16_t*’ {aka ‘short unsigned int*’}
  706 |                 vst1_lane_u16(reinterpret_cast<float*>(pHalf), vHalf, 1);
      |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                               |
      |                               float*

DirectXPackedVector.inl:708:31: error: cannot convert ‘float*’ to ‘uint16_t*’ {aka ‘short unsigned int*’}
  708 |                 vst1_lane_u16(reinterpret_cast<float*>(pHalf), vHalf, 2);
      |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                               |
      |                               float*

DirectXPackedVector.inl:710:31: error: cannot convert ‘float*’ to ‘uint16_t*’ {aka ‘short unsigned int*’}
  710 |                 vst1_lane_u16(reinterpret_cast<float*>(pHalf), vHalf, 3);
      |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                               |
      |                               float*

walbourn commented 3 years ago

Visual C++ and clang/LLVM don't strictly type-check the arguments to ARM-NEON intrinsics, so the mismatched pointer cast went unnoticed there, and my previous tests used older versions of GCC that lacked support for the half-precision optimizations.
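For illustration, here is a minimal sketch of a type-correct call. According to the GCC note in the log above, `vst1_lane_u16` takes a `uint16_t*` as its first argument, so the destination pointer should presumably be cast to `uint16_t*` rather than `float*`. The helper name `StoreHalfLanes`, its parameters, and the scalar fallback are hypothetical and not part of DirectXMath; the actual change in the fix commit is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Hypothetical helper (not a DirectXMath API): writes four 16-bit values to a
// strided output stream, one element per stride step.
inline void StoreHalfLanes(uint8_t* pHalf, size_t strideBytes, const uint16_t values[4])
{
    assert(pHalf != nullptr);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    // Load the four 16-bit values into a NEON register.
    uint16x4_t vHalf = vld1_u16(values);

    // The first argument must be uint16_t*; GCC rejects float* here.
    vst1_lane_u16(reinterpret_cast<uint16_t*>(pHalf), vHalf, 0);
    pHalf += strideBytes;
    vst1_lane_u16(reinterpret_cast<uint16_t*>(pHalf), vHalf, 1);
    pHalf += strideBytes;
    vst1_lane_u16(reinterpret_cast<uint16_t*>(pHalf), vHalf, 2);
    pHalf += strideBytes;
    vst1_lane_u16(reinterpret_cast<uint16_t*>(pHalf), vHalf, 3);
#else
    // Scalar fallback so the sketch also builds on non-ARM hosts.
    for (int i = 0; i < 4; ++i)
    {
        std::memcpy(pHalf, &values[i], sizeof(uint16_t));
        pHalf += strideBytes;
    }
#endif
}
```

Because the pointer type now matches the intrinsic's declared parameter, a cast like this should compile on compilers that enforce the NEON prototypes as well as on those that are more lenient.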

walbourn commented 3 years ago

Fixed in this commit

microsoft / DirectXMath

ARM-NEON intrinsics fail to build on GCC for arm64 #121