kokkos / simd-math

Library for length agnostic SIMD intrinsic support and the corresponding math operations
Other
20 stars 10 forks source link

HOST_DEVICE issues in cuda-warp simd #16

Closed bjoo closed 4 years ago

bjoo commented 4 years ago

Ugh, I may have let a bug slip through when I added the hip-wavefront backend. I may have (in a bughunt) set SIMD_HOST_DEVICE to SIMD_DEVICE for the simd() constructors and opreator=() in cuda_warp.hpp

It is now (around line 168 in cuda_warp.hpp

SIMD_CUDA_ALWAYS_INLINE SIMD_DEVICE simd(T value)
    :m_value(value)
  {}
  SIMD_CUDA_ALWAYS_INLINE SIMD_DEVICE
  simd(storage_type const& value) {
    copy_from(value.data(), element_aligned_tag());
  }
  SIMD_CUDA_ALWAYS_INLINE SIMD_DEVICE
  simd& operator=(storage_type const& value) {
    copy_from(value.data(), element_aligned_tag());
    return *this;
  }

These should be. SIMD_HOST_DEVICE , I just had a hard time compiling the exercise_10 complaining that

/home/users/coe0071/HIP-SIMD/simd-math-testing/test/cuda-tests/exercise_10_solution.cpp(33): error: calling a __device__ function("simd") from a __host__ __device__ function("operator()") is not allowed

1 error detected in the compilation of "/tmp/tmpxft_00002de7_00000000-4_exercise_10_solution.cpp4.ii".
make[2]: *** [test/CMakeFiles/tut_ex10.dir/cuda-tests/exercise_10_solution.cpp.o] Error 2
make[1]: *** [test/CMakeFiles/tut_ex10.dir/all] Error 2
make: *** [all] Error 2

My apologies for having messed this up. I will have a PR soon for permutes which fixes this, but it may be worth hotfixing.

bjoo commented 4 years ago

And also in simd_common.hpp (BTW on this note, why are we setting this in two places? cuda_warp.hpp includes simd_common.hpp)

crtrott commented 4 years ago

this was actually part of my PR #13

ibaned commented 4 years ago

Can we (@bjoo) confirm that #13 was sufficient and that both this issue and #17 can be closed?

bjoo commented 4 years ago

Having done a read of it, I think #13 will close #16 and obviate my #17. I thought I based my code on a master after #13 was merged tho… So maybe I was going through it and reverting it thinking ‘thiss stuff surely isn’t needed on the host’. So likely mea culpa there.

I am about to submit a pull that has permutes for AVX, AVX512, CUDA_WARP and HIP. I can ensure that these fixes are all in there.

Would that be a better way to go? @crtrott’s exercises are part of my testsuite.

Best, B

On May 4, 2020, at 12:49 PM, Dan Ibanez notifications@github.com wrote:

Can we (@bjoo) confirm that #13 was sufficient and that both this issue and #17 can be closed?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.


Dr Balint Joo High Performance Computational Scientist Jefferson Lab 12000 Jefferson Ave Suite 3, MS 12B2, Room F217 Newport News VA 23606, USA Tel: +1-757-269-5339 email: bjoo AT jlab.org

bjoo commented 4 years ago

I can confirm that #13

Commit ID: f417a3872eca867de2d05eb0208010f0ecb071b5

fixes the CUDA Compilation problems. I will rebase onto that now.

Best, B

On May 4, 2020, at 12:49 PM, Dan Ibanez notifications@github.com wrote:

Can we (@bjoo) confirm that #13 was sufficient and that both this issue and #17 can be closed?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.


Dr Balint Joo High Performance Computational Scientist Jefferson Lab 12000 Jefferson Ave Suite 3, MS 12B2, Room F217 Newport News VA 23606, USA Tel: +1-757-269-5339 email: bjoo AT jlab.org