weigao95 / surfelwarp

SurfelWarp: Efficient Non-Volumetric Dynamic Reconstruction
https://sites.google.com/view/surfelwarp/home
BSD 3-Clause "New" or "Revised" License

Incompatible with newer CUDA versions and GPUs #56

Closed: Cryst4L9527 closed this issue 3 years ago

Cryst4L9527 commented 3 years ago

When I try to run this code on an NVIDIA RTX 3090 with CUDA 11.1, I get a new error:

'shfl' without '.sync' is not supported on .target sm_70 and higher from PTX ISA version 6.4

It seems the CUDA code is not compatible with newer GPUs. Would you update the code to support them? I'd appreciate it very much!

BaldrLector commented 3 years ago

Hi, you need to change CMakeLists.txt; refer to #23.

Cryst4L9527 commented 3 years ago

I've already changed CMakeLists.txt. The problem seems to be that newer CUDA versions (PTX ISA 6.4 and later, per the error message) no longer support shfl without .sync when targeting sm_70 and higher.

weigao95 commented 3 years ago

I remember that shfl without .sync is deprecated. You need to change the device intrinsics: in the common/device_instrinsics.h file, change shfl.up.b32 to shfl.sync.up.b32 according to this doc: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html. The .sync form also takes an extra membermask operand listing the participating lanes (0xffffffff for a full warp). You need to add .sync for all data types.
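A minimal sketch of the change described above, assuming the header wraps warp shuffles in inline PTX. The names `shfl_up_sync_int`, `shfl_up_sync_float`, `warpInclusiveScanKernel` and `warpInclusiveScan` are hypothetical and are not the actual contents of common/device_instrinsics.h. The only real difference from the legacy form is the `.sync` qualifier plus the trailing membermask operand; the full mask `0xffffffff` assumes all 32 lanes of the warp execute the shuffle. Non-integer types can reuse the 32-bit integer version by reinterpreting their bits, which is one way to "add .sync for all data types".

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical replacement for a legacy wrapper such as
//   asm volatile("shfl.up.b32 %0, %1, %2, %3;" : "=r"(r) : "r"(v), "r"(delta), "r"(0));
// which is rejected for sm_70+ from PTX ISA 6.4. The .sync form adds a membermask.
__device__ __forceinline__ int shfl_up_sync_int(int value, unsigned delta,
                                                unsigned mask = 0xffffffffu) {
    int result;
    // c = 0: clamp value for .up over a full 32-lane segment.
    // Lanes with laneid < delta keep their own value.
    asm volatile("shfl.sync.up.b32 %0, %1, %2, %3, %4;"
                 : "=r"(result)
                 : "r"(value), "r"(delta), "r"(0), "r"(mask));
    return result;
}

// Other 32-bit types reuse the b32 shuffle by reinterpreting their bits.
__device__ __forceinline__ float shfl_up_sync_float(float value, unsigned delta,
                                                    unsigned mask = 0xffffffffu) {
    return __int_as_float(shfl_up_sync_int(__float_as_int(value), delta, mask));
}
// Since CUDA 9 the built-in __shfl_up_sync(mask, var, delta) is an alternative
// to hand-written PTX.

// Example use: inclusive prefix sum across one warp.
__global__ void warpInclusiveScanKernel(const int* in, int* out) {
    const unsigned lane = threadIdx.x & 31u;
    int value = in[threadIdx.x];
    for (unsigned offset = 1; offset < 32; offset <<= 1) {
        int neighbor = shfl_up_sync_int(value, offset);
        if (lane >= offset) value += neighbor;
    }
    out[threadIdx.x] = value;
}

// Host helper: scans up to 32 ints with a single warp (error checks omitted).
std::vector<int> warpInclusiveScan(const std::vector<int>& input) {
    assert(input.size() <= 32 && "this sketch handles one warp only");
    std::vector<int> padded(input);
    padded.resize(32, 0);
    std::vector<int> output(32, 0);
    int* dIn = nullptr;
    int* dOut = nullptr;
    cudaMalloc(&dIn, 32 * sizeof(int));
    cudaMalloc(&dOut, 32 * sizeof(int));
    cudaMemcpy(dIn, padded.data(), 32 * sizeof(int), cudaMemcpyHostToDevice);
    warpInclusiveScanKernel<<<1, 32>>>(dIn, dOut);
    cudaMemcpy(output.data(), dOut, 32 * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return output;
}
```

The membermask must name every lane that actually reaches the shuffle; passing the full mask from divergent code where some lanes are inactive is undefined behavior, so wrappers used under divergence should take the mask as a parameter, as in this sketch.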

Cryst4L9527 commented 3 years ago

Thank you very much! In fact, I changed the target architecture to sm_61 when compiling and that worked too, but it may affect performance. I'll try changing the intrinsics as you suggest.

(Sent by email in reply to Wei Gao's comment of 2021-02-04 11:53:17.)