src-d / kmcuda

Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA

Allow compute capabilities newer than the compiled version. #97

Closed: berendo closed this 4 years ago.

berendo commented 4 years ago

The build passes the specified compute capability to nvcc as -arch sm_${CUDA_ARCH} in CMakeLists.txt. As a result, the compiled CUDA kernels contain two things: a real binary (SASS) for that specific architecture, and PTX for the matching virtual architecture. The embedded PTX lets the driver JIT-compile the kernels on newer architectures (see https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#options-for-steering-gpu-code-generation). The compute capability check should therefore reject only devices whose compute capability is older than the specified one, instead of accepting exact matches only.

vmarkovtsev commented 4 years ago

Good to know this! Thanks for the contribution.

berendo commented 4 years ago

No problem. Thank you for developing this project!