src-d / kmcuda

Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA

compute capability mismatch for device 0: wanted 6.1, have 6.0 #68

Closed: dfeddema closed this issue 5 years ago

dfeddema commented 5 years ago

I am getting this error on RHEL 7.5 with Python 3.6.5, CUDA 8.0 (V8.0.61), and gcc 4.9.2. Might I need CUDA V8.0.60 instead of V8.0.61?

```
[root@e35559eae255 kmcuda]# python pythontest.py
reassignments threshold: 100
compute capability mismatch for device 0: wanted 6.1, have 6.0
you may want to build kmcuda with -DCUDA_ARCH=60 (refer to "Building" in README.md)
compute capability mismatch for device 1: wanted 6.1, have 6.0
you may want to build kmcuda with -DCUDA_ARCH=60 (refer to "Building" in README.md)
compute capability mismatch for device 2: wanted 6.1, have 6.0
you may want to build kmcuda with -DCUDA_ARCH=60 (refer to "Building" in README.md)
Traceback (most recent call last):
  File "pythontest.py", line 11, in <module>
    centroids, assignments = kmeans_cuda(arr, 4, verbosity=1, seed=3)
ValueError: No such CUDA device exists
```
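As a cross-check on the numbers in that error, the compute capability of each GPU can be read from Python with pycuda (which is in the package list below). This is just a diagnostic sketch, independent of kmcuda:

```python
import pycuda.driver as drv

drv.init()
for i in range(drv.Device.count()):
    dev = drv.Device(i)
    major, minor = dev.compute_capability()
    # A Tesla P100 reports (6, 0), i.e. compute capability 6.0
    print(f"device {i}: {dev.name()} has compute capability {major}.{minor}")
```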

```
[root@e35559eae255 kmcuda]# cat pythontest.py
import numpy
from matplotlib import pyplot
from libKMCUDA import kmeans_cuda

numpy.random.seed(0)
arr = numpy.empty((10000, 2), dtype=numpy.float32)
arr[:2500] = numpy.random.rand(2500, 2) + [0, 2]
arr[2500:5000] = numpy.random.rand(2500, 2) - [0, 2]
arr[5000:7500] = numpy.random.rand(2500, 2) + [2, 0]
arr[7500:] = numpy.random.rand(2500, 2) - [2, 0]
centroids, assignments = kmeans_cuda(arr, 4, verbosity=1, seed=3)
print(centroids)

pyplot.scatter(arr[:, 0], arr[:, 1], c=assignments)
pyplot.scatter(centroids[:, 0], centroids[:, 1], c="white", s=150)
```

I have the following packages installed:

```
[root@e35559eae255 kmcuda]# pip list installed
Package              Version
-------------------- ---------
absl-py 0.7.0
appdirs 1.4.3
astor 0.7.1
atomicwrites 1.3.0
attrs 18.2.0
cycler 0.10.0
decorator 4.3.2
gast 0.2.2
grpcio 1.18.0
h5py 2.9.0
javapackages 4.3.2
Keras-Applications 1.0.7
Keras-Preprocessing 1.0.9
kiwisolver 1.0.1
libKMCUDA 6.2.2
Mako 1.0.7
Markdown 3.0.1
MarkupSafe 1.1.0
matplotlib 3.0.2
more-itertools 6.0.0
numpy 1.16.1
pip 19.0.3
pluggy 0.8.1
protobuf 3.6.1
py 1.8.0
pycuda 2018.1.1
pyparsing 2.3.1
pytest 4.3.0
python-dateutil 2.8.0
pytools 2019.1
PyXB 1.2.4
scikit-learn 0.20.2
scipy 1.2.1
setuptools 40.8.0
six 1.12.0
tensorboard 1.12.2
tensorflow 1.12.0
termcolor 1.1.0
Werkzeug 0.14.1
wheel 0.33.1
```

I have tried building with the following cmake command, but it has not resolved the problem. Suggestions?

```
cmake -DCMAKE_BUILD_TYPE=Release -DCUDA_ARCH=60 . && make
```

and I installed with

```
pip install git+https://github.com/src-d/kmcuda.git#subdirectory=src
```

vmarkovtsev commented 5 years ago

Hi @dfeddema. The CUDA version does not matter in this case. pip install builds the package itself and has nothing to do with the previous cmake command: there are two independent ways to build, cmake and the Python package. I see that you need Python, so

```
CUDA_ARCH=60 pip install git+https://github.com/src-d/kmcuda.git#subdirectory=src
```
vmarkovtsev commented 5 years ago

Alternatively

```
CUDA_ARCH=60 pip install libKMCUDA
```
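Note that in both commands CUDA_ARCH is passed as an environment variable that the pip-driven build picks up, rather than as the -DCUDA_ARCH flag used with a standalone cmake build.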
dfeddema commented 5 years ago

Hi @vmarkovtsev. Your suggestions worked! Thank you for making my afternoon!

I see in CMakeLists.txt that CUDA_ARCH specifies the architecture (i.e. which GPU card) that nvcc will generate code for.

This link helped explain the Nvidia SM flags: https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/

I'm running on a Tesla P100, so I need -arch sm_60, not sm_61, which I had originally because 61 is the default.

In CMakeLists.txt I see the default setting for CUDA_ARCH:

```cmake
if (NOT DEFINED CUDA_ARCH)
  set(CUDA_ARCH "61")
endif()
```

Also in CMakeLists.txt I see the nvcc sm flag being set:

```cmake
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -arch sm_${CUDA_ARCH} -Xptxas=-v -D_MWAITXINTRIN_H_INCLUDED -D_FORCE_INLINES")
```
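So CUDA_ARCH is just the device's major and minor compute capability digits concatenated. A small sketch of that mapping, assuming pycuda is installed and device 0 is the GPU being targeted:

```python
import pycuda.driver as drv

drv.init()
major, minor = drv.Device(0).compute_capability()
cuda_arch = f"{major}{minor}"  # (6, 0) -> "60", so nvcc gets -arch sm_60
print(f"cmake -DCMAKE_BUILD_TYPE=Release -DCUDA_ARCH={cuda_arch} .")
```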

Here's the test that ran correctly after your fix.

```
[root@3cb67f3e222b kmcuda]# python pythontest.py
reassignments threshold: 100
transposing the samples...
performing kmeans++...
done
too few clusters for this yinyang_t => Lloyd
iteration 1: 10000 reassignments
iteration 2: 0 reassignments
[[ 0.49675268 -1.504859  ]
 [ 0.4968266   2.497115  ]
 [ 2.4868565   0.49439764]
 [-1.5026922   0.5023965 ]]
```

vmarkovtsev commented 5 years ago

Happy that it worked! Sorry for not providing binary packages for all possible configurations: there are 5 active CUDA versions × 5 widespread device archs = 25 variants. One day we will.