src-d / kmcuda

Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA

cudaMemcpy failed #30

Closed GSanchis closed 6 years ago

GSanchis commented 6 years ago

Hi,

I'm trying to get the KMCuda library running in Python, but I am getting the following error:

root:INFO:07:04:43 Clustering users into 50 clusters...
reassignments threshold: 470
transposing the samples...
performing kmeans++...
done            
too few clusters for this yinyang_t => Lloyd
iteration 1: 47094 reassignments
iteration 2: 10551 reassignments
iteration 3: 6259 reassignments
iteration 4: 4474 reassignments
iteration 5: 3621 reassignments
iteration 6: 2952 reassignments
iteration 7: 2536 reassignments
iteration 8: 2156 reassignments
iteration 9: 1895 reassignments
iteration 10: 1757 reassignments
iteration 11: 1527 reassignments
iteration 12: 1370 reassignments
iteration 13: 1213 reassignments
iteration 14: 1111 reassignments
iteration 15: 1038 reassignments
iteration 16: 903 reassignments
iteration 17: 919 reassignments
iteration 18: 870 reassignments
iteration 19: 835 reassignments
iteration 20: 766 reassignments
iteration 21: 725 reassignments
iteration 22: 673 reassignments
iteration 23: 637 reassignments
iteration 24: 580 reassignments
iteration 25: 580 reassignments
iteration 26: 555 reassignments
iteration 27: 557 reassignments
iteration 28: 503 reassignments
iteration 29: 495 reassignments
iteration 30: 509 reassignments
iteration 31: 486 reassignments
iteration 32: 454 reassignments
/home/sourced/Projects/kmcuda/src/kmcuda.cc:515 -> invalid argument
  File "train.py", line 900, in do_cluster
    a, assignments = kmeans_cuda(data, clusters=n_clusters, device=2, verbosity=1, yinyang_t=0)
RuntimeError: cudaMemcpy failed

I'm running this on a machine with 4 GPUs. After some experimenting, I get that error whenever I set device to anything other than 1, 3, 7, or 15, i.e. there seems to be a relation between this error and not using the first GPU. Any hints?

Thanks in advance!!

vmarkovtsev commented 6 years ago

Thanks for reporting! This error apparently happens after the clustering has finished, when the result is copied back to host memory. The code at https://github.com/src-d/kmcuda/blob/develop/src/kmcuda.cc#L515 uses devs.back() as the array index, where I really should use devs.size() - 1 instead. Can you please fix this on lines 510 and 513 and test?
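
To illustrate why only 1, 3, 7 and 15 avoid the error, here is a minimal Python sketch of the index mismatch. It is not the kmcuda code: it assumes that device is a bitmask of CUDA device indices (bit i selects device i, 0 selects all devices), that devs holds the selected device indices in ascending order, and that the per-device arrays have one entry per selected device. The helper names decode_devices and index_is_valid are hypothetical.

```python
def decode_devices(mask, available=4):
    """Return the CUDA device indices selected by a bitmask.

    Assumption: bit i set means device i is used; 0 means all devices.
    """
    if mask == 0:
        return list(range(available))
    return [i for i in range(available) if mask & (1 << i)]


def index_is_valid(mask, available=4):
    """Check whether devs[-1] (devs.back() in C++) is a valid array index."""
    devs = decode_devices(mask, available)
    buggy_index = devs[-1]        # device ID of the last selected device
    fixed_index = len(devs) - 1   # position of the last selected device
    # The per-device arrays have len(devs) entries, so the device ID is a
    # valid index only when it coincides with the last position.
    return buggy_index == fixed_index


if __name__ == "__main__":
    print([m for m in range(1, 16) if index_is_valid(m)])  # [1, 3, 7, 15]
```

Under these assumptions, device=2 gives devs == [1], so the device ID 1 is used as an index into an array with a single entry, whereas the masks 1, 3, 7 and 15 select a contiguous range starting at device 0, where the device ID and the position happen to agree.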

GSanchis commented 6 years ago

Thanks for responding so quickly!

Yup, that did solve the problem! :)

Do you want me to commit the code, or will you do it?

Where do you want me to add cudaSetDevice()?

vmarkovtsev commented 6 years ago

I would be happy to accept a PR!

cudaSetDevice() should not be required here, because the last device is guaranteed to be active at that point. However, we can be defensive and add it before line 510, no problem.

vmarkovtsev commented 6 years ago

Fixed