src-d / kmcuda

Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA

RuntimeError: cudaMemcpy failed #2

Closed: charlesgillespie closed this issue 8 years ago

charlesgillespie commented 8 years ago

Hello,

I am intermittently getting the error "RuntimeError: cudaMemcpy failed" on the same data set. kmeans does sometimes finish on this data set, but only about one run in three. The data set has 450K samples with 16 features each.

I am running on a GTX 1080 with 8 GB of memory. While the process runs, it uses at most about 7% of the GPU's memory, and no other processes are using the GPU, so the GPU does not seem to be running out of memory.
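A rough arithmetic check supports the point that memory is not the limit. The sketch below estimates only the raw inputs and outputs: the sample matrix, the centroids, and one label per sample. It assumes 32-bit floats for samples and centroids and 32-bit integers for labels, which is an assumption rather than something stated in the report. The cluster count of 1000 is taken from the call in the traceback below. The function name `raw_footprint_bytes` is hypothetical, and the estimate ignores the CUDA context and any working buffers the library allocates, which presumably account for much of the observed 7%.

```python
# Back-of-the-envelope estimate of GPU memory needed for the raw data only.
# Assumption: samples and centroids are float32, labels are 32-bit integers.
BYTES_PER_FLOAT32 = 4
BYTES_PER_UINT32 = 4


def raw_footprint_bytes(samples, features, clusters):
    """Bytes for the sample matrix, the centroids, and one label per sample."""
    sample_bytes = samples * features * BYTES_PER_FLOAT32
    centroid_bytes = clusters * features * BYTES_PER_FLOAT32
    label_bytes = samples * BYTES_PER_UINT32
    return sample_bytes + centroid_bytes + label_bytes


total = raw_footprint_bytes(samples=450_000, features=16, clusters=1000)
gpu_bytes = 8 * 1024**3  # 8 GB card, as reported above
print(f"{total / 1e6:.1f} MB, {100 * total / gpu_bytes:.2f}% of the card")
```

Under these assumptions the raw data comes to roughly 31 MB, well under 1% of the card, so an out-of-memory condition is an unlikely cause.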

running Lloyd until reassignments drop below 49180
iteration 1: 447094 reassignments
iteration 2: 445995 reassignments
iteration 3: 130666 reassignments
iteration 4: 69232 reassignments
iteration 5: 46072 reassignments
performing kmeans++...
step 20
kmeans_init_centroids() failed for yinyang groups: invalid argument
Traceback (most recent call last):
  File "~/models/pattern_recognition/KMeans/batch.py", line 45, in <module>
    run_id = build.partition(clusters=1000)
  File "~/models/pattern_recognition/KMeans/KMeansConstructor.py", line 151, in partition
    centroids, labels = kmeans_cuda(data, clusters, kmpp=True, verbosity=1)
RuntimeError: cudaMemcpy failed

vmarkovtsev commented 8 years ago

Looks like a bug. I was able to reproduce it on random data.
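Because the failure is intermittent, one way to confirm it on random data is to repeat the call and count how often it raises. The sketch below is an illustration only, not the reproduction actually used in this thread. The helper `failure_rate` is a hypothetical name, and the commented usage copies the `kmeans_cuda(data, clusters, kmpp=True, verbosity=1)` call from the traceback above; the `kmpp` keyword may differ in other versions of the library.

```python
def failure_rate(run, trials=10):
    """Call run() `trials` times and collect the RuntimeError messages.

    Returns (number_of_failures, list_of_error_messages).
    """
    messages = []
    for _ in range(trials):
        try:
            run()
        except RuntimeError as exc:
            # kmeans_cuda reports CUDA failures as RuntimeError (see traceback).
            messages.append(str(exc))
    return len(messages), messages


# Hypothetical usage on random data of the reported shape.
# Requires a CUDA device, numpy, and the libKMCUDA module:
#
#   import numpy
#   from libKMCUDA import kmeans_cuda
#
#   data = numpy.random.rand(450000, 16).astype(numpy.float32)
#   failures, messages = failure_rate(
#       lambda: kmeans_cuda(data, 1000, kmpp=True, verbosity=1), trials=10)
#   print(failures, "of 10 runs failed:", set(messages))
```

Running the same loop before and after an update gives a simple before/after comparison instead of relying on a single successful run.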

vmarkovtsev commented 8 years ago

Does it work for you now? I am not sure it is the same bug you encountered, but it works for me now.

charlesgillespie commented 8 years ago

This use case was running with 1000 clusters.

It seems to be working more reliably for me now. It has not failed since the update.

Many thanks!

vmarkovtsev commented 8 years ago

Great! Thanks for testing.