src-d / kmcuda

Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA

Out of memory #15

Closed olalonde closed 7 years ago

olalonde commented 7 years ago

I'm trying to run k-means on 7,732,159 samples in 128 dimensions, clustering into 10,000 clusters, on an AWS p2.xlarge instance with 12 GB of GPU memory, and I'm getting this error message:

arguments: 1 0x7ffcf2138984 0.010 0.10 0 7732159 128 10000 3 0 0 3 0x7f89a0088010 0x7f8bba61d010 0x7f8bb889e010 (nil)
reassignments threshold: 77321
yinyang groups: 1000
[0] *dest: 0x12052e0000 - 0x12f1257e00 (3958865408)
[0] device_centroids: 0x12f1260000 - 0x12f1742000 (5120000)
[0] device_assignments: 0x12f1760000 - 0x12f34deefc (30928636)
[0] device_assignments_prev: 0x12f34e0000 - 0x12f525eefc (30928636)
[0] device_ccounts: 0x12f5260000 - 0x12f5269c40 (40000)
[0] device_assignments_yy: 0x12f5269e00 - 0x12f5273a40 (40000)
cudaMalloc(&__ptr, __size)
/home/ubuntu/code/kmcuda/src/kmcuda.cc:455 -> out of memory
failed to allocate 7739891159 bytes for device_bounds_yy
Traceback (most recent call last):
  File "./bin/bow.py", line 83, in <module>
    }[args.action]()
  File "./bin/bow.py", line 57, in train
    engine.train()
  File "/home/ubuntu/code/cv/cv/bow.py", line 82, in train
    centroids = self.kmeans.fit(features)
  File "/home/ubuntu/code/cv/cv/kmeans/kmcuda.py", line 33, in fit
    seed = self.seed,
MemoryError: Failed to allocate memory on GPU

The failed allocation of 7739891159 bytes for device_bounds_yy is ~7.7 GB, which should fit in the GPU's memory (12 GB). It also seems comparable to this line from the README:

kmcuda can sort 4M samples in 480 dimensions into 40000 clusters (if you have several days and 12 GB of GPU memory)

Any idea why I'm running out of memory? Also, since 7732159 * 128 * 4 bytes = 3.96 GB, is it normal that it allocates almost double that?
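For reference, here is a rough sketch that sums the per-buffer byte counts printed in the log above (values copied verbatim from the [0] lines); it only accounts for the allocations that were actually logged, not the CUDA context or any buffers requested after the failing one:

```python
# Allocations reported in the log, in bytes.
allocated = {
    "dest (samples)":          3958865408,  # 7732159 * 128 * 4
    "device_centroids":           5120000,  # 10000 * 128 * 4
    "device_assignments":        30928636,  # 7732159 * 4
    "device_assignments_prev":   30928636,
    "device_ccounts":               40000,  # 10000 * 4
    "device_assignments_yy":        40000,
}
bounds_yy = 7739891159  # the allocation that failed

total = sum(allocated.values()) + bounds_yy
print(f"requested so far: {total / 1e9:.2f} GB")  # ~11.77 GB on a 12 GB card
```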

vmarkovtsev commented 7 years ago

7.7 GB is just a single block; there are others, and when they add up they exceed 12 GB. The solution is to set yinyang_t to 0.
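A minimal sketch of that workaround, assuming the libKMCUDA Python wrapper (the reporter's own cv/kmeans/kmcuda.py wrapper is not shown in this thread); with yinyang_t=0 the Yinyang refinement is disabled, so the large per-sample bounds buffer (device_bounds_yy above) is never allocated:

```python
import numpy as np
from libKMCUDA import kmeans_cuda

# Placeholder data with the same shape as in the report above.
samples = np.random.rand(7732159, 128).astype(np.float32)

centroids, assignments = kmeans_cuda(
    samples, 10000,   # 10000 clusters, as in the report
    yinyang_t=0,      # disable Yinyang: smaller memory footprint, plain Lloyd iterations
    seed=3, verbosity=1)
```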

olalonde commented 7 years ago

That seems to work.