src-d / kmcuda

Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA

Running out of memory early #99

Closed: RuABraun closed this issue 4 years ago

RuABraun commented 4 years ago

The docs say

kmcuda can sort 4M samples in 480 dimensions into 40000 clusters (if you have several days and 12 GB of GPU memory).

I've got 600K x 256 and I'm getting OOM while using a 1080 Ti (11 GB memory).

arguments: 1 0x7ffca937bec4 0.010 0.10 0 600000 256 60000 3 0 0 2 0x7fa5e6aa6010 0x7fa5e300d010 0x7fa5e2dc3010 0x7ffca937beec
reassignments threshold: 6000
yinyang groups: 6000
cudaMalloc(&__ptr, __size)
/home/seni/git/kmcuda/src/kmcuda.cc:456 -> out of memory
failed to allocate 3600600000 bytes for device_bounds_yy

Do I have to disable yinyang?
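
For reference, here is a rough reconstruction of the km.py call that produces this, based on the logged arguments and the traceback further down in the thread; libKMCUDA's kmeans_cuda is the documented Python entry point, and the data array here is only a placeholder:

import numpy as np
from libKMCUDA import kmeans_cuda

# Placeholder for the real samples: 600000 rows, 256 float32 features.
data = np.random.rand(600000, 256).astype(np.float32)

# clusters = samples // 10 = 60000; keyword values mirror the logged arguments.
# With average_distance=True a third value (mean distance to centroid) is returned.
centroids, assignments, avg_distance = kmeans_cuda(
    data, 60000,
    tolerance=0.01, yinyang_t=0.1, seed=3,
    device=0, verbosity=2, average_distance=True)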

vmarkovtsev commented 4 years ago

Please activate the maximum log verbosity, sum the reported allocation sizes together and check whether they exceed the GPU memory. Then the answer should be obvious :)
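
As an illustration of that check, a small sketch that sums the "(N)" byte sizes kmcuda prints at maximum verbosity; the log format is assumed from the output shown further down in this thread:

import re
import sys

# Pipe the verbosity-3 log in on stdin and add up the reported buffer sizes,
# e.g. "[0] *dest: 0x7f... - 0x7f... (614400000)".
total = 0
for line in sys.stdin:
    match = re.search(r"\((\d+)\)\s*$", line)
    if match:
        total += int(match.group(1))
print("%.2f GB reported as allocated" % (total / 1e9))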

RuABraun commented 4 years ago

This is the entire output:

seni@seni-MS-7A32:/work/fun/subword-repr$ CUDA_VISIBLE_DEVICES="0" py3 km.py data/repr_nums
2020-01-29 20:19:12.182 | INFO     | __main__:kmeans:11 - Starting, data shape is (600000, 256).
arguments: 1 0x7ffe76a2e014 0.001 0.10 0 600000 256 60000 3 0 0 2 0x7f197cfa2010 0x7f1979509010 0x7f19792bf010 0x7ffe76a2e03c
reassignments threshold: 600
yinyang groups: 6000
cudaMalloc(&__ptr, __size)
/home/seni/git/kmcuda/src/kmcuda.cc:456 -> out of memory
failed to allocate 3600600000 bytes for device_bounds_yy
Traceback (most recent call last):
  File "km.py", line 63, in <module>
    plac.call(main)
  File "/usr/local/lib/python3.6/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/usr/local/lib/python3.6/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "km.py", line 43, in main
    kmeans(data, data.shape[0] // 10, iter, samples_in_batch)
  File "km.py", line 14, in kmeans
    verbosity = 2, yinyang_t = 0.1, seed = 3, tolerance=0.001, device=0, average_distance = True)
MemoryError: Failed to allocate memory on GPU

3600600000 bytes is 3.6 GB; as mentioned earlier, the GPU I'm using has more than 11 GB free.
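
One way to confirm the free memory from Python (pynvml is just an assumption for illustration; it is not used anywhere in this thread):

import pynvml

# Query free/total memory on GPU 0 (the 1080 Ti) via NVML.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print("free: %.1f GB of %.1f GB" % (mem.free / 1e9, mem.total / 1e9))
pynvml.nvmlShutdown()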

vmarkovtsev commented 4 years ago

You should increase the log level.

RuABraun commented 4 years ago

I'm using 2; is there a higher one? Or should I recompile with a debug flag or something like that?

vmarkovtsev commented 4 years ago

Try 3. Nope, no need to recompile...

RuABraun commented 4 years ago

Here it is with 3:

seni@seni-MS-7A32:/work/fun/subword-repr$ CUDA_VISIBLE_DEVICES="0" py3 km.py data/repr_nums
2020-01-29 21:02:16.837 | INFO     | __main__:kmeans:11 - Starting, data shape is (600000, 256).
arguments: 1 0x7ffc92e5f0a4 0.001 0.10 0 600000 256 60000 3 0 0 3 0x7f62c76f9010 0x7f62c3c60010 0x7f62c3a16010 0x7ffc92e5f0cc
reassignments threshold: 600
yinyang groups: 6000
[0] *dest: 0x7f6268000000 - 0x7f628c9f0000 (614400000)
[0] device_centroids: 0x7f6296000000 - 0x7f6299a98000 (61440000)
[0] device_assignments: 0x7f628ca00000 - 0x7f628cc49f00 (2400000)
[0] device_assignments_prev: 0x7f628ce00000 - 0x7f628d049f00 (2400000)
[0] device_ccounts: 0x7f628d200000 - 0x7f628d23a980 (240000)
[0] device_assignments_yy: 0x7f628d23aa00 - 0x7f628d275380 (240000)
cudaMalloc(&__ptr, __size)
/home/seni/git/kmcuda/src/kmcuda.cc:456 -> out of memory
failed to allocate 3600600000 bytes for device_bounds_yy
Traceback (most recent call last):
  File "km.py", line 64, in <module>
    plac.call(main)
  File "/usr/local/lib/python3.6/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/usr/local/lib/python3.6/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "km.py", line 44, in main
    kmeans(data, data.shape[0] // 10, iter, samples_in_batch)
  File "km.py", line 14, in kmeans
    verbosity = 3, yinyang_t = 0.1, seed = 3, tolerance=0.001, device=0, average_distance = True)
MemoryError: Failed to allocate memory on GPU

Seems like it should work, no?

vmarkovtsev commented 4 years ago

It has allocated 614400000 + 61440000 + 2*2400000 + 2*240000 bytes, that is, roughly 0.7 GB. So yeah, it should be able to allocate another 3.6 GB.

At this point I went to read my source code: https://github.com/src-d/kmcuda/blob/master/src/private.h#L135

It appears that there is a bug: the error message prints size when it should really print __size, so the reported number is the element count, not the byte count; it has not been multiplied by the element size of 4 bytes. In reality, it tries to allocate 3600600000 * 4 bytes = 14.4 GB.
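
A quick back-of-envelope check (a sketch; the per-buffer formulas are inferred from the sizes in the verbosity-3 log above, assuming 4 bytes per float32/uint32 element):

samples, dims, clusters = 600000, 256, 60000
groups = 6000               # yinyang groups = clusters * yinyang_t = 60000 * 0.1
elem = 4                    # bytes per float32 / uint32

dest      = samples * dims * elem        # 614400000, matches "*dest" in the log
centroids = clusters * dims * elem       #  61440000, matches "device_centroids"
assigns   = samples * elem               #   2400000, allocated twice (current + prev)
ccounts   = clusters * elem              #    240000, same size as device_assignments_yy

# device_bounds_yy: 3600600000 elements = samples * (groups + 1), times 4 bytes each.
bounds_yy = samples * (groups + 1) * elem
print("device_bounds_yy needs %.1f GB" % (bounds_yy / 1e9))   # ~14.4 GB > 11 GB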

Problem solved: you should disable yinyang, and I should fix the error message :)
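
A minimal sketch of the workaround, assuming the libKMCUDA Python API: passing yinyang_t=0 should disable the Yinyang refinement, so the huge device_bounds_yy buffer is never needed.

import numpy as np
from libKMCUDA import kmeans_cuda

# Same call as in the sketch near the top of the thread, with Yinyang disabled.
data = np.random.rand(600000, 256).astype(np.float32)   # placeholder samples
centroids, assignments, avg_distance = kmeans_cuda(
    data, 60000,
    tolerance=0.001, yinyang_t=0.0, seed=3,
    device=0, verbosity=2, average_distance=True)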

RuABraun commented 4 years ago

Thank you!