Closed: RuABraun closed this issue 4 years ago
Please activate the maximum log verbosity, sum the reported allocation sizes together and check whether they exceed the GPU memory. Then the answer should be obvious :)
This is the entire output:
seni@seni-MS-7A32:/work/fun/subword-repr$ CUDA_VISIBLE_DEVICES="0" py3 km.py data/repr_nums
2020-01-29 20:19:12.182 | INFO | __main__:kmeans:11 - Starting, data shape is (600000, 256).
arguments: 1 0x7ffe76a2e014 0.001 0.10 0 600000 256 60000 3 0 0 2 0x7f197cfa2010 0x7f1979509010 0x7f19792bf010 0x7ffe76a2e03c
reassignments threshold: 600
yinyang groups: 6000
cudaMalloc(&__ptr, __size)
/home/seni/git/kmcuda/src/kmcuda.cc:456 -> out of memory
failed to allocate 3600600000 bytes for device_bounds_yy
Traceback (most recent call last):
File "km.py", line 63, in <module>
plac.call(main)
File "/usr/local/lib/python3.6/site-packages/plac_core.py", line 328, in call
cmd, result = parser.consume(arglist)
File "/usr/local/lib/python3.6/site-packages/plac_core.py", line 207, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "km.py", line 43, in main
kmeans(data, data.shape[0] // 10, iter, samples_in_batch)
File "km.py", line 14, in kmeans
verbosity = 2, yinyang_t = 0.1, seed = 3, tolerance=0.001, device=0, average_distance = True)
MemoryError: Failed to allocate memory on GPU
3600600000 bytes is 3.6GB; as mentioned earlier, the GPU I'm using has >11GB free.
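As an aside, the startup numbers in the log can be reproduced from the call's arguments. The formulas below are inferred from the logged values themselves, not read out of the kmcuda sources, so treat them as an educated guess:

```python
# Sanity-check the values kmcuda prints at startup against the inputs
# from the log: 600000 samples, 60000 clusters, tolerance=0.001,
# yinyang_t=0.1.
samples, clusters = 600_000, 60_000
tolerance, yinyang_t = 0.001, 0.1

reassignments_threshold = int(samples * tolerance)  # convergence cutoff
yinyang_groups = int(clusters * yinyang_t)          # centroid groups

print(reassignments_threshold)  # 600, matches "reassignments threshold: 600"
print(yinyang_groups)           # 6000, matches "yinyang groups: 6000"
```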
You should increase the log level.
I'm using 2; is there a higher one? Or should I recompile with a debug flag or something like that?
Try 3. Nope, no need to recompile...
Here it is with 3:
seni@seni-MS-7A32:/work/fun/subword-repr$ CUDA_VISIBLE_DEVICES="0" py3 km.py data/repr_nums
2020-01-29 21:02:16.837 | INFO | __main__:kmeans:11 - Starting, data shape is (600000, 256).
arguments: 1 0x7ffc92e5f0a4 0.001 0.10 0 600000 256 60000 3 0 0 3 0x7f62c76f9010 0x7f62c3c60010 0x7f62c3a16010 0x7ffc92e5f0cc
reassignments threshold: 600
yinyang groups: 6000
[0] *dest: 0x7f6268000000 - 0x7f628c9f0000 (614400000)
[0] device_centroids: 0x7f6296000000 - 0x7f6299a98000 (61440000)
[0] device_assignments: 0x7f628ca00000 - 0x7f628cc49f00 (2400000)
[0] device_assignments_prev: 0x7f628ce00000 - 0x7f628d049f00 (2400000)
[0] device_ccounts: 0x7f628d200000 - 0x7f628d23a980 (240000)
[0] device_assignments_yy: 0x7f628d23aa00 - 0x7f628d275380 (240000)
cudaMalloc(&__ptr, __size)
/home/seni/git/kmcuda/src/kmcuda.cc:456 -> out of memory
failed to allocate 3600600000 bytes for device_bounds_yy
Traceback (most recent call last):
File "km.py", line 64, in <module>
plac.call(main)
File "/usr/local/lib/python3.6/site-packages/plac_core.py", line 328, in call
cmd, result = parser.consume(arglist)
File "/usr/local/lib/python3.6/site-packages/plac_core.py", line 207, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "km.py", line 44, in main
kmeans(data, data.shape[0] // 10, iter, samples_in_batch)
File "km.py", line 14, in kmeans
verbosity = 3, yinyang_t = 0.1, seed = 3, tolerance=0.001, device=0, average_distance = True)
MemoryError: Failed to allocate memory on GPU
Seems like it should work, no?
It has allocated 614400000*2 + 2400000*4 bytes, that is, roughly 1.2GB. So yeah, it should be able to allocate another 3.6GB.
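Summing the sizes actually printed in the verbosity-3 log gives the same picture; the byte counts below are copied verbatim from the log above:

```python
# Add up the allocations the verbosity-3 log reports before the failing
# cudaMalloc. These byte counts are copied from the log output.
logged_allocations = {
    "dest": 614_400_000,
    "device_centroids": 61_440_000,
    "device_assignments": 2_400_000,
    "device_assignments_prev": 2_400_000,
    "device_ccounts": 240_000,
    "device_assignments_yy": 240_000,
}
total = sum(logged_allocations.values())
print(total / 1e9)  # ~0.68 GB already allocated
# Even with the reported 3.6 GB on top, an 11 GB card should cope.
```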
At this point I went to read my own source code: https://github.com/src-d/kmcuda/blob/master/src/private.h#L135
It appears that there is a bug: the error message prints `size` while it should really print `__size`, so the reported number is not multiplied by the array element size (4 bytes). In reality, it tries to allocate 3600600000 * 4 bytes = 14.4GB.
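The reported figure also factors neatly, which supports this diagnosis. A quick check (reading the factorization as "one bound per sample per group, plus one per sample" is my inference, not confirmed from the kmcuda sources):

```python
# The "3600600000" in the error message is an element count, not bytes:
# it factors exactly as samples * (yinyang_groups + 1), using the values
# from the log (600000 samples, 6000 yinyang groups).
samples, yinyang_groups = 600_000, 6_000
elements = samples * (yinyang_groups + 1)
print(elements)             # 3600600000, the number in the error message

bytes_needed = elements * 4  # sizeof(float)
print(bytes_needed / 1e9)    # ~14.4 GB, more than a 1080 Ti's 11 GB
```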
Problem solved: you should disable yinyang, and I should fix the error message :)
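To act on that advice programmatically, one option is a small helper (hypothetical, not part of the kmcuda API) that estimates the yinyang bounds allocation up front and falls back to yinyang_t=0, which disables the yinyang refinement, when the bounds would not fit in GPU memory:

```python
# Hypothetical helper (not part of kmcuda): estimate the yinyang bounds
# allocation and return yinyang_t=0 when it would exceed free GPU memory.
# The size formula samples * (groups + 1) * 4 is inferred from the numbers
# in this issue, not taken from the kmcuda sources.
def safe_yinyang_t(samples, clusters, yinyang_t=0.1,
                   gpu_free_bytes=11 * 1024**3, dtype_size=4):
    groups = int(clusters * yinyang_t)
    bounds_bytes = samples * (groups + 1) * dtype_size
    return yinyang_t if bounds_bytes <= gpu_free_bytes else 0.0

# For the shapes in this issue the bounds alone need ~14.4 GB,
# so the helper disables yinyang:
print(safe_yinyang_t(600_000, 60_000))  # 0.0
```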
Thank you!
The docs say
I've got 600K x 250 and I'm getting OOM on a 1080 Ti (11GB memory).
Do I have to disable yinyang?