madebyollin / acapellabot

Acapella Extraction with a ConvNet
http://madebyoll.in/posts/cnn_acapella_extraction/

Fitting step runs out of memory on GPUs #5

Open Tears5Fears opened 7 years ago

Tears5Fears commented 7 years ago

Running `python acapellabot.py sample.mp3 --weights weights.h5` on the pretrained model works on CPUs, but crashes on GPUs due to memory overflows. I suspect it has something to do with the concatenation steps within the Keras model.

Tested on a Tesla K80:


```
Error when tring to find the memory information on the GPU: an illegal memory access was encountered
Error freeing device pointer 0x1205ae0000 (an illegal memory access was encountered). Driver report 0 bytes free and 0 bytes total
CudaNdarray_uninit: error freeing self->devdata. (self=0x7f5d4a24bcb0, self->devata=0x1205ae0000)
Error when tring to find the memory information on the GPU: an illegal memory access was encountered
Error freeing device pointer 0x1205560000 (an illegal memory access was encountered). Driver report 0 bytes free and 0 bytes total
device_free: cudaFree() returned an error, but there is already an Python error set. This happen during the clean up when there is a first error and the CUDA driver is in a so bad state that it don't work anymore. We keep the previous error set to help debugging it.CudaNdarray_uninit: error freeing self->devdata. (self=0x7f5d4c54c7b0, self->devata=0x1205560000)
Error when trying to find the memory information on the GPU: an illegal memory access was encountered
Error allocating 863849472 bytes of device memory (an illegal memory access was encountered). Driver report 0 bytes free and 0 bytes total
```

Tested on a GTX 1070:

```
  File "pygpu/gpuarray.pyx", line 1501, in pygpu.gpuarray.pygpu_concatenate
  File "pygpu/gpuarray.pyx", line 427, in pygpu.gpuarray.array_concatenate
pygpu.gpuarray.GpuArrayException: b'cuMemAlloc: CUDA_ERROR_OUT_OF_MEMORY: out of memory'
Apply node that caused the error: GpuJoin(TensorConstant{3}, GpuReshape{4}.0, GpuElemwise{Composite{(i0 * ((i1 + i2) + Abs((i1 + i2))))}}[]<gpuarray>.0)
Toposort index: 418
Inputs types: [TensorType(int8, scalar), GpuArrayType<None>(float32, (False, False, False, False)), GpuArrayType<None>(float32, (False, False, False, False))]
Inputs shapes: [(), (1, 386, 8742, 128), (1, 386, 8742, 64)]
Inputs strides: [(), (1727698944, 4475904, 512, 4), (863849472, 2237952, 256, 4)]
Inputs values: [array(3, dtype=int8), 'not shown', 'not shown']
Outputs clients: [[InplaceGpuDimShuffle{0,3,1,2}(GpuJoin.0)]]
```
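The shapes in the traceback make the bottleneck concrete. Joining a `(1, 386, 8742, 128)` tensor with a `(1, 386, 8742, 64)` tensor at float32 (4 bytes per element) needs gigabytes of activation memory for this one node alone, and the second input's size works out to exactly the 863849472-byte allocation that failed in the K80 log:

```python
# Back-of-the-envelope activation memory for the failing GpuJoin node,
# using the input shapes reported in the traceback (float32 = 4 bytes).
batch, freq, time = 1, 386, 8742

a_bytes = batch * freq * time * 128 * 4  # first join input, 128 channels
b_bytes = batch * freq * time * 64 * 4   # second join input, 64 channels
out_bytes = a_bytes + b_bytes            # concatenated output (128 + 64 channels)

print(b_bytes)           # 863849472 -- matches the failed allocation
print(out_bytes / 1e9)   # roughly 2.6 GB for this concat's output alone
```

So with a whole song fed through at once, a single skip-connection concat can eat most of a consumer GPU's memory before accounting for the rest of the graph.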
Tears5Fears commented 7 years ago

My other hypothesis is that your provided model is simply large enough that memory size becomes the bottleneck. But since you mention you've been able to fit the model on a GTX 1060, I'm unsure whether that's the issue. Perhaps a simpler convolutional model would work better on my setup.
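If the model itself fits but whole-song activations don't, one workaround would be to run inference over the spectrogram in chunks along the time axis rather than feeding the entire track at once. A rough sketch, assuming a fully convolutional network so per-chunk inference is valid apart from edge effects (`predict_in_chunks` and `chunk_size` are hypothetical, not part of acapellabot):

```python
import numpy as np

def predict_in_chunks(model, spectrogram, chunk_size=512):
    """Hypothetical workaround: run the model over time-axis slices of a
    (freq, time, channels) spectrogram to bound peak GPU memory.
    Assumes the network is fully convolutional, so chunked inference is
    valid apart from artifacts at chunk boundaries."""
    outputs = []
    for start in range(0, spectrogram.shape[1], chunk_size):
        chunk = spectrogram[:, start:start + chunk_size]
        # Keras models expect a leading batch dimension; strip it again after.
        outputs.append(model.predict(chunk[np.newaxis])[0])
    # Stitch the per-chunk outputs back together along the time axis.
    return np.concatenate(outputs, axis=1)
```

Overlapping the chunks and cross-fading the seams would reduce boundary artifacts at the cost of some redundant computation.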