When I ran the Docker container, I first got the "found no NVIDIA driver" error. After installing nvidia-container, that problem seemed to be solved.
Then I tried the following command again. Since I have 2 cards on the machine, I assigned only card 0.
sudo docker run -it --rm --gpus='"device=0"' -v xxx:/input -v xxx:/output --entrypoint bash jukemir/representations_jukebox
And then,
python main.py --batch_size 8
After a few minutes (of initializing I guess), I got the following error:
Traceback (most recent call last):
  File "main.py", line 177, in <module>
    representation = get_acts_from_file(input_path, hps, vqvae, top_prior, meanpool=True)
  File "main.py", line 86, in get_acts_from_file
    z = get_z(audio, vqvae)
  File "main.py", line 27, in get_z
    zs = vqvae.encode(torch.cuda.FloatTensor(audio[np.newaxis, :, np.newaxis]))
  File "/code/jukebox/jukebox/vqvae/vqvae.py", line 141, in encode
    zs_i = self._encode(x_i, start_level=start_level, end_level=end_level)
  File "/code/jukebox/jukebox/vqvae/vqvae.py", line 132, in _encode
    x_out = encoder(x_in)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/code/jukebox/jukebox/vqvae/encdec.py", line 80, in forward
    x = level_block(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/code/jukebox/jukebox/vqvae/encdec.py", line 26, in forward
    return self.model(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 202, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR
I googled it and added torch.backends.cudnn.enabled = False to main.py but a new problem occurred:
Traceback (most recent call last):
  File "main.py", line 179, in <module>
    representation = get_acts_from_file(input_path, hps, vqvae, top_prior, meanpool=True)
  File "main.py", line 88, in get_acts_from_file
    z = get_z(audio, vqvae)
  File "main.py", line 29, in get_z
    zs = vqvae.encode(torch.cuda.FloatTensor(audio[np.newaxis, :, np.newaxis]))
  File "/code/jukebox/jukebox/vqvae/vqvae.py", line 141, in encode
    zs_i = self._encode(x_i, start_level=start_level, end_level=end_level)
  File "/code/jukebox/jukebox/vqvae/vqvae.py", line 132, in _encode
    x_out = encoder(x_in)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/code/jukebox/jukebox/vqvae/encdec.py", line 80, in forward
    x = level_block(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/code/jukebox/jukebox/vqvae/encdec.py", line 26, in forward
    return self.model(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 202, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)
I solved the problem by copying the source code from the Docker image to a local directory and running it in my own conda env. It seems the problem originated from environment issues.
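For anyone hitting the same pair of errors, here is a minimal sketch of how the environment inside the container could be compared with a working conda env. It assumes PyTorch is importable; the function names `describe_cuda_env` and `conv_smoke_test` are hypothetical helpers, not part of main.py or jukebox. One possible cause (not confirmed in this thread) is a GPU whose compute capability is newer than what the CUDA build in the image supports, so the script prints both.

```python
import importlib.util


def describe_cuda_env():
    """Collect the version facts that often explain cuDNN/cuBLAS failures."""
    info = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not info["torch_installed"]:
        return info

    import torch

    info["torch"] = torch.__version__
    info["built_for_cuda"] = torch.version.cuda     # None on CPU-only builds
    info["cudnn"] = torch.backends.cudnn.version()  # None if cuDNN is unavailable
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["device"] = torch.cuda.get_device_name(0)
        info["compute_capability"] = torch.cuda.get_device_capability(0)
    return info


def conv_smoke_test():
    """Run one tiny Conv1d on the GPU; returns 'ok', 'skipped', or the error text."""
    if importlib.util.find_spec("torch") is None:
        return "skipped"

    import torch

    if not torch.cuda.is_available():
        return "skipped"
    try:
        conv = torch.nn.Conv1d(1, 4, kernel_size=3).cuda()
        out = conv(torch.randn(1, 1, 64, device="cuda"))
        torch.cuda.synchronize()  # force the kernel to finish so errors surface here
        return "ok" if out.shape == (1, 4, 62) else "unexpected shape"
    except RuntimeError as err:
        return str(err)


if __name__ == "__main__":
    for key, value in describe_cuda_env().items():
        print(f"{key}: {value}")
    print("conv smoke test:", conv_smoke_test())
```

Running this both inside the container and in the conda env should show whether the two differ in PyTorch, CUDA, or cuDNN version. If the smoke test fails in the container with the same error, the model code is probably not at fault.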
Did I miss anything?