junzhezhang / shape-inversion

[CVPR 2021] Unsupervised 3D Shape Completion through GAN Inversion
MIT License

CUBLAS_STATUS_EXECUTION_FAILED while training the model #7

Closed nama1arpit closed 3 years ago

nama1arpit commented 3 years ago

I was trying to train the model with this command:

python trainer.py \
--dataset CRN \
--class_choice chair \
--inversion_mode completion \
--mask_type k_mask \
--save_inversion_path ./saved_results/CRN_chair \
--ckpt_load pretrained_models/chair.pt \
--dataset_path data_dir/CRN/

I then got the following error:

Traceback (most recent call last):
  File "trainer.py", line 300, in <module>
    trainer.run()
  File "trainer.py", line 90, in run
    self.train()
  File "trainer.py", line 133, in train
    self.model.select_z(select_y=False)
  File "/home/$USER/shape-inversion/shape_inversion.py", line 279, in select_z
    x = self.G(tree)
  File "/home/$USER/miniconda3/envs/shapeinversion/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/$USER/shape-inversion/model/treegan_network.py", line 63, in forward
    feat = self.gcn(tree)
  File "/home/$USER/miniconda3/envs/shapeinversion/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/$USER/miniconda3/envs/shapeinversion/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/$USER/miniconda3/envs/shapeinversion/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/$USER/shape-inversion/model/gcn.py", line 56, in forward
    root_node = self.W_root[inx](tree[inx])
  File "/home/$USER/miniconda3/envs/shapeinversion/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/$USER/miniconda3/envs/shapeinversion/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/$USER/miniconda3/envs/shapeinversion/lib/python3.7/site-packages/torch/nn/functional.py", line 1371, in linear
    output = input.matmul(weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

Environment:
System: Ubuntu 20.04.3
Python: 3.7.11
PyTorch: 1.2.0
CUDA Version: 11.4
gcc version: 9.3.0

Any help will be greatly appreciated! Thank you!

nama1arpit commented 3 years ago

I think I solved the issue myself; I'm posting the fix here for others who may run into the same problem. The cause appears to have been an incompatibility between the PyTorch version (1.2.0) and the installed CUDA version (11.4). I ran the model again with the latest version, PyTorch 1.9, and it worked fine.
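
For anyone checking their own setup, here is a minimal sketch of a version check along the lines of the fix above. The helper name `describe_environment` and the dictionary keys are made up for illustration and are not part of this repository. It only uses standard PyTorch attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.get_device_name`, `torch.cuda.get_device_capability`) and then tries a tiny float32 matrix multiply on the GPU, which is the kind of operation that failed in the traceback. Treat it as a rough diagnostic, not a definitive compatibility test.

```python
def describe_environment():
    """Collect version facts relevant to a PyTorch/CUDA mismatch.

    Hypothetical helper for illustration; not part of shape-inversion.
    """
    info = {
        "torch": None,           # installed PyTorch version string
        "built_for_cuda": None,  # CUDA version the PyTorch binary was built with
        "cuda_available": False, # whether PyTorch can see a usable GPU
        "gpu": None,             # name of GPU 0, if any
        "capability": None,      # compute capability of GPU 0, e.g. (8, 6)
        "matmul_ok": None,       # result of a small GPU matmul smoke test
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; return the empty report.
        return info

    info["torch"] = torch.__version__
    info["built_for_cuda"] = torch.version.cuda  # None for CPU-only builds
    info["cuda_available"] = bool(torch.cuda.is_available())

    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
        info["capability"] = torch.cuda.get_device_capability(0)
        try:
            a = torch.randn(8, 8, device="cuda")
            # .item() forces synchronization so a failure surfaces here.
            (a @ a.t()).sum().item()
            info["matmul_ok"] = True
        except RuntimeError:
            info["matmul_ok"] = False
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `built_for_cuda` is much older than the CUDA version reported by the driver, or `matmul_ok` comes back `False`, installing a newer PyTorch build (as I did with 1.9) is a reasonable first thing to try.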