[Bug] RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)` when training the VirConv-L model #72
During epoch 0/50, at training iteration 3/3712, I got the following error:
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
How can I fix this? The full traceback is below:
Traceback (most recent call last):
File "/home/zhw/project/virconv/tools/train.py", line 201, in <module>
main()
File "/home/zhw/project/virconv/tools/train.py", line 152, in main
train_model(
File "/home/zhw/project/virconv/tools/train_utils/train_utils.py", line 95, in train_model
accumulated_iter = train_one_epoch(
File "/home/zhw/project/virconv/tools/train_utils/train_utils.py", line 44, in train_one_epoch
loss, tb_dict, disp_dict = model_func(model, batch)
File "/home/zhw/project/virconv/pcdet/models/__init__.py", line 32, in model_func
ret_dict, tb_dict, disp_dict = model(batch_dict)
File "/home/zhw/conda/anaconda3/envs/virconv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/zhw/project/virconv/pcdet/models/detectors/voxel_rcnn.py", line 10, in forward
batch_dict = cur_module(batch_dict)
File "/home/zhw/conda/anaconda3/envs/virconv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/zhw/project/virconv/pcdet/models/backbones_3d/spconv_backbone.py", line 658, in forward
newx_conv1 = self.vir_conv1(newinput_sp_tensor, batch_size, calib, 1, self.x_trans_train, trans_param)
File "/home/zhw/conda/anaconda3/envs/virconv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/zhw/project/virconv/pcdet/models/backbones_3d/spconv_backbone.py", line 216, in forward
uv_coords, depth = index2uv(d3_feat2.indices, batch_size, calib, stride, x_trans_train, trans_param)
File "/home/zhw/project/virconv/pcdet/models/backbones_3d/spconv_backbone.py", line 72, in index2uv
pts_rect = calib[b_i].lidar_to_rect_cuda(cur_pts[:, 0:3])
File "/home/zhw/project/virconv/pcdet/utils/calibration_kitti.py", line 129, in lidar_to_rect_cuda
pts_rect = torch.matmul(pts_lidar_hom[:], torch.matmul(V2C, R0))
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
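CUDA kernels run asynchronously, so a CUBLAS_STATUS_EXECUTION_FAILED reported at `torch.matmul` can surface at a later call than the one that actually failed. A minimal sketch for narrowing this down, under stated assumptions: it sets `CUDA_LAUNCH_BLOCKING=1` before importing torch so errors are raised at the call that caused them, then repeats the matmul pattern from line 129 of `calibration_kitti.py` on random data and compares the GPU result with a CPU reference. The shapes (N×4 homogeneous points, a 4×3 `V2C`, a 3×3 `R0`) are inferred from the expression, not read from VirConv's code, and `check_lidar_to_rect` is a hypothetical helper name.

```python
import os

# Set before torch initializes CUDA: kernel launches become synchronous, so an
# error is raised at the call that caused it rather than at a later one.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")

import torch


def check_lidar_to_rect(num_points=20000, device=None, seed=0):
    """Hypothetical helper: repeat the matmul pattern of lidar_to_rect_cuda on
    random float32 data and compare the device result with a CPU reference.

    Assumed shapes (not taken from VirConv): pts_lidar_hom (N, 4),
    V2C (4, 3), R0 (3, 3).
    """
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    gen = torch.Generator().manual_seed(seed)
    pts_xyz = torch.randn(num_points, 3, generator=gen)
    # Homogeneous coordinates: append a column of ones, giving shape (N, 4).
    pts_lidar_hom = torch.cat([pts_xyz, torch.ones(num_points, 1)], dim=1)
    V2C = torch.randn(4, 3, generator=gen)
    R0 = torch.randn(3, 3, generator=gen)

    # CPU reference for pts_lidar_hom @ (V2C @ R0).
    expected = torch.matmul(pts_lidar_hom, torch.matmul(V2C, R0))

    if torch.device(device).type == "cuda":
        # Compare plain float32 (TF32 is only used by newer GPUs anyway).
        torch.backends.cuda.matmul.allow_tf32 = False
    a = pts_lidar_hom.to(device)
    b = torch.matmul(V2C.to(device), R0.to(device))
    for name, t in (("pts_lidar_hom", a), ("V2C @ R0", b)):
        if t.dtype != torch.float32:
            raise ValueError(f"{name} has dtype {t.dtype}, expected float32")
        if not torch.isfinite(t).all():
            raise ValueError(f"{name} contains NaN or Inf")
    result = torch.matmul(a, b)
    if result.is_cuda:
        torch.cuda.synchronize()  # raise any pending kernel error here
    return torch.allclose(result.cpu(), expected, rtol=1e-4, atol=1e-4)


if __name__ == "__main__":
    print("matmul matches CPU reference:", check_lidar_to_rect())
```

If this passes on the GPU while training still fails, the fault may lie upstream of this matmul (for example in an earlier custom CUDA op); rerunning `train.py` with `CUDA_LAUNCH_BLOCKING=1` set in the shell should point at the real call site.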
Environment info:
"CUDA": { "GPU": [ "NVIDIA GeForce GTX 1080 Ti", "NVIDIA GeForce GTX 1080 Ti" ], "available": true, "version": "11.1" },
"Packages": { "PyTorch_debug": false, "PyTorch_version": "1.8.1+cu111", "TTS": "0.6.1", "numpy": "1.19.5" },
"System": { "OS": "Linux", "architecture": [ "64bit", "ELF" ], "processor": "x86_64", "python": "3.9.19", "version": "#213-Ubuntu SMP Fri Aug 2 19:14:16 UTC 2024" }
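The environment lists two GTX 1080 Ti cards. A minimal sketch, assuming only that PyTorch is importable, that prints the versions PyTorch was built with and runs a small float32 matmul (which goes through cuBLAS sgemm) on every visible GPU, so a failure can be pinned to a specific device. `gpu_report` is a hypothetical helper name.

```python
import torch


def gpu_report(size=256, seed=0):
    """Hypothetical helper: one summary line for the PyTorch build, then one
    line per visible GPU saying whether a float32 matmul on it matched the CPU."""
    lines = [
        f"torch {torch.__version__}, CUDA {torch.version.cuda}, "
        f"cuDNN {torch.backends.cudnn.version()}"
    ]
    if not torch.cuda.is_available():
        lines.append("CUDA not available")
        return lines
    gen = torch.Generator().manual_seed(seed)
    a = torch.randn(size, size, generator=gen)
    b = torch.randn(size, size, generator=gen)
    expected = a @ b
    torch.backends.cuda.matmul.allow_tf32 = False  # compare plain float32
    for i in range(torch.cuda.device_count()):
        name = torch.cuda.get_device_name(i)
        major, minor = torch.cuda.get_device_capability(i)
        dev = torch.device("cuda", i)
        try:
            # Copying back to the CPU waits for the kernel, so an execution
            # failure on this device is raised inside the try block.
            out = (a.to(dev) @ b.to(dev)).cpu()
            ok = torch.allclose(out, expected, rtol=1e-3, atol=1e-3)
            status = "ok" if ok else "result mismatch"
        except RuntimeError as err:
            status = f"failed: {err}"
        lines.append(f"cuda:{i} {name} (compute capability {major}.{minor}): {status}")
    return lines


if __name__ == "__main__":
    for line in gpu_report():
        print(line)
```

If only one card reports a failure, restricting training to the other one with `CUDA_VISIBLE_DEVICES` is a quick way to confirm it.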