octree-nn / octformer

OctFormer: Octree-based Transformers for 3D Point Clouds
MIT License

RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)` #14

Closed: z1xy2 closed this issue 11 months ago

z1xy2 commented 11 months ago

Hi, when I run the code, it reports an error at the line `out[start:end] = torch.mm(buffer.flatten(1, 2), weights.flatten(0, 1))`. After checking, it appears to be a CUDA problem. Which version of nvcc or the CUDA toolkit were you using at the time, so that I can use it as a reference? Thanks!

Traceback (most recent call last):
  File "/home/zxy/lab/code/octformer/segmentation.py", line 197, in <module>
    SegSolver.main()
  File "/home/zxy/miniforge3/envs/oct/lib/python3.8/site-packages/thsolver/solver.py", line 433, in main
    cls.worker(0, FLAGS)
  File "/home/zxy/miniforge3/envs/oct/lib/python3.8/site-packages/thsolver/solver.py", line 422, in worker
    the_solver.run()
  File "/home/zxy/miniforge3/envs/oct/lib/python3.8/site-packages/thsolver/solver.py", line 397, in run
    eval('self.%s()' % self.FLAGS.SOLVER.run)
  File "<string>", line 1, in <module>
  File "/home/zxy/miniforge3/envs/oct/lib/python3.8/site-packages/thsolver/solver.py", line 320, in train
    self.train_epoch(epoch)
  File "/home/zxy/miniforge3/envs/oct/lib/python3.8/site-packages/thsolver/solver.py", line 156, in train_epoch
    output = self.train_step(batch)
  File "/home/zxy/lab/code/octformer/segmentation.py", line 94, in train_step
    logit, label = self.model_forward(batch)
  File "/home/zxy/lab/code/octformer/segmentation.py", line 72, in model_forward
    logit = self.model(data, octree, octree.depth, query_pts)
  File "/home/zxy/miniforge3/envs/oct/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zxy/lab/code/octformer/models/octformerseg.py", line 93, in forward
    features = self.backbone(data, octree, depth) # pass in the point features, the octree, and the depth
  File "/home/zxy/miniforge3/envs/oct/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zxy/lab/code/octformer/models/octformer.py", line 386, in forward
    data = self.patch_embed(data, octree, depth)
  File "/home/zxy/miniforge3/envs/oct/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zxy/lab/code/octformer/models/octformer.py", line 338, in forward
    data = self.convs[i](data, octree, depth_i)
  File "/home/zxy/miniforge3/envs/oct/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zxy/miniforge3/envs/oct/lib/python3.8/site-packages/ocnn/modules/modules.py", line 72, in forward
    out = self.conv(data, octree, depth)
  File "/home/zxy/miniforge3/envs/oct/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zxy/miniforge3/envs/oct/lib/python3.8/site-packages/ocnn/nn/octree_conv.py", line 357, in forward
    out = octree_conv(
  File "/home/zxy/miniforge3/envs/oct/lib/python3.8/site-packages/ocnn/nn/octree_conv.py", line 225, in forward
    out = octree_conv.forward_gemm(out, data, weights)
  File "/home/zxy/miniforge3/envs/oct/lib/python3.8/site-packages/ocnn/nn/octree_conv.py", line 133, in forward_gemm
    out[start:end] = torch.mm(buffer.flatten(1, 2), weights.flatten(0, 1))
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

wang-ps commented 11 months ago

The code was tested on Ubuntu 20.04 with 4 NVIDIA 3090 GPUs (24 GB of memory each). The CUDA version is 11.3 and the PyTorch version is 1.12.1.
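
For anyone comparing a local setup against the versions above, a minimal sketch along these lines may help. It is not part of the OctFormer repository; the helper name `report_versions` is hypothetical, and it only reads attributes that PyTorch exposes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`). If PyTorch is not installed, the corresponding fields stay `None`.

```python
import importlib.util
import platform


def report_versions():
    """Collect version info to compare against the tested setup.

    Hypothetical helper for illustration; not part of OctFormer or ocnn.
    """
    info = {
        "python": platform.python_version(),
        "torch": None,
        "torch_cuda": None,
        "cuda_available": None,
    }
    # Import torch only if it is installed, so the script still runs without it.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        # CUDA version that this PyTorch build was compiled against.
        info["torch_cuda"] = torch.version.cuda
        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in report_versions().items():
        print(f"{key}: {value}")
```

Note that `torch.version.cuda` reports the CUDA version PyTorch was built with, which can differ from the version of a separately installed system toolkit.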

z1xy2 commented 11 months ago

Thanks, I found the problem. My globally installed CUDA toolkit was preventing `torch.mm` from running properly. I deleted the global environment variable, then restarted the terminal that launches pycharm.sh so that the environment variable change took effect. It's working fine now. Thanks for the help!
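
The thread does not say which environment variable was removed. As a rough sketch for readers hitting the same error, the snippet below lists environment entries that mention CUDA, so a conflicting system-wide toolkit can be spotted. The variable names in `CUDA_VARS` and the function name `find_cuda_env_entries` are assumptions chosen for illustration, not something taken from the issue.

```python
import os

# Assumption: variables commonly set by a system-wide CUDA install.
CUDA_VARS = ("CUDA_HOME", "CUDA_PATH", "LD_LIBRARY_PATH", "PATH")


def find_cuda_env_entries(environ=None):
    """Return {variable: [entries]} for path entries that mention 'cuda'."""
    environ = os.environ if environ is None else environ
    hits = {}
    for name in CUDA_VARS:
        value = environ.get(name, "")
        # Split path-like variables and keep only the CUDA-related entries.
        entries = [p for p in value.split(os.pathsep) if "cuda" in p.lower()]
        if entries:
            hits[name] = entries
    return hits


if __name__ == "__main__":
    for name, entries in find_cuda_env_entries().items():
        for entry in entries:
            print(f"{name}: {entry}")
```

An entry pointing to a toolkit version other than the one PyTorch was built with is a candidate for the kind of conflict described above, though the match on the substring "cuda" is only a heuristic.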

wang-ps commented 11 months ago

Great!