iSEE-Laboratory / EconomicGrasp

Official implementation of the Economic 6-DoF Grasp Detection Framework (EconomicGrasp).
MIT License

The issue encountered when running train.py #3

Open zyx-cas opened 2 days ago

zyx-cas commented 2 days ago

Dear colleague,

I encountered the following issue when running train.py:

Traceback (most recent call last):
  File "train.py", line 166, in <module>
    train(start_epoch)
  File "train.py", line 152, in train
    train_one_epoch()
  File "train.py", line 107, in train_one_epoch
    end_points = net(batch_data_label)
  File "/home/zyx/anaconda3/envs/eco/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zyx/EconomicGrasp/models/economicgrasp.py", line 49, in forward
    ME.utils.sparse_quantize(coordinates_batch, features_batch, return_index=True, return_inverse=True)
  File "/home/zyx/anaconda3/envs/eco/lib/python3.8/site-packages/MinkowskiEngine-0.5.4-py3.8-linux-x86_64.egg/MinkowskiEngine/utils/quantization.py", line 262, in sparse_quantize
    discrete_coordinates = _auto_floor(coordinates)
  File "/home/zyx/anaconda3/envs/eco/lib/python3.8/site-packages/MinkowskiEngine-0.5.4-py3.8-linux-x86_64.egg/MinkowskiEngine/utils/quantization.py", line 133, in _auto_floor
    return torch.floor(array)
RuntimeError: "floor_cuda" not implemented for 'Int'

I temporarily resolved this issue by adding coordinates_batch = coordinates_batch.float().
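
For anyone reading along, the workaround can be sketched as a small helper. This is only an illustration: `as_float_coordinates` is a hypothetical name, not something in the EconomicGrasp code, and it assumes `coordinates_batch` is a `torch.Tensor`. It only avoids the dtype error raised by `torch.floor`; it does not say why the coordinates were integers in the first place.

```python
def as_float_coordinates(coords):
    """Return `coords` as a floating-point tensor.

    Hypothetical helper (not part of EconomicGrasp). It relies only on the
    torch.Tensor methods `is_floating_point()` and `float()`.
    """
    if coords.is_floating_point():
        # Already float/double/half: leave untouched.
        return coords
    # Integer tensor: cast to float32 so torch.floor accepts it.
    return coords.float()


# Intended use, just before the quantization call in the model's forward():
#   coordinates_batch = as_float_coordinates(coordinates_batch)
#   ME.utils.sparse_quantize(coordinates_batch, features_batch,
#                            return_index=True, return_inverse=True)
```

Checking the dtype first keeps the call harmless when the coordinates are already floating point.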

However, I then encountered the following problem:

Exception has occurred: RuntimeError
CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
  File "/home/zyx/EconomicGrasp/models/backbone.py", line 218, in forward
    out_b2p4 = self.block2(out)
  File "/home/zyx/EconomicGrasp/models/economicgrasp.py", line 62, in forward
    seed_features = self.backbone(mink_input).F
  File "/home/zyx/EconomicGrasp/train.py", line 119, in train_one_epoch
    end_points = net(batch_data_label)
  File "/home/zyx/EconomicGrasp/train.py", line 164, in train
    train_one_epoch()
  File "/home/zyx/EconomicGrasp/train.py", line 178, in <module>
    train(start_epoch)
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

Could you please advise on how to resolve this?

Thank you!

DravenALG commented 14 hours ago

The problem appears to come from an unsuitable runtime environment.

The environment, MinkowskiEngine in particular, is hard to install correctly. I also failed many times when I first configured it.

Here are some suggestions:

  1. The best solution is to use a 3090 or 4090 machine (the same as mine) and install the environment by following the README. (I have also written common_issue.md to record some issues I ran into.)
  2. If you are using a different GPU, first install the PyTorch and cudatoolkit versions that suit your hardware (check the PyTorch website for compatible versions), and then install MinkowskiEngine by following its official website.
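
Before reinstalling, it may help to record which versions are actually in use. The script below is a minimal sketch, not part of the repository: `describe_environment` is a hypothetical helper that reports the PyTorch version, the CUDA version PyTorch was built against, the GPU name and compute capability, and the MinkowskiEngine version if it can be imported.

```python
def describe_environment():
    """Collect version info relevant to the PyTorch / CUDA / MinkowskiEngine setup."""
    info = {
        "torch": None,
        "cuda_build": None,          # CUDA version PyTorch was compiled with
        "cuda_available": False,
        "gpu": None,
        "compute_capability": None,
        "MinkowskiEngine": None,
    }
    try:
        import torch
    except Exception:
        # PyTorch missing or broken: nothing more can be reported.
        return info

    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
        info["compute_capability"] = torch.cuda.get_device_capability(0)

    try:
        import MinkowskiEngine as ME
        info["MinkowskiEngine"] = getattr(ME, "__version__", "unknown")
    except Exception as exc:
        # A failed import often signals a build/runtime mismatch; keep the reason.
        info["MinkowskiEngine"] = "import failed: " + str(exc)
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(key + ": " + str(value))
```

Posting this output together with the traceback makes it easier to compare your setup against the one described in the README.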