idsia-robotics / RandLA-Net-pytorch

Our PyTorch implementation of RandLA-Net (https://github.com/QingyongHu/RandLA-Net)

RuntimeError: "addmm_cuda" not implemented for 'Int' #2

Closed cavayangtao closed 2 years ago

cavayangtao commented 3 years ago

Hi @Gabry993, thank you for your work. The code runs smoothly on the data provided. However, when I try to train on my own data, which I have converted to the required format, I get the error below:

Train: 98%|█████████▊| 49/50 [00:16<00:00, 3.40it/s, t_loss=0.70368, t_acc=0.86475]
Train: 98%|█████████▊| 49/50 [00:16<00:00, 3.40it/s, t_loss=0.35262, t_acc=0.94245]
Train: 100%|██████████| 50/50 [00:16<00:00, 3.41it/s, t_loss=0.35262, t_acc=0.94245]
Train: 100%|██████████| 50/50 [00:16<00:00, 3.05it/s, t_loss=0.35262, t_acc=0.94245]

Validation: 0%| | 0/10 [00:01<?, ?it/s]
epoch: 0%| | 0/2 [00:17<?, ?it/s]
Traceback (most recent call last):
  File "/media/tyang/DATA/Projects/2_dense/RandLA-Net-pytorch/ldn_train.py", line 26, in <module>
    model_name="repo_example")
  File "/media/tyang/DATA/Projects/2_dense/RandLA-Net-pytorch/model/training.py", line 384, in train_randlanet_model
    hyperpars['num_layers'], hyperpars['num_classes'])
  File "/media/tyang/DATA/Projects/2_dense/RandLA-Net-pytorch/model/training.py", line 85, in train_model
    n_classes, scheduler)
  File "/media/tyang/DATA/Projects/2_dense/RandLA-Net-pytorch/model/training.py", line 234, in validation
    outputs = model(inputs)
  File "/home/tyang/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/media/tyang/DATA/Projects/2_dense/RandLA-Net-pytorch/model/model.py", line 41, in forward
    x = self.fc1(x)
  File "/home/tyang/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/tyang/anaconda3/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 96, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/tyang/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1847, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: "addmm_cuda" not implemented for 'Int'

Process finished with exit code 1

Have you ever run into this problem, or do you know what might be causing it? Thanks in advance.

Gabry993 commented 3 years ago

Hi @cavayangtao, sorry for getting back to you so late. Did you manage to fix the issue? Unfortunately I've never seen it, so I can't be of much help :-/ It looks like some CUDA operation is being called on an Int tensor, which it does not support. Since the error comes from the forward call during validation, while training works, I can only suggest checking the validation data to make sure all the data types match those of the training data. As a double check, you could run a validation step on the same data you use for training: if the error disappears, then something is definitely wrong in the validation data.

cavayangtao commented 2 years ago

Hi @Gabry993, many thanks for your kind reply. I haven't solved the problem yet. As you suggested, I used the same data for train_set_list and test_set_list in train.py, but I got the same error. I will leave a comment once the issue is fixed.

cavayangtao commented 2 years ago

Hi @Gabry993, on line 232 of ./model/training.py, "inputs = unpack_input(input_list, n_classes, device)" should be "inputs = unpack_input(input_list, n_layers, device)". It makes no difference in your test because you have 5 classes and 5 layers. Fixing this solved my problem.
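To illustrate why passing the number of classes instead of the number of layers could produce an Int tensor at the first linear layer, here is a toy sketch. It is not the repository's `unpack_input`. It assumes the flat input list holds four per-layer groups (xyz, neighbour indices, subsampling indices, interpolation indices) followed by the features and the labels; `build_flat_batch` and `unpack_flat_batch` are hypothetical names, and `(name, dtype)` tuples stand in for real tensors.

```python
from typing import Dict, List, Tuple

# Each entry is (name, dtype), standing in for a real tensor.
Entry = Tuple[str, str]


def build_flat_batch(n_layers: int) -> List[Entry]:
    """Build a flat batch in the ASSUMED layout: four per-layer groups, then features, labels."""
    batch: List[Entry] = []
    groups = [("xyz", "float32"), ("neigh_idx", "int32"),
              ("sub_idx", "int32"), ("interp_idx", "int32")]
    for group, dtype in groups:
        batch += [(f"{group}[{i}]", dtype) for i in range(n_layers)]
    batch += [("features", "float32"), ("labels", "int64")]
    return batch


def unpack_flat_batch(flat: List[Entry], n: int) -> Dict[str, object]:
    """Slice the flat batch, treating `n` as the number of layers."""
    return {
        "xyz": flat[0:n],
        "neigh_idx": flat[n:2 * n],
        "sub_idx": flat[2 * n:3 * n],
        "interp_idx": flat[3 * n:4 * n],
        "features": flat[4 * n],
    }


flat = build_flat_batch(n_layers=5)
right = unpack_flat_batch(flat, 5)["features"]  # n = n_layers
wrong = unpack_flat_batch(flat, 4)["features"]  # n = n_classes (4 in this toy case)
print(right, wrong)
```

In this toy layout, slicing with 5 picks out the float features, while slicing with 4 lands on one of the integer index entries. That would be consistent with the 'Int' error raised in `self.fc1(x)`, and with the bug staying hidden when the number of classes equals the number of layers, although the actual list layout in the repository may differ.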

Gabry993 commented 2 years ago

Thank you very much for fixing this!

gougou0304 commented 2 years ago

Thank you for your questions and answers; I learned a lot from them. I have one more question: could you please tell me how to convert a .ply point cloud into the format required by the code?