Error running with rtx3080 graphics card

Xushuangyin commented 2 years ago

RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR

Environment: RTX 3080, CUDA 10.0, PyTorch 1.0.0, cuDNN 7.3.5

Traceback (most recent call last):
  File "./tools/train.py", line 256, in <module>
    main()
  File "./tools/train.py", line 154, in main
    pred_r, pred_t, pred_c, emb = estimator(img, points, choose, idx)
  File "/home/xsy/anaconda3/envs/pytorch1.0/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xsy/Object-RPE-master/DenseFusion/lib/network.py", line 96, in forward
    out_img = self.cnn(img)
  File "/home/xsy/anaconda3/envs/pytorch1.0/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xsy/Object-RPE-master/DenseFusion/lib/network.py", line 36, in forward
    x = self.model(x)
  File "/home/xsy/anaconda3/envs/pytorch1.0/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xsy/anaconda3/envs/pytorch1.0/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/xsy/anaconda3/envs/pytorch1.0/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xsy/Object-RPE-master/DenseFusion/lib/pspnet.py", line 65, in forward
    f, class_f = self.feats(x)
  File "/home/xsy/anaconda3/envs/pytorch1.0/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xsy/Object-RPE-master/DenseFusion/lib/extractors.py", line 115, in forward
    x = self.conv1(x)
  File "/home/xsy/anaconda3/envs/pytorch1.0/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xsy/anaconda3/envs/pytorch1.0/lib/python3.5/site-packages/torch/nn/modules/conv.py", line 320, in forward
    self.padding, self.dilation, self.groups)

Is my graphics card the cause? I tried installing CUDA 11.1, PyTorch 1.8, and cuDNN 8.0.5, but then I get "RuntimeError: Legacy autograd function with non-static forward method is deprecated. Please use new-style autograd function with static forward method." How can I solve this problem?

cynthia-you commented 2 years ago

Hey dude, as far as I know, the RTX 30 series only supports CUDA >= 11.1.
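For context: RTX 30-series cards (RTX 3060 to 3090) report compute capability 8.6, and CUDA 11.1 is the first toolkit with native sm_86 support (sm_80, used by the A100, arrived in CUDA 11.0). A CUDA 10.0 / cuDNN 7.3.5 stack predates these cards, which is consistent with the point above. The minimal sketch below encodes only those two thresholds; the table is deliberately incomplete and the helper names are hypothetical.

```python
# Minimal sketch: does a CUDA toolkit version natively support a GPU's compute
# capability? Only the two Ampere entries are listed; anything else returns None.
MIN_CUDA_FOR_CAPABILITY = {
    (8, 0): (11, 0),  # sm_80, e.g. A100
    (8, 6): (11, 1),  # sm_86, e.g. RTX 3060/3070/3080/3090
}


def parse_version(text):
    """Turn a version string such as '11.1' or '10.0.130' into (major, minor)."""
    parts = text.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def cuda_supports(capability, cuda_version):
    """Return True/False for listed capabilities, or None if the table has no entry."""
    required = MIN_CUDA_FOR_CAPABILITY.get(tuple(capability))
    if required is None:
        return None
    return parse_version(cuda_version) >= required


if __name__ == "__main__":
    # An RTX 3080 reports compute capability (8, 6).
    print(cuda_supports((8, 6), "10.0"))  # False
    print(cuda_supports((8, 6), "11.1"))  # True
    print(cuda_supports((8, 6), "11.3"))  # True
```

With PyTorch installed, `torch.cuda.get_device_capability(0)` and `torch.version.cuda` supply the two arguments (`torch.version.cuda` is None on CPU-only builds).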

an99990 commented 2 years ago

Hey @Xushuangyin, the issue is that knn uses the deprecated (legacy) autograd function style, as the last error message says. I used this pull request and it worked: https://github.com/j96w/DenseFusion/pull/170
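The error quoted in the first comment comes from newer PyTorch rejecting legacy autograd functions, i.e. `Function` subclasses that are instantiated and whose `forward` is an ordinary method. The sketch below shows the new-style pattern on a hypothetical toy op (`Square`), not on DenseFusion's knn code and not the actual diff in #170: `forward` and `backward` become `@staticmethod`s that receive a `ctx` object, and the function is called through `.apply`.

```python
import torch
from torch.autograd import Function

# Legacy style (rejected by newer PyTorch), shown for comparison only:
#
#     class Square(Function):
#         def forward(self, x):
#             self.save_for_backward(x)
#             return x * x
#         def backward(self, grad_out):
#             x, = self.saved_tensors
#             return 2 * x * grad_out
#
#     y = Square()(x)


class Square(Function):
    """New-style autograd Function: static methods plus an explicit ctx."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)  # stash the input needed by backward
        return x * x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return 2 * x * grad_out  # one gradient per forward input


x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = Square.apply(x)  # call via .apply, not Square()(x)
y.sum().backward()
print(x.grad)  # tensor([2., 4., 6.])
```

For an op that only returns indices, such as a nearest-neighbour lookup, `backward` can simply return `None` for each input, and `forward` can call `ctx.mark_non_differentiable(...)` on the index output.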

jc0725 commented 2 years ago

@Xushuangyin Were you able to solve this issue?

Xushuangyin commented 2 years ago

Hello, in the end I didn't solve this problem. I can't find a knn_pytorch build that works on the RTX 30 series.
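If no compiled knn_pytorch build works, one hedged workaround is a brute-force nearest-neighbour search in plain PyTorch: it needs no custom CUDA code, so it runs on any GPU the installed PyTorch supports, at the cost of O(N·M) memory per batch. This is a swapped-in technique, not DenseFusion's implementation. The function name, the (B, D, N) / (B, D, M) input layout, and the 0-based (B, k, M) output are all assumptions; check them against lib/knn and its call sites before swapping it in.

```python
import torch


@torch.no_grad()  # indices only; no gradients are needed
def knn_bruteforce(ref, query, k=1):
    """Hypothetical brute-force k-nearest-neighbour search in plain PyTorch.

    ref:   (B, D, N) reference points
    query: (B, D, M) query points
    returns (B, k, M) int64 indices into the N reference points (0-based)
    """
    # torch.cdist expects (B, P, D) layouts, so move the coordinate axis last.
    dists = torch.cdist(query.transpose(1, 2), ref.transpose(1, 2))  # (B, M, N)
    # Indices of the k smallest distances along the reference axis, nearest first.
    idx = dists.topk(k, dim=2, largest=False).indices  # (B, M, k)
    return idx.transpose(1, 2).contiguous()  # (B, k, M)


# Example: one batch of 2-D points.
ref = torch.tensor([[[0.0, 10.0, 0.0],
                     [0.0, 0.0, 10.0]]])  # reference points (0,0), (10,0), (0,10)
query = torch.tensor([[[9.0, 1.0],
                       [1.0, 8.0]]])  # query points (9,1), (1,8)
print(knn_bruteforce(ref, query, k=1))  # tensor([[[1, 2]]])
```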

------------------ Original message ------------------ From: "j96w/DenseFusion" @.>; Sent: Thursday, April 14, 2022, 10:19 AM @.>; @.**@.>; Subject: Re: [j96w/DenseFusion] Error running with rtx3080 graphics card (Issue #205)

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

Xushuangyin commented 2 years ago

I'd like to ask: if I train the pose estimation model on the LINEMOD dataset and then run it in a real environment, using a camera to estimate the real-time pose of the objects from the dataset, can it achieve the desired result? Thank you!

jc0725 commented 2 years ago

@Xushuangyin I'm still in the middle of training, but it seems I'm able to train on the LINEMOD dataset with an RTX 30-series card.

Steps:

1. git clone -b Pytorch-1.0 https://github.com/j96w/DenseFusion.git
2. Modify the files and run the terminal commands shown in #170 (I'm using CUDA 11.3 and it seems to be working just fine).

Hope it helps!
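Replies in this thread compare several stacks (CUDA 10.0 with PyTorch 1.0, CUDA 11.1 with PyTorch 1.8, CUDA 11.3), so it can help to print what the running interpreter actually sees rather than what was installed. A hedged sketch using standard PyTorch attributes; the function name is hypothetical, and it degrades gracefully when PyTorch or a GPU is absent.

```python
import platform


def environment_report():
    """Collect the version facts relevant to this issue into a dict."""
    report = {"python": platform.python_version(), "torch": None}
    try:
        import torch
    except ImportError:
        return report  # PyTorch is not installed in this interpreter

    report["torch"] = torch.__version__
    report["cuda_build"] = torch.version.cuda  # None on CPU-only builds
    report["cudnn"] = torch.backends.cudnn.version()  # None if cuDNN is unavailable
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["gpu"] = torch.cuda.get_device_name(0)
        report["capability"] = torch.cuda.get_device_capability(0)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print("{}: {}".format(key, value))
```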

Xushuangyin commented 2 years ago

I'll try it later. Strangely, the dataset I made can be trained, and part of the LINEMOD dataset can be trained too, but the full dataset can't. Thank you.

------------------ Original message ------------------ From: "j96w/DenseFusion" @.>; Sent: Thursday, April 14, 2022, 1:56 PM @.>; @.**@.>; Subject: Re: [j96w/DenseFusion] Error running with rtx3080 graphics card (Issue #205)

jc0725 commented 2 years ago

@Xushuangyin Hello. Would it be possible for you to upload your DenseFusion project to your GitHub repository? I would love to see how you have made your own dataset work. Thank you in advance.

Xushuangyin commented 2 years ago

@jc0725 This is the tool I used to make my datasets: https://github.com/F2Wang/ObjectDatasetTools

an99990 commented 2 years ago

Hi @Xushuangyin, can you give us some details about what you modified in the code to train on your custom dataset? Did you resize the images? Change num_points? I noticed the loop doesn't load all objects; it skips object 7?

I am having shape issues:

ValueError: operands could not be broadcast together with shapes (540,960,4) (3,)

Xushuangyin commented 2 years ago

Did you use ObjectDatasetTools to create your dataset? I think I ran into this problem before, but I'm sorry, I've forgotten how I solved it. Try changing the shape of the image: your image is probably 4-channel, and you need to convert it to 3 channels before that array operation.
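A NumPy array of shape (540, 960, 4) is an RGBA image (PNG files saved with an alpha channel load that way). Broadcasting it against a 3-element per-channel array fails because the trailing dimensions, 4 and 3, differ. The (3,) operand is presumably a per-channel RGB value such as a normalization mean in the loader; that is an assumption, as are the helper name and the values below. Dropping the alpha channel is enough; with PIL, `Image.open(path).convert("RGB")` does the same at load time.

```python
import numpy as np


def to_rgb(img):
    """Return an H x W x 3 view of an H x W x 3 or H x W x 4 (RGBA) image."""
    if img.ndim != 3 or img.shape[2] not in (3, 4):
        raise ValueError("expected an H x W x 3 or H x W x 4 image, got {}".format(img.shape))
    return img[:, :, :3]  # drops alpha if present; a no-op slice for RGB


rgba = np.zeros((540, 960, 4), dtype=np.float32)
per_channel = np.array([0.485, 0.456, 0.406], dtype=np.float32)  # hypothetical RGB mean

try:
    rgba - per_channel  # trailing dims 4 vs 3 cannot broadcast
except ValueError as exc:
    print("fails:", exc)

rgb = to_rgb(rgba)
print((rgb - per_channel).shape)  # (540, 960, 3)
```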

------------------ Original message ------------------ From: "j96w/DenseFusion" @.>; Sent: Friday, April 15, 2022, 9:52 PM @.>; @.**@.>; Subject: Re: [j96w/DenseFusion] Error running with rtx3080 graphics card (Issue #205)

Xushuangyin commented 2 years ago

I modified the files, but I still get an error when evaluating the LINEMOD model. Could you share the specific code changes you made? Thank you.

an99990 commented 2 years ago

Hey, I resized the image like you said and I don't get the ValueError anymore. @jc0725, do you know what the values of num_points, num_pt_mesh_large, and num_pt_mesh_small should be?

I have models of my objects, and some of them have fewer than 100 vertices. Is num_pt_mesh_small the minimum number of vertices?
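If a loader picks a fixed number of model points by deleting random vertices, a mesh with fewer vertices than that number cannot supply them (Python's `random.sample` raises ValueError when asked for a negative sample size). One hedged workaround, not taken from DenseFusion, is a sampler that keeps every vertex and repeats randomly chosen ones to reach the target. The function name and the V x 3 vertex layout are assumptions.

```python
import numpy as np


def sample_fixed_count(vertices, n, rng=None):
    """Hypothetical helper: return exactly n rows from a V x 3 vertex array.

    V >= n: sample n distinct vertices.
    V <  n: keep every vertex, then repeat randomly chosen ones to reach n.
    """
    rng = np.random.default_rng() if rng is None else rng
    v = len(vertices)
    if v == 0:
        raise ValueError("mesh has no vertices")
    if v >= n:
        idx = rng.choice(v, size=n, replace=False)
    else:
        extra = rng.choice(v, size=n - v, replace=True)
        idx = np.concatenate([np.arange(v), extra])
    return vertices[idx]


rng = np.random.default_rng(0)
small_mesh = rng.random((80, 3))  # a model with only 80 vertices
print(sample_fixed_count(small_mesh, 500, rng).shape)  # (500, 3)
print(sample_fixed_count(small_mesh, 50, rng).shape)  # (50, 3)
```

Repeating vertices yields duplicate points; sampling the mesh surface more densely is an alternative when the model has very few vertices.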

I currently have a shape issue, and I think it's related to num_points:

Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7], but got 2-dimensional input of size [1, 1] instead
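The message decodes as follows: the first convolution's weight [64, 3, 7, 7] has 64 output channels, 3 input channels and a 7x7 kernel, so it expects a 4-D batch shaped (N, 3, H, W). What arrived was a 2-D tensor of size [1, 1], so the image had lost its channel and spatial dimensions before reaching the CNN. A hypothetical guard, called on `img.shape` just before the network, reports this earlier and more descriptively; it works on any shape tuple (`torch.Size` is a tuple subclass).

```python
def check_cnn_input(shape, in_channels=3):
    """Hypothetical guard: raise ValueError unless shape looks like (N, C, H, W).

    in_channels=3 mirrors the second entry of the [64, 3, 7, 7] conv weight.
    """
    shape = tuple(shape)
    if len(shape) != 4:
        raise ValueError("expected (N, C, H, W), got {}-D shape {}".format(len(shape), shape))
    n, c, h, w = shape
    if n < 1 or h < 1 or w < 1:
        raise ValueError("empty batch or image in shape {}".format(shape))
    if c != in_channels:
        raise ValueError("expected {} channels, got {} in shape {}".format(in_channels, c, shape))
    return shape


print(check_cnn_input((1, 3, 120, 160)))  # (1, 3, 120, 160)
try:
    check_cnn_input((1, 1))
except ValueError as exc:
    print(exc)  # expected (N, C, H, W), got 2-D shape (1, 1)
```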

Xushuangyin commented 2 years ago

When training on my own dataset, I didn't modify the parameters you mentioned; they are the same as for the LINEMOD dataset.

------------------ Original message ------------------ From: "j96w/DenseFusion" @.>; Sent: Monday, April 18, 2022, 6:13 AM @.>; @.**@.>; Subject: Re: [j96w/DenseFusion] Error running with rtx3080 graphics card (Issue #205)

Destinycjk commented 2 years ago

> @Xushuangyin I'm still in the middle of training but it seems like I'm able to train the LINEMOD dataset on the rtx30s series.
>
> Steps:
>
> git clone -b Pytorch-1.0 https://github.com/j96w/DenseFusion.git
>
> modify files and follow the terminal code as shown in Pytorch 1.6 and lib knn build with cuda 10.2 #170 (I'm using CUDA 11.3 and seems to be working just fine)
>
> Hope it helps!

Hello @jc0725! I tried PyTorch 1.8.0, torchvision 0.9.0, and CUDA 11.1 on an RTX 30-series card, and I also followed the steps in https://github.com/j96w/DenseFusion/pull/170. However, when I try to train on the LINEMOD dataset, I get "ImportError: /home/chenkai/code/DenseFusion-Pytorch/lib/knn/knn_pytorch.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail37_typeMetaDataInstance_preallocated_10E". Have you run into this problem? Thank you very much!
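An "undefined symbol" ImportError from a compiled extension such as knn_pytorch.cpython-36m-x86_64-linux-gnu.so typically means the .so was built against a different PyTorch than the one now being imported (for example, a stale build left over from an older install); rebuilding the extension inside the active environment is the usual remedy. The hedged sketch below only separates that case from a plain missing module; the function name is hypothetical.

```python
import importlib


def diagnose_import(module_name):
    """Hypothetical helper: try to import a module and classify any failure.

    Returns (status, detail), where status is one of:
      "ok"           - imported fine
      "abi-mismatch" - ImportError mentioning an undefined symbol; for a compiled
                       extension this typically means it was built against a
                       different PyTorch than the one now installed
      "missing"      - any other ImportError (module or shared library not found)
    """
    try:
        importlib.import_module(module_name)
    except ImportError as exc:  # also catches ModuleNotFoundError
        detail = str(exc)
        if "undefined symbol" in detail:
            return "abi-mismatch", detail
        return "missing", detail
    return "ok", ""


if __name__ == "__main__":
    print(diagnose_import("json"))  # ('ok', '')
    print(diagnose_import("no_such_module_for_demo")[0])  # missing
```

For the extension itself, import torch first so libtorch is already loaded, then call something like `diagnose_import("lib.knn.knn_pytorch")` from the repository root; that module path is an assumption based on the file location in the error.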

wangqingyu985 commented 2 years ago

The RTX 3090 has the same problem.
