Closed: huamo555 closed this issue 6 months ago
Hi, I have not tested DDP training for the GraspNet baseline or GSNet. This error seems to come from MinkowskiEngine as used in the unofficial GSNet implementation. In my experience with other programs, though, training MinkowskiEngine models in DDP mode works fine.
I am trying to get this code to run on multiple GPUs, but am encountering errors.
Traceback (most recent call last):
  File "new_train.py", line 188, in <module>
    train(start_epoch)
  File "new_train.py", line 180, in train
    train_one_epoch()
  File "new_train.py", line 146, in train_one_epoch
    end_points = net(batch_data_label)
  File "/data2/gaoyuming/anaconda3/envs/env_n/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data2/gaoyuming/anaconda3/envs/env_n/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 167, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/data2/gaoyuming/anaconda3/envs/env_n/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 177, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/data2/gaoyuming/anaconda3/envs/env_n/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/data2/gaoyuming/anaconda3/envs/env_n/lib/python3.7/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
AttributeError: Caught AttributeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/data2/gaoyuming/anaconda3/envs/env_n/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/data2/gaoyuming/anaconda3/envs/env_n/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data2/gaoyuming/.cache/graspness_implementation-main/graspnet.py", line 73, in forward
    seed_features = self.backbone(mink_input).F # mink_input [BNs(C+3)--> BNs512]: feed into the backbone model and get the output feature data seed_features
  File "/data2/gaoyuming/anaconda3/envs/env_n/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data2/gaoyuming/.cache/graspness_implementation-main/backbone_resunet14.py", line 94, in forward
    out = self.conv0p1s1(x)
  File "/data2/gaoyuming/anaconda3/envs/env_n/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data2/gaoyuming/anaconda3/envs/env_n/lib/python3.7/site-packages/MinkowskiEngine/MinkowskiConvolution.py", line 302, in forward
    assert input.D == self.dimension
  File "/data2/gaoyuming/anaconda3/envs/env_n/lib/python3.7/site-packages/torch/nn/modules/module.py", line 948, in __getattr__
    type(self).__name__, name))
AttributeError: 'MinkowskiConvolution' object has no attribute 'dimension'
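
Note that the traceback goes through torch/nn/parallel/data_parallel.py, so the script is wrapping the network in nn.DataParallel, not in DDP. If you want to try the DDP route mentioned in the reply, a minimal sketch of the wrapping step could look like the following. This is not code from this repo: `setup_ddp` is a hypothetical helper name, it assumes one process per GPU started by a launcher that sets RANK, WORLD_SIZE and LOCAL_RANK (for example torchrun, or torch.distributed.launch --use_env on older PyTorch), and it has not been tested with GSNet.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def setup_ddp(model: nn.Module) -> nn.Module:
    """Hypothetical helper: wrap `model` in DistributedDataParallel.

    Assumes one process per GPU. The environment defaults below only
    exist so the sketch also runs as a single process without a launcher.
    """
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")

    use_cuda = torch.cuda.is_available()
    if not dist.is_initialized():
        # NCCL for GPU training, gloo as a CPU fallback.
        backend = "nccl" if use_cuda else "gloo"
        dist.init_process_group(backend, rank=rank, world_size=world_size)

    if use_cuda:
        # Each process owns exactly one GPU, so no module replication
        # happens inside the process (unlike nn.DataParallel).
        torch.cuda.set_device(local_rank)
        model = model.cuda(local_rank)
        return DDP(model, device_ids=[local_rank])
    return DDP(model)
```

In new_train.py this would replace the nn.DataParallel wrapper, e.g. `net = setup_ddp(net)`, and the DataLoader would normally get a torch.utils.data.distributed.DistributedSampler so that each process sees a different shard of the data. If the backbone uses MinkowskiEngine batch norm layers, the MinkowskiEngine multi-GPU example converts them with `ME.MinkowskiSyncBatchNorm.convert_sync_batchnorm(net)` before wrapping; whether that is needed here is an assumption.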