hzhupku / DCNet

Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection, CVPR 2021
MIT License

Problem encountered when running 'bash base_train.sh' #17

Open tangjiaxi98 opened 2 years ago

tangjiaxi98 commented 2 years ago

2021-12-19 07:57:32,080 maskrcnn_benchmark.utils.checkpoint INFO: Saving checkpoint to ./model_final.pth
/home/hp/anaconda3/envs/dcnet/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/amp/_initialize.py:25: UserWarning: An input tensor was not cuda.
  warnings.warn("An input tensor was not cuda.")
Traceback (most recent call last):
  File "../../tools/train_net.py", line 213, in <module>
    main()
  File "../../tools/train_net.py", line 206, in main
    model = train(cfg, args.local_rank, args.distributed, phase, shot, split)
  File "../../tools/train_net.py", line 97, in train
    arguments
  File "/media/hp/new/DCNet/DCNet/maskrcnn_benchmark/engine/trainer.py", line 149, in do_train
    attentions = model(images, targets, meta_input, meta_label, average_shot=True)
  File "/home/hp/anaconda3/envs/dcnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hp/anaconda3/envs/dcnet/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/amp/_initialize.py", line 197, in new_fwd
    **applier(kwargs, input_caster))
  File "/media/hp/new/DCNet/DCNet/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 107, in forward
    attentions = self.meta_extractor(meta_input, dr=self.dense_relation)
  File "/media/hp/new/DCNet/DCNet/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 83, in meta_extractor
    base_feat = self.backbone((meta_data, 1))[2]
  File "/home/hp/anaconda3/envs/dcnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hp/anaconda3/envs/dcnet/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/hp/anaconda3/envs/dcnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/hp/new/DCNet/DCNet/maskrcnn_benchmark/modeling/backbone/resnet.py", line 148, in forward
    x = self.stem(x, meta=meta)
  File "/home/hp/anaconda3/envs/dcnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/hp/new/DCNet/DCNet/maskrcnn_benchmark/modeling/backbone/resnet.py", line 366, in forward
    x = self.conv2(x)
  File "/home/hp/anaconda3/envs/dcnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/hp/new/DCNet/DCNet/maskrcnn_benchmark/layers/misc.py", line 33, in forward
    return super(Conv2d, self).forward(x)
  File "/home/hp/anaconda3/envs/dcnet/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 338, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same
Traceback (most recent call last):
  File "/home/hp/anaconda3/envs/dcnet/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/hp/anaconda3/envs/dcnet/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/hp/anaconda3/envs/dcnet/lib/python3.6/site-packages/torch/distributed/launch.py", line 235, in <module>
    main()
  File "/home/hp/anaconda3/envs/dcnet/lib/python3.6/site-packages/torch/distributed/launch.py", line 231, in main
    cmd=process.args)
subprocess.CalledProcessError: Command '['/home/hp/anaconda3/envs/dcnet/bin/python', '-u', '../../tools/train_net.py', '--local_rank=0', '--config-file', 'configs/base/e2e_voc_split3_base.yaml']' returned non-zero exit status 1.
mv: cannot stat 'inference/voc_2007_test_split3_base/result.txt': No such file or directory

Could you tell me what files are supposed to be stored in the inference folder? Mine is empty. Also, why is the above error raised right after model_final.pth is saved, and how can I fix it? Looking forward to your reply, thank you.
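The apex UserWarning ("An input tensor was not cuda.") together with the RuntimeError point to a device mismatch: a tensor reaching the backbone is still on the CPU (torch.FloatTensor) while the model weights are on the GPU (torch.cuda.FloatTensor). Below is a minimal, generic PyTorch sketch (not the DCNet code) that reproduces the same error and shows the usual fix of moving the input onto the model's device before the forward pass:

import torch
import torch.nn as nn

model = nn.Conv2d(3, 8, kernel_size=3).cuda()      # weights become torch.cuda.FloatTensor

x_cpu = torch.randn(1, 3, 32, 32)                  # torch.FloatTensor on the CPU
# model(x_cpu) would raise:
# RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same

x_gpu = x_cpu.to(next(model.parameters()).device)  # move the input to the weights' device
out = model(x_gpu)                                 # runs without the device-mismatch error

Since the failure happens inside meta_extractor, the CPU tensor is most likely the meta input, so the corresponding .cuda()/.to(device) call would have to happen before it is passed into the model.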

EmberaThomas commented 2 years ago

RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same

Same problem, and after 800 epochs my loss is NaN.
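For the NaN loss, a generic way to catch the exact iteration where it diverges is a small guard in the training loop; the helper below is only a sketch and is not part of DCNet's trainer:

import math
import torch

def assert_finite_loss(loss: torch.Tensor, iteration: int) -> None:
    # Fail fast when the loss turns NaN/Inf instead of continuing to train on it.
    value = loss.item()
    if not math.isfinite(value):
        raise FloatingPointError(f"loss became {value} at iteration {iteration}")

Called right after the loss is computed, this at least pins down when the divergence starts; lowering the learning rate is a common first thing to try once it does.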

Zhengfei-0311 commented 2 years ago

I get the same error as well.