Error raised when trying to train the model

hello12345world commented 5 months ago

When I try to train the model with the command:

python3 train_mask.py --config-file configs/maskformer/cityscapes.yaml --num-gpus 2 OUTPUT_DIR ./output/ps

It raises the following error:

Ambiguity found for res5.0.conv1.norm.bias in checkpoint!It matches at least two keys in the model (roi_heads.res5.0.conv1.norm.bias and backbone.res5.0.conv1.norm.bias).
Traceback (most recent call last):
  File "train_mask.py", line 312, in <module>
    launch(
  File "/root/data1/data/code/detectron2/detectron2/engine/launch.py", line 69, in launch
    mp.start_processes(
  File "/root/data1/data/software/anaconda3/envs/PanopticDepth/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/root/data1/data/software/anaconda3/envs/PanopticDepth/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/root/data1/data/software/anaconda3/envs/PanopticDepth/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/root/data1/data/code/detectron2/detectron2/engine/launch.py", line 123, in _distributed_worker
    main_func(*args)
  File "/root/data1/data/code/detectron2/projects/DeepDPS/train_mask.py", line 305, in main
    trainer.resume_or_load(resume=args.resume)
  File "/root/data1/data/code/detectron2/detectron2/engine/defaults.py", line 412, in resume_or_load
    self.checkpointer.resume_or_load(self.cfg.MODEL.WEIGHTS, resume=resume)
  File "/root/data1/data/software/anaconda3/envs/PanopticDepth/lib/python3.8/site-packages/fvcore/common/checkpoint.py", line 227, in resume_or_load
    return self.load(path, checkpointables=[])
  File "/root/data1/data/code/detectron2/detectron2/checkpoint/detection_checkpoint.py", line 62, in load
    ret = super().load(path, *args, **kwargs)
  File "/root/data1/data/software/anaconda3/envs/PanopticDepth/lib/python3.8/site-packages/fvcore/common/checkpoint.py", line 156, in load
    incompatible = self._load_model(checkpoint)
  File "/root/data1/data/code/detectron2/detectron2/checkpoint/detection_checkpoint.py", line 120, in _load_model
    checkpoint["model"] = align_and_update_state_dicts(
  File "/root/data1/data/code/detectron2/detectron2/checkpoint/c2_model_loading.py", line 287, in align_and_update_state_dicts
    raise ValueError("Cannot match one checkpoint key to multiple keys in the model.")
ValueError: Cannot match one checkpoint key to multiple keys in the model.

What should I do to solve this error?
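
For anyone reading the message above: it suggests that the loader aligns checkpoint keys with model keys by name, and that the unprefixed checkpoint key `res5.0.conv1.norm.bias` fits two different model parameters. The snippet below is only a minimal sketch of that situation, assuming plain suffix matching; it is not detectron2's actual implementation. The function `match_by_suffix` and the key `backbone.stem.conv1.weight` are hypothetical and used only for illustration.

```python
from typing import Dict, List


def match_by_suffix(ckpt_keys: List[str], model_keys: List[str]) -> Dict[str, List[str]]:
    """Map each checkpoint key to every model key that equals it or ends with '.<key>'.

    Hypothetical helper: a simplified stand-in for name-based alignment,
    not the real detectron2 loader.
    """
    matches: Dict[str, List[str]] = {}
    for ckpt_key in ckpt_keys:
        matches[ckpt_key] = [
            model_key
            for model_key in model_keys
            if model_key == ckpt_key or model_key.endswith("." + ckpt_key)
        ]
    return matches


# The first two model keys are taken from the error message above;
# the third is an illustrative, made-up key.
model_keys = [
    "backbone.res5.0.conv1.norm.bias",
    "roi_heads.res5.0.conv1.norm.bias",
    "backbone.stem.conv1.weight",
]
ckpt_keys = ["res5.0.conv1.norm.bias", "stem.conv1.weight"]

matches = match_by_suffix(ckpt_keys, model_keys)

# A checkpoint key is ambiguous when more than one model key could receive it.
ambiguous = {key: found for key, found in matches.items() if len(found) > 1}
for key, candidates in ambiguous.items():
    print(f"{key} is ambiguous: {candidates}")
```

Under this simplified scheme the ambiguity exists only because the model has a `res5` block under both `backbone` and `roi_heads`. A checkpoint whose keys already carry the full prefix would match exactly one model key each.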

uowei commented 3 months ago

@hello12345world I'm new to this field. Are you still training this model? Would it be possible for you to share the complete code with me? It would be immensely helpful to me!

Thank you very much!

uowei commented 1 month ago

@hello12345world @jwh97nn

I would like to ask how I should evaluate pre-trained model weights. Currently, I have set eval_only to True, but the results show that all of my benchmarks are 0. I am only seeing responses for benchmarks like silog, abs_rel, log10, rms, sq_rel, log_rms, d1, d2, and d3. Additionally, only three data files are generated in the pred_dir, which suggests that the evaluation might not have been successful. Could you please advise if there are any additional steps or modifications I need to make?

Thank you in advance for your assistance.
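
As a sanity check on the metric values, here is a minimal sketch of three of the listed depth metrics (abs_rel, rms, d1), assuming their commonly used definitions. The evaluation code in this repository may define or mask them differently, and `depth_metrics` is a hypothetical helper, not a function from the project.

```python
import math
from typing import Dict, Sequence


def depth_metrics(gt: Sequence[float], pred: Sequence[float]) -> Dict[str, float]:
    """Compute abs_rel, rms and d1 over paired depth values.

    Hypothetical helper using the common definitions; all depths must be
    strictly positive and both sequences must have the same non-zero length.
    """
    if len(gt) != len(pred) or len(gt) == 0:
        raise ValueError("gt and pred must be non-empty and of equal length")
    if any(g <= 0 or p <= 0 for g, p in zip(gt, pred)):
        raise ValueError("depth values must be strictly positive")

    n = len(gt)
    # Mean absolute error relative to the ground-truth depth.
    abs_rel = sum(abs(g - p) / g for g, p in zip(gt, pred)) / n
    # Root mean squared error.
    rms = math.sqrt(sum((g - p) ** 2 for g, p in zip(gt, pred)) / n)
    # Fraction of values whose ratio to the ground truth is within 1.25.
    d1 = sum(1 for g, p in zip(gt, pred) if max(g / p, p / g) < 1.25) / n
    return {"abs_rel": abs_rel, "rms": rms, "d1": d1}


# A perfect prediction: the error metrics are 0, but d1 is 1.
perfect = depth_metrics([1.0, 2.0, 4.0], [1.0, 2.0, 4.0])
print(perfect)

# One value is off by a factor of 2, so only half the values fall within 1.25.
partial = depth_metrics([2.0, 4.0], [1.0, 4.0])
print(partial)
```

Under these definitions a perfect prediction gives 0 for the error metrics but 1 for d1, so all metrics being 0 at the same time would not correspond to a perfect result. One possible explanation, which is only a guess, is that no valid samples were actually evaluated.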

jwh97nn / DeepDPS

Error raised when trying to train the model #3