Closed PanYuQi66666666 closed 6 months ago
Hi Pan,
If the problem persists, can you share the command that generated this output? I think it is caused by an incorrect specification of whether the models in the ensemble are end-to-end or not.
Also, I see that there are different types of models in your checkpoints folder, for instance:
.../phase6_checkpoint.pth -> this is end-to-end
.../phase3_checkpoint.pth -> this is end-to-end
.../phase5_checkpoint.pth -> this is not end-to-end
.../phase2_checkpoint.pth -> this is not end-to-end
(Edit: before the edit, I got confused for a moment about phase2 and phase3, because phases are named after the steps in the README files; as a result, there is no phase1 or phase4. My bad.)
Try, for example, removing phase2 and phase5, or phase3 and phase6.
For now, the code only supports ensembling models of the same architecture type: all of them must be either end-to-end (backbone + refining model) or not end-to-end (refining model only, without the backbone). In the first case, set the is_end_to_end argument to True; in the latter, set it to False.
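If it is unclear which type a checkpoint is, a quick check on its state-dict keys can help. The following is only a minimal sketch: it assumes that end-to-end checkpoints are the ones containing backbone keys with the `swin_transf.` prefix (as the missing keys in the reported error suggest). The helper names and the `refiner.weight` key are hypothetical and not part of the repository.

```python
from typing import Dict, Iterable, List

# Assumption: backbone weights live under this prefix, as suggested by the
# missing keys in the error message ("swin_transf.patch_embed...").
BACKBONE_PREFIX = "swin_transf."


def is_end_to_end(state_dict_keys: Iterable[str]) -> bool:
    """Return True if any key belongs to the backbone."""
    return any(key.startswith(BACKBONE_PREFIX) for key in state_dict_keys)


def group_checkpoints(keys_by_path: Dict[str, Iterable[str]]) -> Dict[bool, List[str]]:
    """Group checkpoint paths by whether they look end-to-end."""
    groups: Dict[bool, List[str]] = {True: [], False: []}
    for path, keys in keys_by_path.items():
        groups[is_end_to_end(keys)].append(path)
    return groups


# Toy key sets standing in for real checkpoints. "refiner.weight" is a
# made-up key; only the "swin_transf." key comes from the error message.
example = {
    "phase6_checkpoint.pth": ["swin_transf.patch_embed.proj.weight", "refiner.weight"],
    "phase5_checkpoint.pth": ["refiner.weight"],
}
groups = group_checkpoints(example)
print("end-to-end:", groups[True])
print("not end-to-end:", groups[False])
```

With real files, the keys could be read with `torch.load(path, map_location="cpu")["model_state_dict"].keys()`. If both groups end up non-empty, the folder mixes the two types and one group should be removed before ensembling.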
Let me know if it helps!
@jchenghu Oh, I have solved the problem with your help, thanks!!
I'm glad I helped, you're welcome!
As usual, feel free to open a new issue in case of other problems/questions.
Ensembling Evaluation
Detected checkpoints: ['/mnt/workspace/ExpansionNet_v2/github_ignore_material/saves/first_base/phase6_checkpoint.pth', '/mnt/workspace/ExpansionNet_v2/github_ignore_material/saves/first_base/phase3_checkpoint.pth', '/mnt/workspace/ExpansionNet_v2/github_ignore_material/saves/first_base/phase5_checkpoint.pth', '/mnt/workspace/ExpansionNet_v2/github_ignore_material/saves/first_base/phase2_checkpoint.pth']
Traceback (most recent call last):
  File "/mnt/workspace/ExpansionNet_v2/test.py", line 455, in <module>
    spawn_train_processes(is_end_to_end=args.is_end_to_end,
  File "/mnt/workspace/ExpansionNet_v2/test.py", line 379, in spawn_train_processes
    mp.spawn(test,
  File "/opt/conda/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 246, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
  File "/opt/conda/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 202, in start_processes
    while not context.join():
  File "/opt/conda/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 163, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 74, in _wrap
    fn(i, *args)
  File "/mnt/workspace/ExpansionNet_v2/test.py", line 319, in test
    ddp_model = get_ensemble_model(model, checkpoints_list, rank=rank)
  File "/mnt/workspace/ExpansionNet_v2/test.py", line 235, in get_ensemble_model
    model.load_state_dict(checkpoint['model_state_dict'])
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for End_ExpansionNet_v2:
    Missing key(s) in state_dict: "swin_transf.patch_embed.proj.weight", "swin_transf.patch_embed.proj.bias", "swin_transf.patch_embed.norm.weight....................."

Please help me!! Thanks!