Closed: XiaobinWu1998 closed this issue 2 months ago
Considering that current AD models typically require only a few hours of single-GPU training, there is no strong demand for DDP, so we classify DDP support as optional. Because we adapted the original repositories' models to the ADer framework, some methods may only be partially functional due to certain operators. We aim to support all methods fully in the future.
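In the meantime, a common workaround when a trainer accesses custom attributes on a DDP-wrapped model is to unwrap it first via `.module`. Below is a minimal sketch of that pattern; the `StudentNet`/`DestSeg`/`Net` classes are hypothetical stand-ins for the real ADer classes, used only for illustration:

```python
import torch.nn as nn

# Hypothetical stand-ins for ADer's DestSeg model (names assumed for
# illustration; the real classes live in the ADer repo).
class StudentNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)

class DestSeg(nn.Module):
    def __init__(self):
        super().__init__()
        self.student_net = StudentNet()

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.destseg = DestSeg()

net = Net()
# Under multi-GPU training, `net` is wrapped in DistributedDataParallel,
# which only forwards a fixed set of attributes; custom ones such as
# `.destseg` must be reached through the wrapper's `.module` attribute.
# `getattr` with a default handles both the wrapped and unwrapped case:
inner = getattr(net, "module", net)
params = list(inner.destseg.student_net.parameters())
```

Applying the same unwrapping inside the trainer (i.e. using `self.net.module.destseg` when `self.net` is a `DistributedDataParallel` instance) should avoid the `AttributeError` below, though full DDP support per method is still up to the maintainers.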
There is an error when I run the following command with multiple GPUs:

```shell
python -m torch.distributed.launch --nproc_per_node=$nproc_per_node --nnodes=$nnodes --node_rank=$node_rank --master_port=$master_port --use_env run.py -c configs/benchmark/destseg/destseg_256_300e.py -m train
```

```
Traceback (most recent call last):
  File "run.py", line 31, in <module>
    main()
  File "run.py", line 26, in main
    trainer = get_trainer(cfg)
  File "/workspace/mycode/04-anomaly-detection/ADer/trainer/__init__.py", line 13, in get_trainer
    return TRAINER.get_module(cfg.trainer.name)(cfg)
  File "/workspace/mycode/04-anomaly-detection/ADer/trainer/destseg_trainer.py", line 41, in __init__
    self.optim.de_st = get_optim(cfg.optim.de_st.kwargs, self.net.destseg.student_net,
  File "/workspace/mysoftware/miniconda3/envs/ader/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'DistributedDataParallel' object has no attribute 'destseg'
```
So, what should I do?