Following CLAM's WSI processing pipeline, I started training with "python train.py --stage='train' --config='Camelyon/TransMIL.yaml' --gpus=0 --fold=0" and nothing went wrong.
The problem is that when I try to train with 3 GPUs, changing nothing but the command (my command is "python train.py --stage='train' --config='Camelyon/TransMIL.yaml' --gpus=0,1,2 --fold=0"), I get AttributeError: 'Lookahead' object has no attribute 'base_optimizer'.
The full error message is as follows:
Traceback (most recent call last):
File "train.py", line 91, in <module>
main(cfg)
File "train.py", line 70, in main
trainer.fit(model = model, datamodule = dm)
File "/opt/conda/envs/test/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 514, in fit
self.dispatch()
File "/opt/conda/envs/test/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 554, in dispatch
self.accelerator.start_training(self)
File "/opt/conda/envs/test/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 74, in start_training
self.training_type_plugin.start_training(trainer)
File "/opt/conda/envs/test/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/ddp_spawn.py", line 106, in start_training
mp.spawn(self.new_process, **self.mp_spawn_kwargs)
File "/opt/conda/envs/test/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/opt/conda/envs/test/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/opt/conda/envs/test/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/opt/conda/envs/test/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/opt/conda/envs/test/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/ddp_spawn.py", line 159, in new_process
results = trainer.train_or_test_or_predict()
File "/opt/conda/envs/test/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 564, in train_or_test_or_predict
results = self.run_train()
File "/opt/conda/envs/test/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 645, in run_train
self.train_loop.run_training_epoch()
File "/opt/conda/envs/test/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 493, in run_training_epoch
batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
File "/opt/conda/envs/test/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 655, in run_training_batch
self.optimizer_step(optimizer, opt_idx, batch_idx, train_step_and_backward_closure)
File "/opt/conda/envs/test/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 434, in optimizer_step
using_lbfgs=is_lbfgs,
File "/opt/conda/envs/test/lib/python3.7/site-packages/pytorch_lightning/core/lightning.py", line 1384, in optimizer_step
optimizer.step(closure=optimizer_closure)
File "/opt/conda/envs/test/lib/python3.7/site-packages/pytorch_lightning/core/optimizer.py", line 214, in step
self.__optimizer_step(*args, closure=closure, profiler_name=profiler_name, **kwargs)
File "/opt/conda/envs/test/lib/python3.7/site-packages/pytorch_lightning/core/optimizer.py", line 134, in __optimizer_step
trainer.accelerator.optimizer_step(optimizer, self._optimizer_idx, lambda_closure=closure, **kwargs)
File "/opt/conda/envs/test/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 279, in optimizer_step
self.precision_plugin.post_optimizer_step(optimizer, opt_idx)
File "/opt/conda/envs/test/lib/python3.7/site-packages/pytorch_lightning/plugins/precision/native_amp.py", line 88, in post_optimizer_step
self.scaler.step(optimizer)
File "/opt/conda/envs/test/lib/python3.7/site-packages/torch/cuda/amp/grad_scaler.py", line 333, in step
retval = optimizer.step(*args, **kwargs)
File "/opt/conda/envs/test/lib/python3.7/site-packages/torch/optim/optimizer.py", line 89, in wrapper
return func(*args, **kwargs)
File "/data/TransMIL/MyOptimizer/lookahead.py", line 47, in step
loss = self.base_optimizer.step(closure)
AttributeError: 'Lookahead' object has no attribute 'base_optimizer'
Dataset: Camelyon16.
How can I solve it?
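For what it's worth, I suspect this is a pickling issue specific to the ddp_spawn path: mp.spawn pickles the trainer state into each worker process, and `torch.optim.Optimizer.__getstate__` only keeps `defaults`, `state`, and `param_groups`, so any extra attribute like `base_optimizer` is silently dropped in the spawned processes. The single-GPU run never pickles the optimizer, which would explain why it works. Below is a minimal sketch with a simplified, hypothetical `Lookahead` stand-in (not the repo's actual class) that reproduces the loss of the attribute and shows one possible workaround:

```python
import pickle
import torch
from torch.optim import SGD, Optimizer


class Lookahead(Optimizer):
    """Simplified stand-in for a Lookahead wrapper (hypothetical, for illustration)."""

    def __init__(self, base_optimizer, k=5, alpha=0.5):
        # Store the wrapped optimizer as a plain attribute, as wrapper
        # implementations commonly do.
        self.base_optimizer = base_optimizer
        self.param_groups = base_optimizer.param_groups
        self.defaults = dict(k=k, alpha=alpha)
        self.state = {}

    def step(self, closure=None):
        # This is the line that fails after unpickling, because
        # base_optimizer no longer exists on the restored object.
        return self.base_optimizer.step(closure)

    # Possible workaround: carry base_optimizer through pickling explicitly.
    # (Uncomment to make the round trip preserve the attribute.)
    # def __getstate__(self):
    #     state = super().__getstate__()
    #     state['base_optimizer'] = self.base_optimizer
    #     return state


params = [torch.nn.Parameter(torch.zeros(1))]
opt = Lookahead(SGD(params, lr=0.1))

# Simulate what mp.spawn does when it ships the optimizer to a worker:
# Optimizer.__getstate__ returns only defaults/state/param_groups, so the
# restored object has lost base_optimizer.
restored = pickle.loads(pickle.dumps(opt))
print(hasattr(opt, 'base_optimizer'))       # True in the parent process
print(hasattr(restored, 'base_optimizer'))  # False after the pickle round trip
```

If this is indeed the cause, switching the Lightning strategy from ddp_spawn to plain ddp (which launches workers via subprocesses instead of pickling) might also sidestep it, though I haven't confirmed that with this repo.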