Closed JANGSOONMYUN closed 2 years ago
Commented out `accelerator='ddp'` and `log_gpu_memory=config.trainer.log_gpu_memory` in the `Trainer` call in train.py, as below, and it worked.
```python
trainer = Trainer(
    logger=wandb_logger,
    callbacks=[checkpoint_callback],
    max_epochs=config.trainer.epochs,
    default_root_dir=root_dir,
    gpus=gpus,
    # accelerator='ddp',
    benchmark=True,
    sync_batchnorm=True,
    precision=config.precision,
    # log_gpu_memory=config.trainer.log_gpu_memory,
    log_every_n_steps=config.trainer.log_every_n_steps,
    overfit_batches=config.trainer.overfit_batches,
    weights_summary='full',
    terminate_on_nan=config.trainer.terminate_on_nan,
    fast_dev_run=config.trainer.fast_dev_run,
    check_val_every_n_epoch=config.trainer.check_val_every_n_epoch,
    resume_from_checkpoint=resume_ckpt)
```
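For anyone who still needs DDP and GPU-memory stats rather than dropping them: in pytorch_lightning 1.5 both arguments were deprecated, `accelerator='ddp'` in favor of `strategy='ddp'` and `log_gpu_memory` in favor of the `DeviceStatsMonitor` callback. A minimal sketch of the replacement, reusing `wandb_logger`, `checkpoint_callback`, `gpus`, and `config` from the snippet above (untested against this exact project):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import DeviceStatsMonitor

trainer = Trainer(
    logger=wandb_logger,
    # DeviceStatsMonitor logs device (GPU) stats each step,
    # replacing the deprecated log_gpu_memory= argument.
    callbacks=[checkpoint_callback, DeviceStatsMonitor()],
    gpus=gpus,
    # 'ddp' moved from accelerator= to strategy= in PL 1.5.
    strategy='ddp',
    precision=config.precision,
    max_epochs=config.trainer.epochs)
```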
Hi, I've hit a crash while training a model. The problem seems to come from pytorch_lightning, but I don't understand why `self.log()` cannot be used here. Please help me solve it.
I ran the project on Colab. The pytorch_lightning version is 1.5.
```
Traceback (most recent call last):
  File "train.py", line 99, in <module>
    main(config, args.resume)
  File "train.py", line 75, in main
    trainer.fit(model=model, datamodule=data_module)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 769, in fit
    self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 719, in _call_and_handle_interrupt
    return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
    return function(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 809, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 1234, in _run
    results = self._run_stage()
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 1321, in _run_stage
    return self._run_train()
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 1351, in _run_train
    self.fit_loop.run()
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/fit_loop.py", line 268, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 246, in advance
    self.trainer._logger_connector.update_train_step_metrics()
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py", line 197, in update_train_step_metrics
    self._log_gpus_metrics()
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py", line 226, in _log_gpus_metrics
    key, mem, prog_bar=False, logger=True, on_step=True, on_epoch=False
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/core/lightning.py", line 386, in log
    "You are trying to `self.log()` but it is not managed by the `Trainer` control flow"
pytorch_lightning.utilities.exceptions.MisconfigurationException: You are trying to `self.log()` but it is not managed by the `Trainer` control flow
```