jiangxiluning / FOTS.PyTorch

FOTS Pytorch Implementation
BSD 3-Clause "New" or "Revised" License

Exception[pytorch_lightning]: You are trying to `self.log()` but it is not managed by the `Trainer` control flow #101

Closed JANGSOONMYUN closed 2 years ago

JANGSOONMYUN commented 2 years ago

Hi, I hit a crash while training the model. The problem seems to come from pytorch_lightning, but I don't understand why `self.log()` cannot be used. Please help me solve it.

I ran the project on Colab. The pytorch_lightning version is 1.5.

Traceback (most recent call last):
  File "train.py", line 99, in <module>
    main(config, args.resume)
  File "train.py", line 75, in main
    trainer.fit(model=model, datamodule=data_module)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 769, in fit
    self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 719, in _call_and_handle_interrupt
    return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
    return function(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 809, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 1234, in _run
    results = self._run_stage()
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 1321, in _run_stage
    return self._run_train()
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 1351, in _run_train
    self.fit_loop.run()
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/fit_loop.py", line 268, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 246, in advance
    self.trainer._logger_connector.update_train_step_metrics()
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py", line 197, in update_train_step_metrics
    self._log_gpus_metrics()
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py", line 226, in _log_gpus_metrics
    key, mem, prog_bar=False, logger=True, on_step=True, on_epoch=False
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/core/lightning.py", line 386, in log
    "You are trying to `self.log()` but it is not managed by the `Trainer` control flow"
pytorch_lightning.utilities.exceptions.MisconfigurationException: You are trying to `self.log()` but it is not managed by the `Trainer` control flow
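For context, a minimal sketch (not the actual pytorch_lightning source; all names here are illustrative) of why this exception fires: `self.log()` is only valid while the Trainer is executing a managed hook such as `training_step`, and Lightning raises `MisconfigurationException` when it is called from anywhere else — which is what the deprecated `log_gpu_memory` path triggers internally:

```python
# Toy model of Lightning's "managed by the Trainer control flow" check.
# The trainer marks which hook is running; log() refuses to work outside one.

class MisconfigurationException(Exception):
    pass

class Module:
    def __init__(self):
        self._current_fx_name = None  # set by the trainer around each hook

    def log(self, name, value):
        if self._current_fx_name is None:
            raise MisconfigurationException(
                "You are trying to `self.log()` but it is not managed by "
                "the `Trainer` control flow"
            )
        return (self._current_fx_name, name, value)

    def training_step(self):
        # inside a managed hook, logging is allowed
        return self.log("loss", 0.5)

class Trainer:
    def fit(self, module):
        module._current_fx_name = "training_step"  # enter managed hook
        try:
            return module.training_step()
        finally:
            module._current_fx_name = None  # leave managed hook

m = Module()
print(Trainer().fit(m))  # ('training_step', 'loss', 0.5)
try:
    m.log("loss", 0.5)    # outside any hook: raises, as in the traceback
except MisconfigurationException as e:
    print(e)
```

The `_log_gpus_metrics` frame in the traceback above is Lightning calling `log()` from its own logger connector, outside such a hook, which is why commenting out the flag that enables it avoids the crash.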

JANGSOONMYUN commented 2 years ago

Commented out `accelerator='ddp'` and `log_gpu_memory=config.trainer.log_gpu_memory` in the `Trainer` in train.py as below, and it worked.

trainer = Trainer(
    logger=wandb_logger,
    callbacks=[checkpoint_callback],
    max_epochs=config.trainer.epochs,
    default_root_dir=root_dir,
    gpus=gpus,
    # accelerator='ddp',
    benchmark=True,
    sync_batchnorm=True,
    precision=config.precision,
    # log_gpu_memory=config.trainer.log_gpu_memory,
    log_every_n_steps=config.trainer.log_every_n_steps,
    overfit_batches=config.trainer.overfit_batches,
    weights_summary='full',
    terminate_on_nan=config.trainer.terminate_on_nan,
    fast_dev_run=config.trainer.fast_dev_run,
    check_val_every_n_epoch=config.trainer.check_val_every_n_epoch,
    resume_from_checkpoint=resume_ckpt)
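If you still want GPU stats, a possible replacement (a sketch, assuming pytorch_lightning 1.5+; check your installed version's docs): the `log_gpu_memory` Trainer flag was deprecated in 1.5 in favor of the `DeviceStatsMonitor` callback, and the `'ddp'` accelerator value moved to the `strategy` argument:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import DeviceStatsMonitor

trainer = Trainer(
    strategy="ddp",                    # replaces accelerator='ddp'
    callbacks=[DeviceStatsMonitor()],  # replaces log_gpu_memory=...
)
```

Because `DeviceStatsMonitor` logs from within the Trainer's callback hooks, it should not hit the `self.log()` control-flow check that the old flag did.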