open-mmlab / mmengine

OpenMMLab Foundational Library for Training Deep Learning Models
https://mmengine.readthedocs.io/
Apache License 2.0

Visualizer always plotting metrics against a step value of 0 #1119

Open · GeorgePearse opened 1 year ago

GeorgePearse commented 1 year ago


Environment

mmdet = 3.0.0rc0, mmcv = 2.0

It's as if the `isinstance` clause is being hit every time. I have implemented a custom Aim visualizer backend, but I can't see how that would be responsible, because it just passes the step value through.

Now that I think about it, I actually hit the same issue with TensorBoard, and used the relative-time or wall-time view as a crude workaround.
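Roughly, the pass-through in my backend looks like this (a trimmed-down sketch rather than my exact code; the class name and the lazy `_init_env` call are just illustrative, and `aim.Run.track` is the Aim client API):

    from typing import Optional

    from aim import Run
    from mmengine.registry import VISBACKENDS
    from mmengine.visualization import BaseVisBackend


    @VISBACKENDS.register_module()
    class AimVisBackend(BaseVisBackend):
        """Trimmed-down sketch: forwards scalars to Aim, passing step through."""

        @property
        def experiment(self):
            return self._run

        def _init_env(self):
            self._run = Run()

        def add_scalars(self,
                        scalar_dict: dict,
                        step: int = 0,
                        file_path: Optional[str] = None,
                        **kwargs) -> None:
            if getattr(self, '_run', None) is None:
                self._init_env()
            # `step` is whatever LoggerHook passed in -- it is forwarded
            # unchanged, so a constant step=0 must originate upstream.
            for name, value in scalar_dict.items():
                self._run.track(value, name=name, step=step)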

The suspect code is in `LoggerHook.after_val_epoch`:
https://github.com/open-mmlab/mmengine/blob/main/mmengine/hooks/logger_hook.py

    def after_val_epoch(self,
                        runner,
                        metrics: Optional[Dict[str, float]] = None) -> None:
        """All subclasses should override this method, if they need any
        operations after each validation epoch.
        Args:
            runner (Runner): The runner of the validation process.
            metrics (Dict[str, float], optional): Evaluation results of all
                metrics on validation dataset. The keys are the names of the
                metrics, and the values are corresponding results.
        """
        tag, log_str = runner.log_processor.get_log_after_epoch(
            runner, len(runner.val_dataloader), 'val')
        runner.logger.info(log_str)
        if self.log_metric_by_epoch:
            # Accessing the epoch attribute of the runner will trigger
            # the construction of the train_loop. Therefore, to avoid
            # triggering the construction of the train_loop during
            # validation, check before accessing the epoch.
            if (isinstance(runner._train_loop, dict)
                    or runner._train_loop is None):
                epoch = 0
            else:
                epoch = runner.epoch
            runner.visualizer.add_scalars(
                tag, step=epoch, file_path=self.json_log_path)
        else:
            if (isinstance(runner._train_loop, dict)
                    or runner._train_loop is None):
                iter = 0
            else:
                iter = runner.iter
            runner.visualizer.add_scalars(
                tag, step=iter, file_path=self.json_log_path)
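A quick way to confirm which branch is taken is to log what `runner._train_loop` actually is when the hook fires, e.g. with a throwaway subclass (`DebugLoggerHook` is just an illustrative name, not a fix):

    from typing import Dict, Optional

    from mmengine.hooks import LoggerHook
    from mmengine.registry import HOOKS


    @HOOKS.register_module()
    class DebugLoggerHook(LoggerHook):
        """Throwaway hook: reports why the step may resolve to 0."""

        def after_val_epoch(self,
                            runner,
                            metrics: Optional[Dict[str, float]] = None) -> None:
            # If _train_loop is still a config dict (or None), the parent
            # implementation above falls back to step=0.
            runner.logger.info(
                f'_train_loop is a {type(runner._train_loop).__name__}')
            super().after_val_epoch(runner, metrics=metrics)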

Reproduces the problem - code sample

...

Reproduces the problem - command or script

...

Reproduces the problem - error message

No error message, just buggy logs.

Additional information

  1. What do you think might be the reason?

One of the clauses that sets `iter` (or `epoch`) to 0 must be getting hit every time, i.e. `runner._train_loop` must still be a config dict or `None` when the hook runs.
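For context, `Runner` builds its train loop lazily: until something accesses the `train_loop` property, `_train_loop` can still be the raw config dict. The pattern is roughly this (an illustrative paraphrase, not the actual mmengine source):

    class Runner:  # illustrative paraphrase only
        def __init__(self, train_loop=None):
            # Starts out as the raw config dict (or None)...
            self._train_loop = train_loop

        @property
        def train_loop(self):
            # ...and is only built into a real loop object on first access.
            if isinstance(self._train_loop, dict) or self._train_loop is None:
                self._train_loop = self.build_train_loop(self._train_loop)
            return self._train_loop

So if validation runs before anything has touched `runner.train_loop`, the guard in `after_val_epoch` resolves the step to 0.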

GeorgePearse commented 1 year ago

For my own use case, I've just overridden the `after_val_epoch()` method of `LoggerHook`:

    from typing import Dict, Optional

    from mmengine.hooks import LoggerHook
    from mmengine.registry import HOOKS


    @HOOKS.register_module()
    class AimLoggerHook(LoggerHook):

        def after_val_epoch(self,
                            runner,
                            metrics: Optional[Dict[str, float]] = None) -> None:
            """All subclasses should override this method, if they need any
            operations after each validation epoch.
            Args:
                runner (Runner): The runner of the validation process.
                metrics (Dict[str, float], optional): Evaluation results of all
                    metrics on validation dataset. The keys are the names of the
                    metrics, and the values are corresponding results.
            """
            tag, log_str = runner.log_processor.get_log_after_epoch(
                runner, len(runner.val_dataloader), 'val')

            runner.logger.info(log_str)
            # Always use runner.iter as the step. Note this will trigger
            # construction of the train loop if it has not been built yet.
            runner.visualizer.add_scalars(
                tag, step=runner.iter, file_path=self.json_log_path)

But I know that doesn't support some parameters / workflows (e.g. the `log_metric_by_epoch` path).
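For anyone copying this, the override gets picked up through the config in the usual way, e.g. (the `interval` value is just an example):

    default_hooks = dict(
        logger=dict(type='AimLoggerHook', interval=50))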