how to use logger - Githubissues

[x] I have marked all applicable categories:
- [ ] exception-raising bug
- [ ] RL algorithm bug
- [ ] system worker bug
- [ ] system utils bug
- [ ] code design/refactor
- [x] documentation request
- [ ] new feature request
[x] I have visited the readme and doc
[x] I have searched through the issue tracker and pr tracker

[x] I have mentioned version numbers, operating system and environment, where applicable:

import ding, torch, sys
print(ding.__version__, torch.__version__, sys.version, sys.platform)
v0.4.9 2.0.1 3.10.11 (main, Apr 20 2023, 19:02:41) [GCC 11.2.0] linux

I am running the code with ding.bonus.ppof. I do not want to use wandb_online_logger, just TensorBoard, so I replaced wandb_online_logger with online_logger(train_show_freq=1000), as follows:

    with task.start(ctx=OnlineRLContext()):
        task.use(interaction_evaluator_ttorch(self.seed, self.policy, evaluator_env))
        task.use(PPOFStepCollector(self.seed, self.policy, collector_env, self.cfg.n_sample))
        task.use(ppof_adv_estimator(self.policy))
        task.use(multistep_trainer(self.policy, log_freq=n_iter_log_show))
        task.use(CkptSaver(self.policy, save_dir=self.exp_name, train_freq=n_iter_save_ckpt))
        task.use(online_logger(train_show_freq=1000)
            # wandb_online_logger(
            #     metric_list=self.policy.monitor_vars(),
            #     model=self.policy._model,
            #     anonymous=True,
            #     project_name=self.exp_name
            # )
        )
        task.use(termination_checker(max_env_step=step))
        task.run()

Then I got the following error:

  File "/opt/conda/lib/python3.10/site-packages/ding/framework/middleware/functional/logger.py", line 68, in _logger
    writer.add_scalar('basic/eval_episode_return_mean', ctx.eval_value, ctx.env_step)
AttributeError: 'NoneType' object has no attribute 'add_scalar'

It seems that I am using online_logger incorrectly. Is there an example of how to use online_logger from ding.framework.middleware? BTW, is online_logger meant for online RL algorithms?

I found the reason. Before using online_logger, I have to call ding_init(cfg). For now, this function does only one thing: it calls DistributedWriter.get_instance(cfg.exp_name). This is more complex than it needs to be: to use the logger, I have to know that DistributedWriter.get_instance(cfg.exp_name) must be called beforehand, which is not written anywhere in the documentation. Why not add an exp_name argument to online_logger, like this:

def online_logger(record_train_iter: bool = False, train_show_freq: int = 100, exp_name: str = None) -> Callable:
    """
    Create an online logger for recording training and evaluation metrics.

    Arguments:
        - record_train_iter (bool): Whether to record training iteration. Default is False.
        - train_show_freq (int): Frequency of showing training logs. Default is 100.
        - exp_name (str): Experiment name, should not be None.

    Returns:
        - _logger (Callable): A logger function that takes an OnlineRLContext object as input.

    Raises:
        - ValueError: If exp_name is None.

    Example:
        task.use(online_logger(record_train_iter=False, train_show_freq=1000, exp_name=cfg.exp_name))
    """
    if task.router.is_active and not task.has_role(task.role.LEARNER):
        return task.void()
    if exp_name is None:
        raise ValueError("exp_name cannot be None")
    writer = DistributedWriter.get_instance(exp_name)
    last_train_show_iter = -1

    def _logger(ctx: "OnlineRLContext"):
        # ... (rest of the code)
        ...

    return _logger

In this case, it is clear to everyone that they must pass an exp_name to the logger. Maybe ChatGPT could help write a description for every function and provide an example of how to use it, since the docs do not yet cover every function.
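The initialization-order problem described above can be reproduced with a small self-contained sketch. ToyWriter is a hypothetical stand-in, not DI-engine's DistributedWriter. It assumes a singleton-style accessor that returns None until an instance has been created with a name, which is consistent with the AttributeError in the traceback but is not confirmed by this thread.

```python
from typing import List, Optional, Tuple


class ToyWriter:
    """Hypothetical singleton-style writer; an illustration, not DI-engine's code."""

    root: Optional["ToyWriter"] = None  # shared instance, unset until the first init

    def __init__(self, exp_name: str) -> None:
        self.exp_name = exp_name
        self.records: List[Tuple[str, float, int]] = []

    @classmethod
    def get_instance(cls, exp_name: Optional[str] = None) -> Optional["ToyWriter"]:
        # With a name: create a writer and remember the first one as the shared root.
        if exp_name is not None:
            instance = cls(exp_name)
            if cls.root is None:
                cls.root = instance
            return instance
        # Without a name: return the shared root, which is None before any init.
        return cls.root

    def add_scalar(self, tag: str, value: float, step: int) -> None:
        self.records.append((tag, value, step))


before_init = ToyWriter.get_instance()   # nothing initialized yet
ToyWriter.get_instance("my_experiment")  # what an init step at program start would do
after_init = ToyWriter.get_instance()    # now returns the shared writer
after_init.add_scalar("basic/eval_episode_return_mean", 1.0, 0)
```

In this sketch, calling add_scalar on before_init would fail in the same way as the traceback, because the lookup happened before any writer existed.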

Thanks for your feedback. We will add a hint for the NoneType problem when the online_logger middleware is called. As for DistributedWriter, we want to implement this module with the singleton pattern, so it must be initialized at the beginning of the whole training program (e.g. in the ding_init function). We will add more comments and documentation to make this clear.
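One possible shape for such a hint is a check that fails early with a readable message instead of an AttributeError deep inside the logger. The sketch below is only an illustration: make_logger and ListWriter are hypothetical names, the context is a plain dict rather than an OnlineRLContext, and none of it is DI-engine's actual implementation.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


def make_logger(writer: Optional[Any]) -> Callable[[Dict[str, Any]], None]:
    """Hypothetical logger factory that fails early when no writer is set up."""
    if writer is None:
        raise RuntimeError(
            "Writer is not initialized; initialize it at the start of the "
            "program before creating the logger."
        )

    def _logger(ctx: Dict[str, Any]) -> None:
        # Record the evaluation return against the current environment step.
        writer.add_scalar("basic/eval_episode_return_mean", ctx["eval_value"], ctx["env_step"])

    return _logger


class ListWriter:
    """Minimal in-memory writer used only to exercise the sketch."""

    def __init__(self) -> None:
        self.records: List[Tuple[str, float, int]] = []

    def add_scalar(self, tag: str, value: float, step: int) -> None:
        self.records.append((tag, value, step))


# Without a writer, the factory raises immediately with an explanatory message.
try:
    make_logger(None)
    hint = ""
except RuntimeError as err:
    hint = str(err)

# With a writer, the returned logger records the metric as usual.
writer = ListWriter()
make_logger(writer)({"eval_value": 3.5, "env_step": 1000})
```

Raising when the logger is created, rather than at the first logging call, surfaces the missing initialization before any training steps run.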

BTW, it is good practice to learn how a function is used from its unit tests, such as this file for online_logger.

opendilab / DI-engine

how to use logger #715