opendilab / DI-engine

OpenDILab Decision AI Engine. The Most Comprehensive Reinforcement Learning Framework B.P.
https://di-engine-docs.readthedocs.io
Apache License 2.0

how to use logger #715

Closed zhixiongzh closed 1 year ago

zhixiongzh commented 1 year ago

Then I got the following error:

  File "/opt/conda/lib/python3.10/site-packages/ding/framework/middleware/functional/logger.py", line 68, in _logger
    writer.add_scalar('basic/eval_episode_return_mean', ctx.eval_value, ctx.env_step)
AttributeError: 'NoneType' object has no attribute 'add_scalar'

It seems that I am using online_logger in the wrong way. Is there an example of how to use online_logger from ding.framework.middleware? BTW, does online_logger correspond to online RL algorithms?

zhixiongzh commented 1 year ago

I found the reason. Before using online_logger, I have to call ding_init(cfg). For now, that function does only one thing: it calls DistributedWriter.get_instance(cfg.exp_name). This is more complex than it needs to be: to use the logger, I have to know that DistributedWriter.get_instance(cfg.exp_name) must be called beforehand, and this is not mentioned anywhere in the documentation. Why not add an exp_name argument to online_logger, like this:

def online_logger(record_train_iter: bool = False, train_show_freq: int = 100, exp_name: str = None) -> Callable:
    """
    Create an online logger for recording training and evaluation metrics.

    Arguments:
        - record_train_iter (bool): Whether to record training iteration. Default is False.
        - train_show_freq (int): Frequency of showing training logs. Default is 100.
        - exp_name (str): Experiment name, should not be None.

    Returns:
        - _logger (Callable): A logger function that takes an OnlineRLContext object as input.

    Raises:
        - ValueError: If exp_name is None.

    Example:
        task.use(online_logger(record_train_iter=False, train_show_freq=1000, exp_name=cfg.exp_name))
    """
    if task.router.is_active and not task.has_role(task.role.LEARNER):
        return task.void()
    if exp_name is None:
        raise ValueError("exp_name cannot be None")
    writer = DistributedWriter.get_instance(exp_name)
    last_train_show_iter = -1

    def _logger(ctx: "OnlineRLContext"):
        ...  # rest of the code unchanged

    return _logger

This way, it is clear to everyone that the logger requires an exp_name. Maybe ChatGPT could help write a description of every function along with a usage example, since the docs do not yet cover how every function is used.

PaParaZz1 commented 1 year ago

Thanks for your feedback. We will add a hint for the NoneType problem that occurs when the online_logger middleware is called before the writer is initialized. We want DistributedWriter to follow the singleton pattern, so it must be initialized at the beginning of the whole training program (e.g. in the ding_init function). We will add more comments and documentation to make this requirement clear.
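To illustrate the initialization-order issue discussed in this thread, here is a minimal, self-contained sketch of a singleton writer and a logger that depends on it. It does not use DI-engine: SketchWriter, sketch_init and sketch_logger are hypothetical stand-ins for DistributedWriter, ding_init and online_logger, and the real classes may behave differently. The explicit RuntimeError is one possible form of the hint mentioned above, not the project's actual implementation.

```python
from typing import Callable, Dict, List, Optional, Tuple


class SketchWriter:
    """Hypothetical stand-in for a singleton writer (not DI-engine's DistributedWriter)."""

    _root: Optional["SketchWriter"] = None

    def __init__(self, exp_name: str) -> None:
        self.exp_name = exp_name
        self.scalars: List[Tuple[str, float, int]] = []

    @classmethod
    def get_instance(cls, exp_name: Optional[str] = None) -> Optional["SketchWriter"]:
        # With a name: create the shared instance on the first call, then reuse it.
        if exp_name is not None and cls._root is None:
            cls._root = cls(exp_name)
        # Without a name: return whatever exists, which is None before initialization.
        return cls._root

    def add_scalar(self, tag: str, value: float, step: int) -> None:
        self.scalars.append((tag, value, step))


def sketch_init(exp_name: str) -> None:
    """Plays the role that ding_init(cfg) plays in the thread: set up the singleton once."""
    SketchWriter.get_instance(exp_name)


def sketch_logger() -> Callable[[Dict], None]:
    """Hypothetical logger middleware that fails early with a readable hint."""
    writer = SketchWriter.get_instance()
    if writer is None:
        # Without this check, the failure would surface later as
        # "'NoneType' object has no attribute 'add_scalar'".
        raise RuntimeError("writer is not initialized; call sketch_init(exp_name) first")

    def _logger(ctx: Dict) -> None:
        writer.add_scalar("basic/eval_episode_return_mean", ctx["eval_value"], ctx["env_step"])

    return _logger
```

In this sketch, calling sketch_logger() before sketch_init("my_exp") raises the RuntimeError, while calling it afterwards returns a logger that records into the shared writer.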

BTW, it is good practice to learn how a function is used from its unit tests, such as this file for online_logger.
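As a rough idea of what a unit test for a logger middleware can look like, here is a self-contained sketch that uses only the standard library. It is not the DI-engine test file referred to above: make_eval_logger is a hypothetical, simplified logger, and its behaviour of skipping a missing eval_value is an assumption made for illustration. The writer is replaced by a MagicMock, so no experiment directory or prior initialization is needed.

```python
import unittest
from unittest.mock import MagicMock


def make_eval_logger(writer):
    """Hypothetical simplified logger middleware; the real online_logger does more."""

    def _logger(ctx):
        # Assumption for this sketch: only log when an evaluation result is present.
        if ctx.get("eval_value") is not None:
            writer.add_scalar("basic/eval_episode_return_mean", ctx["eval_value"], ctx["env_step"])

    return _logger


class TestEvalLogger(unittest.TestCase):

    def test_records_eval_return(self):
        writer = MagicMock()
        make_eval_logger(writer)({"eval_value": 3.0, "env_step": 42})
        writer.add_scalar.assert_called_once_with("basic/eval_episode_return_mean", 3.0, 42)

    def test_skips_when_no_eval_value(self):
        writer = MagicMock()
        make_eval_logger(writer)({"eval_value": None, "env_step": 42})
        writer.add_scalar.assert_not_called()
```

Saved to a file, the test case can be run with python -m unittest. Reading a test like this shows both the expected call order and the exact arguments the logger passes to the writer.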