Closed zhixiongzh closed 1 year ago
I found the reason. Before using `online_logger`, I have to call `ding_init(cfg)`, but for now that function does only one thing: it calls `DistributedWriter.get_instance(cfg.exp_name)`. This is more complex than it needs to be: to use the logger, I have to know that `DistributedWriter.get_instance(cfg.exp_name)` must be called beforehand, which is not written anywhere in the documentation. Why not just add another argument, `exp_name`, to `online_logger`, like this:
    def online_logger(record_train_iter: bool = False, train_show_freq: int = 100, exp_name: str = None) -> Callable:
        """
        Create an online logger for recording training and evaluation metrics.
        Arguments:
            - record_train_iter (bool): Whether to record training iteration. Default is False.
            - train_show_freq (int): Frequency of showing training logs. Default is 100.
            - exp_name (str): Experiment name, should not be None.
        Returns:
            - _logger (Callable): A logger function that takes an OnlineRLContext object as input.
        Raises:
            - ValueError: If exp_name is None.
        Example:
            task.use(online_logger(record_train_iter=False, train_show_freq=1000, exp_name=cfg.exp_name))
        """
        if task.router.is_active and not task.has_role(task.role.LEARNER):
            return task.void()
        if exp_name is None:
            raise ValueError("exp_name cannot be None")
        writer = DistributedWriter.get_instance(exp_name)
        last_train_show_iter = -1

        def _logger(ctx: "OnlineRLContext"):
            # ... (rest of the code)

        return _logger
In this case, it is clear to everyone that they need to pass an `exp_name` to the logger, and that it is required. Maybe ChatGPT can help write a description of every function and provide an example of how to use it, since the docs do not yet cover the usage of every function.
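To illustrate the fail-fast idea in the proposal above, here is a minimal, self-contained sketch. It is not DI-engine code: `DemoContext`, `demo_logger`, and the `sink` list are hypothetical stand-ins, and the "log every `train_show_freq` iterations" rule is an assumption about what the elided body does.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class DemoContext:
    """Hypothetical stand-in for OnlineRLContext with only the fields used here."""
    train_iter: int = 0
    loss: float = 0.0


def demo_logger(
    sink: List[Tuple[str, int, float]],
    train_show_freq: int = 100,
    exp_name: Optional[str] = None,
) -> Callable[[DemoContext], None]:
    # Fail when the middleware is built, not on the first training step.
    if exp_name is None:
        raise ValueError("exp_name cannot be None")
    last_train_show_iter = -1

    def _logger(ctx: DemoContext) -> None:
        nonlocal last_train_show_iter
        # Assumed gating rule: record once at least train_show_freq
        # iterations have passed since the last record.
        if ctx.train_iter - last_train_show_iter >= train_show_freq:
            last_train_show_iter = ctx.train_iter
            sink.append((exp_name, ctx.train_iter, ctx.loss))

    return _logger


sink: List[Tuple[str, int, float]] = []
log = demo_logger(sink, train_show_freq=2, exp_name="demo_exp")
for i in range(5):
    log(DemoContext(train_iter=i, loss=1.0 / (i + 1)))
```

With a required `exp_name`, a missing name surfaces as a `ValueError` at the call to the factory rather than as an unrelated error later in training.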
Thanks for your feedback. We will add some hints for the NoneType problem that occurs when calling the `online_logger` middleware.
For `DistributedWriter`, we want to implement this module with the singleton pattern, so it must be initialized at the beginning of the whole training program (e.g. in the `ding_init` function). We will add more comments and documentation to convey the necessary information here.
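As a rough illustration of the singleton pattern described here, the sketch below uses a hypothetical `ExperimentWriter` class and `init_experiment` hook. These are stand-ins for illustration only and do not reproduce the actual `DistributedWriter` or `ding_init` implementations.

```python
from typing import List, Optional, Tuple


class ExperimentWriter:
    """Hypothetical process-wide writer; not DI-engine's DistributedWriter."""

    _instance: Optional["ExperimentWriter"] = None

    def __init__(self, exp_name: str) -> None:
        self.exp_name = exp_name
        self.records: List[Tuple[str, float, int]] = []

    @classmethod
    def get_instance(cls, exp_name: Optional[str] = None) -> "ExperimentWriter":
        # The first call creates the shared instance and therefore needs a name.
        if cls._instance is None:
            if exp_name is None:
                raise RuntimeError(
                    "writer is not initialized; call get_instance(exp_name) at program start"
                )
            cls._instance = cls(exp_name)
        return cls._instance

    def add_scalar(self, tag: str, value: float, step: int) -> None:
        self.records.append((tag, value, step))


def init_experiment(exp_name: str) -> None:
    """Hypothetical start-up hook, playing the role this thread assigns to ding_init."""
    ExperimentWriter.get_instance(exp_name)


init_experiment("demo_exp")
writer = ExperimentWriter.get_instance()  # later callers pass no name
writer.add_scalar("reward", 1.0, 0)
```

The trade-off is the one raised in this thread: callers that fetch the instance without a name rely on the start-up hook having run first, so that ordering requirement needs to be documented or checked explicitly.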
BTW, it is good practice to learn function usage through the unit tests, such as this file for `online_logger`.
I am running the code with `ding.bonus.ppof`. I do not want to use `wandb_online_logger`, just TensorBoard, so I replaced `wandb_online_logger` with `online_logger(train_show_freq=1000)` and then got an error.
It seems that I am using `online_logger` in the wrong way. Is there an example of how to use `online_logger` from `ding.framework.middleware`? BTW, does `online_logger` correspond to online RL algorithms?