Open weidaolee opened 4 months ago
Thanks for your attention about DI-engine. Now I will give some basic explanations about your questions:
- Why is the optimizer hard-coded in most of DI-engine's RL algorithms?
In most RL problems, the AdamW
optimizer with LambdaLRScheduler
is enough to acquire a group of excellent hyper-parameters. This is often no need to adjust this part frequently like other machine learning tasks. On the other hand, some RL algorithms like DDPG often need multiple optimizers rather than a single optimizer in algorithms like DQN, thus it is not easy to abstract a unified optimizer interface for all the policies.
- What is the best practice if I really want to use a different optimizer?
If you want to use different optimizer, you can indeed modify the relevant policy file to suit your specific requirements. The DI-engine framework is designed with flexibility in mind, allowing users to customize the model
, policy
, and env
modules according to their needs. Due the complexity and rapid development of RL area, it is difficult to abstract a fixed algorithm configurations and implementation. Thus, we implement this policy files like a kind of template for different users, and we hope these files could be easy-to-hack.
3 & 4
There are indeed some kinds of coding design patterns about how to elegantly compose different modules. However, in the early stage of DI-engine, we thought the optimizer is important to hack in the large-scale RL training such as DI-star. In this scenario, we modified the optimizer in-place to implement some special gradient clipping operations with little overhead of GPU memory. And this historical implementation is left to today's version. From the viewpoint of most classical RL algorithms, I think it is feasible to decoupling the RL algorithm and optimizer, but it needs some detailed design to figure out the problems I mentioned above. If interested, we can provide the corresponding support about the refactor plan.
Hello, thank you so much for your explanation. This is my imagination of this refactoring:
The TemplateOptimizer
has a template method called step
. The before_step
serves as a hook method that can be deferredly implemented by subclasses. We can use __getattr__
to delegate methods and attributes to self.optimizer
so that TemplateOptimizer
can act as a torch.optim.Optimizer
.
With this implementation, we can separate the clipping
and ignoring
method from the optimizers. Then, we can also separate the RL algorithms from the optimizers.
However, after carefully studying the implementation of DI-engine, I feel that refactoring this part does introduce a lot of complexity. If maintenance and modification costs outweigh the benefits, perhaps maintaining the existing implementation is a better choice.
I still provide a sketch of this idea. If you evaluate it and think it is not necessary for any reason, you can just close this issue. Alternatively, if you think this refactoring is valuable, I'd be willing to discuss it further.
Thanks again for your reply. By the way, your code is very good and it makes me feel happy physically and mentally while reading it.
class TemplateOptimizer(torch.optim.Optimizer):
def __init__(self, optimizer: torch.optim.Optimizer):
self.optimizer = optimizer
# Initialize the base Optimizer with the parameters of the passed optimizer
super(TemplateOptimizer, self).__init__(
optimizer.param_groups, optimizer.defaults
)
def before_step(self):
# This method can be overridden by subclasses if needed
...
def step(self, closure=None):
self.before_step()
self.optimizer.step(closure)
def __getattr__(self, name):
return getattr(self.optimizer, name)
class DingOptimizer(TemplateOptimizer):
def __init__(
self,
optimizer: torch.optim.Optimizer,
optim_type: str = "",
grad_clip_type: str = None,
clip_value: Union[float, None] = None,
clip_coef: float = 5,
clip_norm_type: float = 2.0,
clip_momentum_timestep: int = 100,
grad_norm_type: str = None,
grad_ignore_type: str = None,
ignore_value: Union[float, None] = None,
ignore_coef: float = 5,
ignore_norm_type: float = 2.0,
ignore_momentum_timestep: int = 100,
):
super(DingOptimizer, self).__init__(optimizer)
self.optim_type = optim_type
self.grad_clip_type = grad_clip_type
self.clip_value = clip_value
self.clip_coef = clip_coef
self.clip_norm_type = clip_norm_type
self.clip_momentum_timestep = clip_momentum_timestep
self.grad_norm_type = grad_norm_type
self.grad_ignore_type = grad_ignore_type
self.ignore_value = ignore_value
self.ignore_coef = ignore_coef
self.ignore_norm_type = ignore_norm_type
self.ignore_momentum_timestep = ignore_momentum_timestep
def clipping(self):
...
def ignoring(self):
...
@override
def before_step(self):
self.clipping()
self.ignoring()
I would like to incorporate optimizers from the parameterfree library into RL training. However, I've noticed that DI-engine has hard-coded the optimizer in most of its RL algorithms. e.g:
I have a few questions regarding this:
Why is the optimizer hard-coded in most of DI-engine's RL algorithms?
What is the best practice if I really want to use a different optimizer?
Why does DI-engine implement a different optimizer for each that from Pytorch instead of using a design pattern like the strategy pattern or template pattern, which would enable users to easily compose or implement their optimizers?
Based on my understanding, the optimizer and the "thing" to clip are independent. Therefore, is decoupling the RL algorithm and optimizer feasible?
Feel free to correct any misunderstandings or gaps in my knowledge. If you have plans to refactor this part, I am willing to contribute code.