opendilab / DI-engine

OpenDILab Decision AI Engine
https://di-engine-docs.readthedocs.io
Apache License 2.0
2.78k stars, 348 forks

How to introduce other optimizers into DI-engine? #813

Open weidaolee opened 3 days ago

weidaolee commented 3 days ago

I would like to incorporate optimizers from the parameterfree library into RL training. However, I've noticed that DI-engine has hard-coded the optimizer in most of its RL algorithms. e.g:

I have a few questions regarding this:

  1. Why is the optimizer hard-coded in most of DI-engine's RL algorithms?

  2. What is the best practice if I really want to use a different optimizer?

  3. Why does DI-engine implement its own variant of each PyTorch optimizer instead of using a design pattern such as the strategy pattern or template pattern, which would let users easily compose or implement their own optimizers?

  4. Based on my understanding, the optimizer and the "thing" to clip are independent. Therefore, is decoupling the RL algorithm and optimizer feasible?

Feel free to correct any misunderstandings or gaps in my knowledge. If you have plans to refactor this part, I am willing to contribute code.

PaParaZz1 commented 2 days ago

Thanks for your interest in DI-engine. Here are some basic explanations for your questions:

  1. Why is the optimizer hard-coded in most of DI-engine's RL algorithms?

For most RL problems, the AdamW optimizer with LambdaLRScheduler is enough to find a good set of hyperparameters, so there is often no need to tune this part as frequently as in other machine learning tasks. In addition, some RL algorithms such as DDPG need multiple optimizers, whereas algorithms like DQN need only one, so it is not easy to abstract a unified optimizer interface for all policies.
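To make the multi-optimizer point concrete, here is a minimal, framework-free sketch. Everything in it (`ToySGD`, `build_optimizers`, the group names) is hypothetical and is not DI-engine or PyTorch API; it only illustrates that a DQN-like policy needs one optimizer while a DDPG-like policy needs one per network, which is why a single `optimizer` argument does not fit every policy. One possible shape for a shared interface is a mapping from parameter-group name to an optimizer factory:

```python
from typing import Callable, Dict, List

Param = List[float]  # one-element list so the value can be updated in place


class ToySGD:
    """Toy stand-in for a real optimizer class (hypothetical, illustration only)."""

    def __init__(self, params: List[Param], lr: float) -> None:
        self.params = params
        self.lr = lr

    def step(self, grads: List[float]) -> None:
        # Plain gradient descent: p <- p - lr * g
        for p, g in zip(self.params, grads):
            p[0] -= self.lr * g


OptimizerFactory = Callable[[List[Param]], ToySGD]


def build_optimizers(
    param_groups: Dict[str, List[Param]],
    factories: Dict[str, OptimizerFactory],
) -> Dict[str, ToySGD]:
    """Create one optimizer per named parameter group."""
    missing = sorted(set(param_groups) - set(factories))
    if missing:
        raise KeyError(f"no optimizer factory for groups: {missing}")
    return {name: factories[name](params) for name, params in param_groups.items()}


# DQN-like policy: a single network, hence a single optimizer.
dqn_params = {"q_network": [[1.0], [2.0]]}
dqn_opts = build_optimizers(dqn_params, {"q_network": lambda p: ToySGD(p, lr=0.1)})

# DDPG-like policy: actor and critic are updated by separate optimizers.
ddpg_params = {"actor": [[1.0]], "critic": [[1.0]]}
ddpg_opts = build_optimizers(
    ddpg_params,
    {
        "actor": lambda p: ToySGD(p, lr=0.1),
        "critic": lambda p: ToySGD(p, lr=0.5),
    },
)

dqn_opts["q_network"].step([1.0, 1.0])
ddpg_opts["actor"].step([1.0])
ddpg_opts["critic"].step([1.0])
```

With a real framework, the factories would presumably wrap actual optimizer constructors; the difficulty described above is that each policy must then declare its own set of group names.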

  2. What is the best practice if I really want to use a different optimizer?

If you want to use a different optimizer, you can modify the relevant policy file to suit your requirements. DI-engine is designed with flexibility in mind, allowing users to customize the model, policy, and env modules according to their needs. Due to the complexity and rapid development of the RL field, it is difficult to abstract a fixed set of algorithm configurations and implementations. We therefore implement the policy files as a kind of template for different users, and we hope these files are easy to hack.
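As a rough illustration of treating a policy file as a template, the sketch below subclasses a policy and swaps the optimizer in its initialization hook. This is a framework-free toy: `TemplatePolicy`, `ToySGD`, and `ToySignSGD` are hypothetical names, and the hook is called `_init_learn` here only as an assumption about where a policy builds its optimizer; check the actual policy file you are modifying.

```python
from typing import List


class ToySGD:
    """Toy gradient descent: p <- p - lr * g (hypothetical stand-in)."""

    def __init__(self, params: List[float], lr: float) -> None:
        self.params = params
        self.lr = lr

    def step(self, grads: List[float]) -> None:
        for i, g in enumerate(grads):
            self.params[i] -= self.lr * g


class ToySignSGD(ToySGD):
    """Alternative rule: move each parameter by lr against the sign of its gradient."""

    def step(self, grads: List[float]) -> None:
        for i, g in enumerate(grads):
            sign = (g > 0) - (g < 0)  # evaluates to -1, 0, or 1
            self.params[i] -= self.lr * sign


class TemplatePolicy:
    """Hypothetical stand-in for a policy file that hard-codes its optimizer."""

    def __init__(self, params: List[float], lr: float) -> None:
        self._params = params
        self._lr = lr
        self._init_learn()

    def _init_learn(self) -> None:
        # The hard-coded choice lives in one place, so it is easy to override.
        self._optimizer = ToySGD(self._params, lr=self._lr)

    def learn(self, grads: List[float]) -> None:
        self._optimizer.step(grads)


class SignSGDPolicy(TemplatePolicy):
    """Same policy logic, different optimizer."""

    def _init_learn(self) -> None:
        super()._init_learn()
        # Replace the default optimizer after the base setup has run.
        self._optimizer = ToySignSGD(self._params, lr=self._lr)


base = TemplatePolicy([1.0, 1.0], lr=0.5)
base.learn([4.0, -2.0])

custom = SignSGDPolicy([1.0, 1.0], lr=0.5)
custom.learn([4.0, -2.0])
```

Subclassing keeps the original policy file untouched; editing a copy of the file directly, as suggested above, works just as well when more than the optimizer has to change.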

3 & 4

There are indeed design patterns for composing different modules elegantly. However, in the early stage of DI-engine, we considered it important to be able to hack the optimizer for large-scale RL training such as DI-star. In that scenario, we modified the optimizer in place to implement some special gradient clipping operations with little GPU memory overhead, and this historical implementation has carried over to today's version. From the viewpoint of most classical RL algorithms, I think it is feasible to decouple the RL algorithm from the optimizer, but this needs some detailed design to resolve the problems mentioned above. If you are interested, we can provide support for a refactoring plan.
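One way such a decoupling could look is the strategy pattern raised in the question: the clipping rule and the update rule are independent callables that a training step composes. The sketch below is framework-free, and all names (`clip_by_value`, `clip_by_global_norm`, `sgd_update`, `sign_update`, `train_step`) are hypothetical. It creates new lists at each step, so it does not reproduce the in-place memory savings that motivated the original design.

```python
import math
from typing import Callable, List

GradTransform = Callable[[List[float]], List[float]]
UpdateRule = Callable[[List[float], List[float], float], List[float]]


def no_clip(grads: List[float]) -> List[float]:
    return list(grads)


def clip_by_value(limit: float) -> GradTransform:
    """Clamp every gradient entry to [-limit, limit]."""
    def transform(grads: List[float]) -> List[float]:
        return [max(-limit, min(limit, g)) for g in grads]
    return transform


def clip_by_global_norm(max_norm: float) -> GradTransform:
    """Rescale the whole gradient if its L2 norm exceeds max_norm."""
    def transform(grads: List[float]) -> List[float]:
        norm = math.sqrt(sum(g * g for g in grads))
        if norm <= max_norm:
            return list(grads)
        scale = max_norm / norm
        return [g * scale for g in grads]
    return transform


def sgd_update(params: List[float], grads: List[float], lr: float) -> List[float]:
    return [p - lr * g for p, g in zip(params, grads)]


def sign_update(params: List[float], grads: List[float], lr: float) -> List[float]:
    return [p - lr * ((g > 0) - (g < 0)) for p, g in zip(params, grads)]


def train_step(
    params: List[float],
    grads: List[float],
    lr: float,
    update: UpdateRule,
    transform: GradTransform = no_clip,
) -> List[float]:
    # Clipping and the update rule are chosen independently of each other.
    return update(params, transform(grads), lr)


norm_clipped = train_step(
    [0.0, 0.0], [3.0, 4.0], lr=1.0,
    update=sgd_update, transform=clip_by_global_norm(1.0),
)
value_clipped = train_step(
    [0.0, 0.0], [3.0, -4.0], lr=1.0,
    update=sgd_update, transform=clip_by_value(1.0),
)
sign_step = train_step([0.0, 0.0], [3.0, -4.0], lr=0.1, update=sign_update)
```

Any clipping rule can be paired with any update rule here. Keeping the memory behaviour of the in-place version would need the transforms to mutate gradients rather than copy them, which is part of the detailed design mentioned above.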