What is the default setting (e.g., total training steps, learning rate) of DQN?

tensorlayer / RLzoo

A Comprehensive Reinforcement Learning Zoo for Simple Usage 🚀

http://rlzoo.readthedocs.io

Apache License 2.0


What is the default setting (e.g., total training steps, learning rate) of DQN? #14

Closed xinghua-qu closed 4 years ago

xinghua-qu commented 4 years ago

Hi,

In your code, the training parameter settings are imported from utils: from rlzoo.common.utils import call_default_params

May I ask whether there is any documentation that explains what these default settings are and how you set them?

quantumiracle commented 4 years ago

Hi, call_default_params returns the hyper-parameters in two dictionaries, alg_params and learn_params, which you can print to see what they contain. You can also change the hyper-parameters in these two dictionaries before instantiating the agent and starting the learning process.
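As a minimal sketch of that inspect-then-override workflow, the snippet below uses plain dictionaries as stand-ins for alg_params and learn_params. The helper override_params and every key and value shown are hypothetical placeholders for illustration; they are not RLzoo's API or its actual DQN defaults.

```python
def override_params(defaults, **overrides):
    """Return a copy of `defaults` with `overrides` applied.

    Raises KeyError for names that are not already in `defaults`, so a
    misspelled hyper-parameter is caught instead of silently ignored.
    """
    unknown = sorted(set(overrides) - set(defaults))
    if unknown:
        raise KeyError(f"unknown hyper-parameter(s): {unknown}")
    updated = dict(defaults)
    updated.update(overrides)
    return updated


# Hypothetical stand-ins for the two dictionaries returned by
# call_default_params. Keys and values are placeholders, NOT RLzoo's defaults.
alg_params = {"example_alg_option": "placeholder"}
learn_params = {"example_train_steps": 1000, "example_learning_rate": 0.001}

# Print the defaults to see what they contain.
for name, params in (("alg_params", alg_params), ("learn_params", learn_params)):
    for key, value in params.items():
        print(f"{name}[{key!r}] = {value!r}")

# Change a value before the agent is instantiated and training starts.
learn_params = override_params(learn_params, example_learning_rate=0.0005)
```

Rejecting unknown keys is a design choice of this sketch: it guards against a typo creating a new entry that the algorithm would never read. With the real library, the same idea would apply to the dictionaries obtained from call_default_params.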

If you want to know exactly where the default hyper-parameters come from, each algorithm's folder under ./rlzoo/algorithms/ contains a separate Python script, default.py, that stores them.
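To see which default.py scripts exist in a local checkout, something like the following sketch could help. It assumes the layout described above, with one sub-directory per algorithm directly under the algorithms directory and a default.py inside each; the function name find_default_scripts is hypothetical and not part of RLzoo.

```python
from pathlib import Path


def find_default_scripts(algorithms_dir):
    """Map each algorithm sub-directory name to its default.py path.

    Assumes the layout <algorithms_dir>/<algorithm>/default.py; directories
    without a default.py are skipped.
    """
    root = Path(algorithms_dir)
    found = {}
    for script in sorted(root.glob("*/default.py")):
        found[script.parent.name] = script
    return found
```

Run from the repository root, find_default_scripts("./rlzoo/algorithms") would return a dictionary whose values can be opened to read the defaults for each algorithm.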

quantumiracle commented 4 years ago

We will soon release a new version of RLzoo with a much more explicit hyper-parameter configuration process!

xinghua-qu commented 4 years ago

Many thanks for the clarification. It's much clearer to me now. It's really a nicer baseline compared with Stable Baselines and OpenAI Baselines.

BTW, it would be great if you could provide some well-tuned benchmark policies (like those in the Stable Baselines zoo). That way, the toolbox could serve as a standard starting point for some research directions (e.g., offline RL and adversarial robustness).

If you already have some well-trained policies for Freeway, BankHeist, Boxing, etc., could you please share them?

quantumiracle commented 4 years ago

Thanks for your suggestions. We will consider providing benchmark policies soon, but right now we do not have these results yet. A thorough benchmark requires substantial computing resources and human labor, and I hope you can understand that at present.