vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev
Other
4.84k stars 560 forks

Use the training `end_e` as the `evaluation(..., epsilon=end_e)` for atari #430

Open pseudo-rnd-thoughts opened 7 months ago

pseudo-rnd-thoughts commented 7 months ago

Description

Bug fix for https://github.com/vwxyzjn/cleanrl/issues/429. I could repeat this for all DQN and C51 agents that have an `end_e` argument, to prevent this issue in the future. A potential alternative is to add a new parameter for the evaluation epsilon.
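For illustration, here is a minimal, self-contained sketch of the two options above: reusing the training `end_e` at evaluation time, or overriding it with a separate evaluation epsilon. All names in it (`Args`, `eval_epsilon`, `resolve_eval_epsilon`, `epsilon_greedy`, `greedy_fraction`) and all numeric values are hypothetical placeholders, not CleanRL's actual code or defaults.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Args:
    # Hypothetical stand-in for a training script's arguments (placeholder values).
    end_e: float = 0.01                   # final exploration epsilon used in training
    eval_epsilon: Optional[float] = None  # alternative: explicit evaluation epsilon


def resolve_eval_epsilon(args: Args) -> float:
    """Return the explicit evaluation epsilon if set, otherwise fall back to end_e."""
    return args.end_e if args.eval_epsilon is None else args.eval_epsilon


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Pick a uniformly random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def greedy_fraction(q_values: Sequence[float], epsilon: float, steps: int, seed: int = 0) -> float:
    """Toy 'evaluation': fraction of steps on which the greedy action was chosen."""
    rng = random.Random(seed)
    best = max(range(len(q_values)), key=lambda a: q_values[a])
    hits = sum(epsilon_greedy(q_values, epsilon, rng) == best for _ in range(steps))
    return hits / steps


if __name__ == "__main__":
    args = Args()
    q_values = [0.1, 0.9, 0.3]  # made-up Q-values for a single state
    eps = resolve_eval_epsilon(args)  # evaluation now matches the training end_e
    print(f"evaluation epsilon: {eps}")
    print(f"greedy fraction: {greedy_fraction(q_values, eps, steps=1000):.3f}")
```

With this shape, the change in this PR corresponds to the fallback branch (evaluation uses `end_e`), while the alternative corresponds to setting the extra parameter explicitly.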

pseudo-rnd-thoughts commented 7 months ago

@vwxyzjn Do you want to rerun all of the scripts, or, since the final evaluation data is not commonly used, can this just be merged without rerunning them?