vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev

bug: incorrect logic in GAE calculation #337

Closed vwxyzjn closed 1 year ago

vwxyzjn commented 1 year ago

Description

#334 actually introduced a major bug... It's totally my bad 🙈 The offending code:

        storage = storage.replace(
            advantages=advantages,
            returns=storage.advantages + storage.values,
        )

This should have been the following, because storage.advantages still holds only zeros at that point, so returns ended up equal to storage.values:

        storage = storage.replace(
            advantages=advantages,
            returns=advantages + storage.values,
        )
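
To make the difference concrete, here is a minimal pure-Python sketch, not the repository's actual JAX code. The `Storage` dataclass, the `compute_gae` helper, and all numbers are hypothetical stand-ins; `dataclasses.replace` plays the role of `storage.replace`. It shows that reading the stale `storage.advantages` yields returns identical to the values, while using the freshly computed `advantages` gives advantages + values.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Storage:
    # Hypothetical stand-in for the rollout storage; only the fields used here.
    values: List[float]
    rewards: List[float]
    dones: List[float]
    advantages: List[float]
    returns: List[float]


def compute_gae(storage, next_value, next_done, gamma=0.99, gae_lambda=0.95):
    """Standard GAE recursion over one rollout, walking backwards in time."""
    n = len(storage.rewards)
    advantages = [0.0] * n
    lastgaelam = 0.0
    for t in reversed(range(n)):
        if t == n - 1:
            nextnonterminal = 1.0 - next_done
            nextvalue = next_value
        else:
            nextnonterminal = 1.0 - storage.dones[t + 1]
            nextvalue = storage.values[t + 1]
        # One-step TD error, then the discounted sum of TD errors.
        delta = storage.rewards[t] + gamma * nextvalue * nextnonterminal - storage.values[t]
        lastgaelam = delta + gamma * gae_lambda * nextnonterminal * lastgaelam
        advantages[t] = lastgaelam
    return advantages


# Illustrative numbers only; advantages and returns start as zeros.
storage = Storage(
    values=[0.5, 0.4, 0.3],
    rewards=[1.0, 0.0, 1.0],
    dones=[0.0, 0.0, 0.0],
    advantages=[0.0, 0.0, 0.0],
    returns=[0.0, 0.0, 0.0],
)
advantages = compute_gae(storage, next_value=0.2, next_done=0.0)

# Buggy: storage.advantages is still all zeros here, so returns == values.
buggy = replace(
    storage,
    advantages=advantages,
    returns=[a + v for a, v in zip(storage.advantages, storage.values)],
)

# Fixed: use the freshly computed local advantages.
fixed = replace(
    storage,
    advantages=advantages,
    returns=[a + v for a, v in zip(advantages, storage.values)],
)
```

With the buggy version the value-function target collapses to the current value estimates, so the value loss would carry no learning signal from the rewards.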


vercel[bot] commented 1 year ago


| Name | Status | Preview | Updated |
| --- | --- | --- | --- |
| cleanrl | ✅ Ready (Inspect) | Visit Preview | Dec 31, 2022 at 10:58PM (UTC) |