[x] I have checked that there is no similar issue in the repo (required)
Possible Solution
Stable Baselines' ReplayBuffer currently does not support the new format returned by gym.Env.step. The step API changed from:
obs, rew, done, info = env.step(action) to obs, rew, terminated, truncated, info = env.step(action).
We would need to implement a slightly modified version of the ReplayBuffer in CleanRL itself. Other than this, the changes required are minimal.
I can submit an initial PR with changes required for SAC if you're interested.
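To make the proposal concrete, here is a rough sketch of what a buffer that stores both flags could look like. It is not the Stable Baselines or CleanRL implementation; `Transition`, `SimpleReplayBuffer`, and `bootstrap_mask` are hypothetical names used only for illustration, and it keeps plain Python objects rather than preallocated arrays.

```python
import random
from collections import deque
from typing import Any, Deque, List, NamedTuple


class Transition(NamedTuple):
    obs: Any
    next_obs: Any
    action: Any
    reward: float
    terminated: bool  # episode ended in a true terminal state
    truncated: bool   # episode was cut off, e.g. by a time limit


class SimpleReplayBuffer:
    """Hypothetical minimal FIFO buffer that keeps both end-of-episode flags."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        # deque with maxlen silently drops the oldest transition when full
        self._storage: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, obs, next_obs, action, reward, terminated, truncated) -> None:
        self._storage.append(
            Transition(obs, next_obs, action, float(reward), bool(terminated), bool(truncated))
        )

    def sample(self, batch_size: int) -> List[Transition]:
        # uniform sampling without replacement
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self) -> int:
        return len(self._storage)


def bootstrap_mask(t: Transition) -> float:
    """Return 1.0 if a TD target should bootstrap from next_obs, else 0.0.

    Only true termination stops bootstrapping; truncation does not.
    """
    return 0.0 if t.terminated else 1.0
```

Keeping `terminated` and `truncated` separate lets an algorithm such as SAC keep bootstrapping on time-limit cutoffs, which a single `done` flag cannot express.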
Update on the ticket: the current gym master is slated for release as 0.26.0, which enables obs, rew, terminated, truncated, info = env.step(action) by default.
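While both formats are in circulation, a small shim could accept either return shape and always hand back five values. This is only a sketch: `normalize_step` is a hypothetical helper, and it assumes that older time-limited environments mark cutoffs with the "TimeLimit.truncated" key in info.

```python
from typing import Any, Dict, Tuple


def normalize_step(result: tuple) -> Tuple[Any, float, bool, bool, Dict[str, Any]]:
    """Convert a 4-tuple or 5-tuple step result into the 5-tuple format."""
    if len(result) == 5:
        obs, rew, terminated, truncated, info = result
        return obs, rew, bool(terminated), bool(truncated), info
    if len(result) == 4:
        obs, rew, done, info = result
        # Assumption: a time-limit cutoff is flagged in info under this key.
        truncated = bool(info.get("TimeLimit.truncated", False))
        # A done that was caused by truncation is not a true termination.
        terminated = bool(done) and not truncated
        return obs, rew, terminated, truncated, info
    raise ValueError(f"expected 4 or 5 values from env.step, got {len(result)}")
```

Usage would look like obs, rew, terminated, truncated, info = normalize_step(env.step(action)), regardless of which gym version is installed.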
Problem Description
Upgrade the gym version used in CleanRL from 0.23.1 to 0.25.1.
Checklist
poetry install
(see CleanRL's installation guideline)