x35f / unstable_baselines

Re-implementations of SOTA RL algorithms.

Are all the tasks v3? #46

Closed: red-tie closed this issue 1 year ago

red-tie commented 1 year ago

I see that the current performance curves are based on v3 tasks. However, the config files for model-based RL (MBPO) contain many v2 tasks. Are the v2 tasks in MBPO the same as the v3 tasks?

typoverflow commented 1 year ago

cc @x35f

x35f commented 1 year ago

Hi, the current MBPO performance curves for Hopper, HalfCheetah, and Walker2d are based on the v3 tasks. The original MBPO implementation made some changes to the default Ant and Humanoid v2 environments (please refer to https://github.com/jannerm/mbpo and the unstable_baselines/envs/mbpo directory for details), and we retained them. The MBPO performance curves for Ant and Humanoid are actually evaluated on these modified v2 versions of the tasks.
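
To make the split in the answer above easier to scan, here is a minimal sketch that records which environment variant each MBPO curve is based on. The names `STANDARD_V3`, `MODIFIED_V2`, and `evaluation_variant` are hypothetical and used only for illustration; they are not identifiers from unstable_baselines or the original MBPO code, and the actual environment definitions live in the directories linked above.

```python
# Illustrative summary only; these names are hypothetical and are not
# taken from unstable_baselines or jannerm/mbpo.
STANDARD_V3 = {"Hopper", "HalfCheetah", "Walker2d"}
MODIFIED_V2 = {"Ant", "Humanoid"}


def evaluation_variant(task: str) -> str:
    """Return the environment variant the MBPO curve for `task` is based on."""
    if task in STANDARD_V3:
        return "standard v3"
    if task in MODIFIED_V2:
        return "modified v2 (changes retained from the original MBPO implementation)"
    raise KeyError(f"No MBPO curve is described for task: {task}")


for task in ("Hopper", "HalfCheetah", "Walker2d", "Ant", "Humanoid"):
    print(f"{task}: {evaluation_variant(task)}")
```

So the v2 entries for Ant and Humanoid in the MBPO configs are not interchangeable with the stock v3 tasks.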