Open zhan0903 opened 5 years ago
I can confirm this. In fact, DDPG, TD3, and SAC all show similarly unreasonable AverageTestEpRet values.
You can see plots of all algorithms here. All off-policy algorithms seem to show the same behavior, while all on-policy algorithms seem to be fine.
Hi, I used the Spinning Up TD3 algorithm to test PyBullet's HumanoidBulletEnv-v0 environment, but the test score is around 1600 right from the beginning, which is not normal (TD3 should not work on this benchmark). Has anyone seen similar results? Thank you.
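
One way to sanity-check a number like this might be to run the same kind of evaluation loop with a random policy and compare the result against the reported AverageTestEpRet. Below is a minimal sketch, assuming the classic Gym interface (`reset()` returns an observation, `step()` returns a 4-tuple). `ConstantRewardEnv`, `average_test_return`, and `random_policy` are hypothetical names made up for illustration; the stub environment is not the PyBullet one and only stands in so the snippet runs without extra dependencies.

```python
import random


class ConstantRewardEnv:
    """Hypothetical stand-in env (NOT PyBullet): fixed reward on every step."""

    def __init__(self, reward_per_step=1.0, horizon=5):
        self.reward_per_step = reward_per_step
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # dummy observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        # Classic Gym 4-tuple: observation, reward, done, info
        return 0.0, self.reward_per_step, done, {}


def average_test_return(env, policy, num_episodes=10, max_ep_len=1000):
    """Mean undiscounted episode return of `policy` over `num_episodes`."""
    returns = []
    for _ in range(num_episodes):
        obs, done, ep_ret, ep_len = env.reset(), False, 0.0, 0
        # Stop on termination or when the episode length cap is reached.
        while not (done or ep_len == max_ep_len):
            obs, reward, done, _ = env.step(policy(obs))
            ep_ret += reward
            ep_len += 1
        returns.append(ep_ret)
    return sum(returns) / len(returns)


def random_policy(obs):
    # Ignores the observation; assumes a scalar action in [-1, 1].
    return random.uniform(-1.0, 1.0)


baseline = average_test_return(ConstantRewardEnv(), random_policy, num_episodes=3)
print("random-policy average return:", baseline)
```

To try this against the real benchmark, the stub would be swapped for the actual environment and `random_policy` for something that samples from its action space. If a random policy already scores close to the reported value, that would point at the environment's reward or episode handling rather than at the learning algorithm.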