Closed SZH1230456 closed 1 year ago
DDPG's performance is fairly inconsistent. I believe the hyperparameters/seeds in the GitHub repo worked well at one point, but it's possible they perform worse now after version changes to MuJoCo or PyTorch. For the paper, when collecting experts we trained multiple policies (I think 15?) and took the top 5–10. Working with HalfCheetah instead of Hopper would probably help, as would using a more modern RL algorithm like TD3.
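The expert-selection step described above (train many seeds, keep only the best few) could be sketched roughly like this. This is a minimal illustration, not the paper's actual code: `eval_returns` stands in for the mean evaluation return of each trained policy, and `select_top_policies` is a hypothetical helper name.

```python
def select_top_policies(eval_returns, k):
    """Return the seeds of the k best policies, ranked by mean evaluation return.

    eval_returns: dict mapping seed -> mean episode return from evaluation rollouts.
    """
    ranked = sorted(eval_returns.items(), key=lambda kv: kv[1], reverse=True)
    return [seed for seed, _ in ranked[:k]]


# Hypothetical example: evaluation returns from 15 training runs with different seeds.
eval_returns = {0: 980.0, 1: 1450.0, 2: 1120.0, 3: 1600.0, 4: 700.0,
                5: 1550.0, 6: 1300.0, 7: 900.0, 8: 1480.0, 9: 1050.0,
                10: 1200.0, 11: 1390.0, 12: 860.0, 13: 1510.0, 14: 1010.0}
top5 = select_top_policies(eval_returns, k=5)
print(top5)  # → [3, 5, 13, 8, 1]
```

With DDPG's seed-to-seed variance, filtering like this makes the collected expert demonstrations much less sensitive to a single bad run.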
I am trying to reproduce the results on the continuous environments, but my results are poor. Could you please give more details about the expected results? For example, what result should we get when running `python main.py --train_behavioral --gaussian_std 0.1`?