sfujim / BCQ

Author's PyTorch implementation of BCQ for continuous and discrete actions
MIT License
597 stars, 141 forks

Performance of DDPG and BCQ #11

Closed: SZH1230456 closed this issue 1 year ago

SZH1230456 commented 3 years ago

I am trying to reproduce the results for the continuous environments, but the performance I get is poor. Could you please give more details about the expected results? For example, what should the result be when we run "python main.py --train_behavioral --gaussian_std 0.1"?

sfujim commented 2 years ago

DDPG's performance is fairly inconsistent. I believe the hyperparameters/seeds in the GitHub repo worked well at one point, but it's possible the results are worse now after version changes to MuJoCo or PyTorch. For the paper, when collecting experts we trained multiple policies (I think 15?) and kept the top 5-10. Working with HalfCheetah instead of Hopper would probably help, as would using a more modern RL algorithm like TD3.
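
The "train several policies and keep the best ones" step described above could be sketched roughly as follows. This is only an illustration, not code from the BCQ repository: the helper name `select_top_policies`, the dictionary layout, and all the return values are hypothetical placeholders, and it assumes each behavioral policy was trained with a different seed and evaluated for a few episodes afterwards.

```python
from statistics import mean


def select_top_policies(eval_returns, k):
    """Return the k seeds with the highest mean evaluation return, best first.

    eval_returns maps a seed (hypothetical identifier for one trained
    policy) to the list of episode returns from its evaluation runs.
    """
    if not 0 < k <= len(eval_returns):
        raise ValueError("k must be between 1 and the number of trained policies")
    # Rank seeds by their average evaluation return, highest first.
    ranked = sorted(eval_returns, key=lambda seed: mean(eval_returns[seed]), reverse=True)
    return ranked[:k]


# Placeholder numbers for illustration only; these are not measured results.
eval_returns = {
    0: [3100.0, 3250.0],
    1: [900.0, 1100.0],
    2: [2800.0, 2600.0],
    3: [150.0, 300.0],
}

top = select_top_policies(eval_returns, k=2)
print(top)  # [0, 2]
```

With 15 trained policies, calling the helper with `k` between 5 and 10 would mirror the selection described in the comment; only the selected seeds would then be used to generate buffers.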