ikostrikov / pytorch-a2c-ppo-acktr-gail

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
MIT License

With which version of code is the benchmark curve generated? #175

Closed ktlichkid closed 5 years ago

ktlichkid commented 5 years ago

Hi dear Ilya,

I have been investigating your implementation of A2C and want to reproduce some of the benchmark curves. Since the master branch is changing very fast, could you please tell me which version (perhaps the revision number of a stable version?) you used to generate the curves attached in the Readme file? It would be a great help to me.

Also, you mentioned that

"I tried to reproduce OpenAI results as closely as possible."

in your Readme file too. The head version of OpenAI Baselines is also changing very fast. Which revision of OpenAI Baselines did you use as your benchmark?

Thank you so much for answering!! Really appreciate your help.

ikostrikov commented 5 years ago

I would try the versions of this repo and baselines that were the latest when the images were uploaded.
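
One way to act on this suggestion is sketched below, under stated assumptions: both repositories are already cloned locally, the benchmark images are tracked in this repo's git history, and the default branch in each clone is `master`. The helper names, the clone directories, and the image path are hypothetical placeholders, not values confirmed in this thread. The sketch reads the date of the last commit that touched an image file, then asks git for the newest commit in each repository from before that date.

```python
import subprocess


def last_change_date_cmd(path):
    """Git command that prints the committer date (strict ISO 8601)
    of the most recent commit touching `path`."""
    return ["git", "log", "-1", "--format=%cI", "--", path]


def commit_before_cmd(date_iso, branch="master"):
    """Git command that prints the newest commit on `branch`
    selected by git's --before filter for `date_iso`.
    The branch name is an assumption; adjust it to the clone."""
    return ["git", "rev-list", "-1", f"--before={date_iso}", branch]


def run_git(args, repo_dir):
    """Run a git command inside `repo_dir` and return its trimmed stdout."""
    result = subprocess.run(
        args, cwd=repo_dir, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def pin_revisions(this_repo_dir, baselines_dir, image_path):
    """Return the image date and the matching commit hash in each clone.
    All three arguments are placeholders supplied by the caller."""
    uploaded = run_git(last_change_date_cmd(image_path), this_repo_dir)
    return {
        "image_date": uploaded,
        "this_repo": run_git(commit_before_cmd(uploaded), this_repo_dir),
        "baselines": run_git(commit_before_cmd(uploaded), baselines_dir),
    }
```

Calling `pin_revisions` with the two clone directories and the path of one benchmark image would give candidate hashes to pass to `git checkout`. The result is only a starting point, since the curves may have been generated some time before the image was committed.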

ktlichkid commented 5 years ago

Thanks for your quick response. I'll try them.