tensorflow / agents

TF-Agents: A reliable, scalable and easy-to-use TensorFlow library for Contextual Bandits and Reinforcement Learning.
Apache License 2.0
2.81k stars, 720 forks

Request: Exploration by Random Network Distillation (modified PPO) #139

Closed: 8bitmp3 closed this issue 4 years ago

8bitmp3 commented 5 years ago

A gentle request for a TF-Agents implementation of a modified PPO with an exploration bonus, for testing on Montezuma's Revenge.

Paper: Exploration by Random Network Distillation, Burda et al. (OpenAI, University of Edinburgh).

"...exploration bonus with the extrinsic rewards we introduce a modification of Proximal Policy Optimization (PPO, Schulman et al. (2017)) that uses two value heads for the two reward streams. This allows the use of different discount rates for the different rewards, and combining episodic and non-episodic returns."
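To illustrate the two-value-head idea in the quote, here is a minimal sketch in plain Python. It is not TF-Agents code and not the authors' implementation. It assumes a single rollout and computes returns separately for the extrinsic stream (episodic: the return is cut at episode ends) and the intrinsic, exploration-bonus stream (non-episodic: the return carries across episode ends), each with its own discount rate. It then subtracts each head's value estimate and sums the two advantages with weights. The function names are hypothetical, and the discount rates (0.999 extrinsic, 0.99 intrinsic) and advantage weights (2 and 1) are illustrative assumptions; check the paper and the linked code for the real settings. For brevity it uses plain discounted returns instead of GAE and does not bootstrap past the end of the rollout.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], dones: Sequence[bool],
                       gamma: float, episodic: bool) -> List[float]:
    """Backward pass computing the discounted return at every step.

    dones[t] is True when the episode ended after step t. With episodic=True
    the running return is reset there; with episodic=False episode ends are
    ignored, so the return carries across them.
    """
    returns = [0.0] * len(rewards)
    running = 0.0  # assumption: no bootstrap value after the last step
    for t in reversed(range(len(rewards))):
        if episodic and dones[t]:
            running = 0.0
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def combined_advantages(ext_rewards, int_rewards, dones, ext_values, int_values,
                        gamma_ext=0.999, gamma_int=0.99,
                        coef_ext=2.0, coef_int=1.0) -> List[float]:
    """Hypothetical helper: one advantage per step from two value heads."""
    # Extrinsic stream: episodic returns with its own (illustrative) discount.
    ext_ret = discounted_returns(ext_rewards, dones, gamma_ext, episodic=True)
    # Intrinsic (exploration bonus) stream: non-episodic returns.
    int_ret = discounted_returns(int_rewards, dones, gamma_int, episodic=False)
    # Each head's advantage is its own return minus its own value estimate;
    # the policy update would then use their weighted sum.
    return [coef_ext * (r_e - v_e) + coef_int * (r_i - v_i)
            for r_e, v_e, r_i, v_i in zip(ext_ret, ext_values, int_ret, int_values)]


if __name__ == "__main__":
    adv = combined_advantages(
        ext_rewards=[0.0, 0.0, 1.0],
        int_rewards=[0.5, 0.5, 0.5],
        dones=[False, True, False],
        ext_values=[0.0, 0.0, 0.0],
        int_values=[0.0, 0.0, 0.0],
    )
    print(adv)
```

In this toy rollout the episode ends after step 1, so the extrinsic reward at step 2 does not flow back to step 0, while the intrinsic bonus does. That asymmetry is what separate value heads make possible.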

Code (TF 1.x): https://github.com/openai/random-network-distillation/tree/master/policies

oars commented 5 years ago

@seungjaeryanlee is working on this as part of a Google Summer of Code project. See his blog for progress: https://www.endtoend.ai/tags/gsoc/

robodhruv commented 5 years ago

Is there an update on this? Is it up for release soon, or is there an early, tested version ready for use?