Request: Exploration by Random Network Distillation (modified PPO)

8bitmp3 commented 5 years ago

A gentle request for a TF-Agents implementation of a modified PPO with an exploration bonus - for testing on Montezuma's Revenge.

Paper: Exploration by Random Network Distillation - Burda et al. (OpenAI, University of Edinburgh).

"...exploration bonus with the extrinsic rewards we introduce a modification of Proximal Policy Optimization (PPO, Schulman et al. (2017)) that uses two value heads for the two reward streams. This allows the use of different discount rates for the different rewards, and combining episodic and non-episodic returns."
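To make the request concrete, here is a minimal NumPy sketch of the two-stream idea in the quote: extrinsic returns are treated as episodic (reset at episode boundaries), intrinsic returns as non-episodic, each with its own discount rate, and the two advantages are summed with weights. The function names, default discounts, and coefficients are illustrative assumptions, not taken from the OpenAI code or from TF-Agents. It also uses plain discounted returns minus value estimates rather than GAE, to keep the sketch short.

```python
import numpy as np


def discounted_returns(rewards, gamma, dones=None):
    """Discounted return at each step of a rollout.

    If `dones` is given, dones[t] == True means the episode ended at step t,
    so no reward from later steps flows back past t (episodic).
    If `dones` is None, returns flow across episode boundaries (non-episodic).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    returns = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if dones is not None and dones[t]:
            running = 0.0  # cut the return at the episode boundary
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def combined_advantages(ext_rewards, int_rewards, dones, v_ext, v_int,
                        gamma_ext=0.999, gamma_int=0.99,
                        ext_coef=2.0, int_coef=1.0):
    """Weighted sum of the advantages of the two reward streams.

    v_ext and v_int stand in for the outputs of the two value heads.
    All default hyperparameters here are assumed, illustrative values.
    """
    ret_ext = discounted_returns(ext_rewards, gamma_ext, dones)  # episodic
    ret_int = discounted_returns(int_rewards, gamma_int)         # non-episodic
    adv_ext = ret_ext - np.asarray(v_ext, dtype=np.float64)
    adv_int = ret_int - np.asarray(v_int, dtype=np.float64)
    return ext_coef * adv_ext + int_coef * adv_int
```

The combined advantage would then feed the usual PPO clipped surrogate loss, while each value head is regressed against its own return stream.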

Code (TF 1.x): https://github.com/openai/random-network-distillation/tree/master/policies

oars commented 5 years ago

@seungjaeryanlee is working on this as part of a Google Summer of Code project. See his blog for progress: https://www.endtoend.ai/tags/gsoc/

robodhruv commented 5 years ago

Is there an update on this? Is a release planned soon, or is there an early tested version ready for use?

tensorflow/agents issue #139