
seungjaeryanlee / agents

TF-Agents is a library for Reinforcement Learning in TensorFlow

Apache License 2.0


Normalize intrinsic rewards #8

Closed: seungjaeryanlee closed this issue 5 years ago

seungjaeryanlee commented 5 years ago

Implemented

RND target and predictor networks are defined
RND loss is calculated and used as intrinsic reward
Intrinsic reward is normalized (Section 2.4)
Observation is normalized (Section 2.4)
RND predictor network is trained via average RND loss
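
For reference, a minimal sketch of the two normalization steps listed above, written in plain Python rather than against the TF-Agents API. It assumes the scheme usually associated with RND: observations are whitened with running statistics and clipped, and intrinsic rewards are divided by a running standard deviation of the discounted intrinsic return. The names `RunningStat`, `DiscountedReturnTracker`, `normalize_observation`, and `normalize_intrinsic_reward`, the clip range of 5.0, and the epsilon are illustrative assumptions, not code from this repository.

```python
import math


class RunningStat:
    """Running mean and population variance of a scalar stream (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Fall back to 1.0 until there are at least two samples.
        if self.count < 2:
            return 1.0
        return math.sqrt(self._m2 / self.count)


class DiscountedReturnTracker:
    """Keeps a running discounted sum of intrinsic rewards (hypothetical helper)."""

    def __init__(self, gamma):
        self.gamma = gamma
        self.value = 0.0

    def update(self, reward):
        self.value = self.value * self.gamma + reward
        return self.value


def normalize_observation(obs, stats, clip=5.0, eps=1e-8):
    """Whiten each observation dimension with its running stats, then clip."""
    normalized = []
    for x, stat in zip(obs, stats):
        z = (x - stat.mean) / (stat.std + eps)
        normalized.append(max(-clip, min(clip, z)))
    return normalized


def normalize_intrinsic_reward(reward, return_stat, eps=1e-8):
    """Scale the intrinsic reward by the running std of intrinsic returns."""
    return reward / (return_stat.std + eps)
```

In use, each new intrinsic reward would first go through `DiscountedReturnTracker.update`, the result would be fed to a `RunningStat`, and that statistic would then be passed to `normalize_intrinsic_reward`. Only the scale is normalized, not the mean, so the intrinsic reward stays non-negative.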

To Be Implemented

RND should be usable with every agent type (currently only paired with DQN and PPO)
An environment wrapper that makes the task non-episodic (Section 2.3)
Q-Network with dual value head for intrinsic/extrinsic rewards (Section 2.3)
Separate discount factors for intrinsic and extrinsic rewards (Section 3.3)
Comparison of CNN and RNN policies (Section 3.5)
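
A rough sketch of how the dual value head and separate discount factors could fit together once implemented. It assumes the extrinsic return is episodic (reset at episode boundaries) while the intrinsic return is non-episodic, each with its own discount factor, and that the two value estimates are combined as a weighted sum. The function names and the coefficient defaults are hypothetical, and bootstrapping from a value estimate at the end of the rollout is left out for brevity.

```python
def discounted_returns(rewards, dones, gamma, episodic):
    """Discounted return for every step of a rollout.

    dones[t] is True when the episode ended after step t. When episodic is
    False, episode boundaries are ignored and the return flows across them.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if episodic and dones[t]:
            running = 0.0  # nothing after a terminal step contributes
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def combine_values(extrinsic_value, intrinsic_value, ext_coef=2.0, int_coef=1.0):
    """Weighted sum of the two value heads (coefficients are assumptions)."""
    return ext_coef * extrinsic_value + int_coef * intrinsic_value


# Usage: the same rollout produces two return streams with different
# discount factors and different treatment of episode boundaries.
ext_rewards = [1.0, 1.0, 1.0]
int_rewards = [1.0, 1.0, 1.0]
dones = [False, True, False]
ext_returns = discounted_returns(ext_rewards, dones, gamma=0.5, episodic=True)
int_returns = discounted_returns(int_rewards, dones, gamma=0.5, episodic=False)
```

Each value head would then be regressed onto its own return stream, and the combined value would drive the policy update.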