seungjaeryanlee/agents
TF-Agents is a library for Reinforcement Learning in TensorFlow
Apache License 2.0
Normalize intrinsic rewards
#8
Closed
seungjaeryanlee closed 5 years ago
seungjaeryanlee commented 5 years ago
Implemented
RND target and predictor networks are defined
RND loss is calculated and used as intrinsic reward
Intrinsic reward is normalized (Section 2.4)
Observation is normalized (Section 2.4)
RND predictor network is trained via average RND loss
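The two normalization items above can be sketched with a running mean and standard deviation. This is a minimal pure-Python illustration, not the code in this repository: `RunningStats`, `normalize_observation`, and `normalize_intrinsic_reward` are hypothetical names, observations are treated as scalars for brevity, and the clip range of 5 and the choice to divide the reward by a running standard deviation are assumptions about how the normalization is meant to work.

```python
import math


class RunningStats:
    """Running mean and standard deviation via Welford's online algorithm."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Fall back to 1.0 until there are enough samples to estimate spread.
        if self.count < 2:
            return 1.0
        return math.sqrt(self._m2 / self.count)


def normalize_observation(obs, stats, clip=5.0, eps=1e-8):
    """Whiten a scalar observation, then clip it (clip value is an assumption)."""
    z = (obs - stats.mean) / (stats.std + eps)
    return max(-clip, min(clip, z))


def normalize_intrinsic_reward(reward, return_stats, eps=1e-8):
    """Scale the intrinsic reward by a running std; the mean is not subtracted."""
    return reward / (return_stats.std + eps)


# Hypothetical usage: feed samples into the statistics, then normalize.
stats = RunningStats()
for value in [1.0, 2.0, 3.0, 4.0]:
    stats.update(value)

clipped = normalize_observation(100.0, stats)
scaled = normalize_intrinsic_reward(2.0, stats)
```

In practice the statistics for observations and for intrinsic returns would be tracked separately, and the observation statistics would be kept per feature rather than as a single scalar.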
To Be Implemented
RND should be usable with every agent type (currently only paired with DQN and PPO)
An environment wrapper to make it non-episodic (Section 2.3)
Q-Network with dual value head for intrinsic/extrinsic rewards (Section 2.3)
Separate discount factors for intrinsic and extrinsic rewards (Section 3.3)
CNN vs RNN policy (Section 3.5)
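As a rough sketch of how the dual value head and the separate discount factors could fit together: each reward stream gets its own discounted return, with the extrinsic stream reset at episode boundaries and the intrinsic stream treated as non-episodic, and the two are then combined with a weighted sum. The function names, the discount factors, and the weights below are illustrative assumptions, not values taken from this repository.

```python
def discounted_returns(rewards, gamma, dones=None):
    """Compute discounted returns, resetting at episode ends if dones is given.

    dones[t] is True when the episode terminates at step t.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if dones is not None and dones[t]:
            running = 0.0  # nothing after a terminal step contributes
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def combined_returns(ext_rewards, int_rewards, dones,
                     gamma_ext=0.999, gamma_int=0.99,
                     ext_coef=2.0, int_coef=1.0):
    """Weighted sum of episodic extrinsic and non-episodic intrinsic returns."""
    ext = discounted_returns(ext_rewards, gamma_ext, dones)
    # Intrinsic returns ignore episode boundaries (non-episodic).
    intr = discounted_returns(int_rewards, gamma_int)
    return [ext_coef * e + int_coef * i for e, i in zip(ext, intr)]


# Hypothetical two-episode rollout with gamma = 0.5 for easy hand-checking.
ext_ret = discounted_returns([0.0, 1.0, 0.0, 1.0], 0.5,
                             dones=[False, True, False, True])
int_ret = discounted_returns([1.0, 1.0, 1.0, 1.0], 0.5)
total = combined_returns([0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 1.0, 1.0],
                         [False, True, False, True],
                         gamma_ext=0.5, gamma_int=0.5)
```

With a dual-head network, each head would be regressed toward its own return, so the two streams can use different discount factors without interfering with each other.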