Reproduce ApeX paper results for continuous action - Githubissues

keiohta / tf2rl

TensorFlow2 Reinforcement Learning

MIT License

Reproduce ApeX paper results for continuous action #22

Open keiohta opened 5 years ago

keiohta commented 5 years ago

Paper: Distributed Prioritized Experience Replay. The hyperparameters are listed in Appendix D.

keiohta commented 5 years ago

Critic hidden layers: [400, 300]
Actor hidden layers: [300, 200]
Gradient clipping: element-wise to [-1, 1]
Optimizer: Adam with a learning rate of 0.0001
Target network: copied from the online network every 100 training batches
Prioritized experience replay:
- alpha_sample = 0.6
- alpha_evict = -0.4
Exploration noise added to DDPG actions: sigma = 0.3
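
For reference, the settings above could be collected and applied roughly as follows. This is a minimal, framework-free sketch, not tf2rl or cpprb code: the dictionary keys and all helper names are hypothetical, the noise is assumed to be zero-mean Gaussian, the action bounds of [-1, 1] are an assumption, and both alpha values are assumed to act as exponents on the priorities.

```python
import random

# Values copied from the list above; the key names are hypothetical.
APEX_DPG_PARAMS = {
    "critic_units": [400, 300],
    "actor_units": [300, 200],
    "grad_clip": 1.0,
    "lr": 1e-4,
    "target_update_interval": 100,
    "alpha_sample": 0.6,
    "alpha_evict": -0.4,
    "sigma": 0.3,
}


def clip_elementwise(grads, limit):
    """Clip every gradient element independently to [-limit, limit]."""
    return [max(-limit, min(limit, g)) for g in grads]


def priority_probs(priorities, alpha):
    """Turn positive priorities into probabilities proportional to p ** alpha."""
    weights = [p ** alpha for p in priorities]
    total = sum(weights)
    return [w / total for w in weights]


def should_sync_target(train_step, interval):
    """True on every `interval`-th training batch (step counter starts at 1)."""
    return train_step % interval == 0


def noisy_action(action, sigma, low=-1.0, high=1.0, rng=random):
    """Add zero-mean Gaussian noise (an assumption) and clip to the bounds."""
    return [max(low, min(high, a + rng.gauss(0.0, sigma))) for a in action]
```

Under these assumptions, a positive alpha_sample makes high-priority transitions more likely to be sampled, while a negative alpha_evict makes low-priority transitions more likely to be evicted.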

keiohta commented 5 years ago

Waiting for cpprb implementation of NStepReplayBuffer.
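
Until then, the aggregation an n-step buffer performs can be sketched in plain Python. This is only an illustration of the general n-step return idea, not the cpprb API: the function name `n_step_transitions` and the tuple layout are hypothetical.

```python
def n_step_transitions(trajectory, n, gamma):
    """Convert 1-step tuples of one episode into n-step tuples.

    `trajectory` is a list of (obs, action, reward, next_obs, done).
    Each output tuple keeps the starting obs/action, sums up to `n`
    discounted rewards, and carries the next_obs/done of the last step used.
    """
    out = []
    for start in range(len(trajectory)):
        obs, action = trajectory[start][0], trajectory[start][1]
        ret, discount = 0.0, 1.0
        for step in trajectory[start:start + n]:
            ret += discount * step[2]
            discount *= gamma
            next_obs, done = step[3], step[4]
            if done:
                # Do not accumulate rewards past the end of the episode.
                break
        out.append((obs, action, ret, next_obs, done))
    return out
```

Transitions near the end of the trajectory cover fewer than `n` steps, so a learner that bootstraps from `next_obs` would need to discount by gamma raised to the number of steps actually used.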