justheuristic closed this issue 8 years ago
Implement an algorithm that learns a common baseline for Q-values
http://arxiv.org/pdf/1301.2315.pdf
Yes, and it might be nice to include it in the second tutorial on "some advanced tricks".
A similar approach was implemented via Advantage Actor-Critic.
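This thread does not spell out the linked paper's method. As one illustrative way to learn a baseline shared by all actions' Q-values, here is a minimal tabular sketch of the dueling-style decomposition Q(s,a) = V(s) + A(s,a) - mean_b A(s,b). This is a swapped-in technique and not necessarily what the paper proposes. `q_values` and `td_update` are hypothetical helper names, and the update is plain semi-gradient Q-learning applied to the decomposed table.

```python
from typing import List


def q_values(v: float, adv: List[float]) -> List[float]:
    """Combine a shared per-state baseline v with per-action advantages.

    Subtracting the mean advantage makes the split identifiable:
    the mean of the returned Q-values always equals v.
    """
    mean_adv = sum(adv) / len(adv)
    return [v + a - mean_adv for a in adv]


def td_update(V: List[float], A: List[List[float]], s: int, a: int,
              r: float, s_next: int, done: bool,
              lr: float = 0.1, gamma: float = 0.99) -> float:
    """One semi-gradient Q-learning step on the decomposed table (hypothetical helper).

    V[s] is the common baseline for state s; A[s][b] is the advantage of action b.
    Returns the TD error.
    """
    q_s = q_values(V[s], A[s])
    target = r if done else r + gamma * max(q_values(V[s_next], A[s_next]))
    delta = target - q_s[a]
    n = len(A[s])
    # dQ(s,a)/dV[s] = 1, so the shared baseline moves by the full TD error.
    V[s] += lr * delta
    # dQ(s,a)/dA[s][b] = 1[b == a] - 1/n, so advantages stay zero-mean.
    for b in range(n):
        A[s][b] += lr * delta * ((1.0 if b == a else 0.0) - 1.0 / n)
    return delta
```

For example, starting from all-zero tables, `td_update(V, A, 0, 1, 1.0, 1, True)` returns a TD error of 1.0, raises the shared baseline `V[0]` to 0.1, and gives Q-values of about `[0.05, 0.15]` for state 0. Every action's Q-value moves together through `V[s]`, which is the "common baseline" idea, much like the state-value baseline in Advantage Actor-Critic.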