Open raharth opened 3 years ago
There is a first implementation, but it fails to converge. This could be due to a bug, but also due to bad hyperparameters.
There is a paper on it: "Deep Reinforcement Learning with Sarsa and Q-Learning: A Hybrid Approach" by Xu et al.
Their results are somewhat strange, though. Their DQN/DDQN baseline is worse than mine, even though I did little to no tuning.
Even though they implement exactly the same theoretical idea, I fail to make it converge, while theirs converges (to a result that is worse than my baselines).
Implement the general SARSA algorithm according to the definition by Sutton and Barto.
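
For reference, here is a minimal tabular sketch of the on-policy SARSA update from Sutton and Barto, Q(S, A) <- Q(S, A) + alpha * [R + gamma * Q(S', A') - Q(S, A)]. It is not the deep variant from the paper and not this repository's code. The toy chain environment (`chain_step`), the function names and all hyperparameter values are assumptions made up for illustration. A check like this might help separate a bug in the update rule from a hyperparameter problem.

```python
import random


def epsilon_greedy(q, state, n_actions, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one (ties broken randomly)."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    best = max(q[state])
    return rng.choice([a for a, v in enumerate(q[state]) if v == best])


def chain_step(state, action, n_states):
    """Hypothetical toy environment: action 1 moves right, action 0 moves left.

    Reaching the last state yields reward 1 and ends the episode.
    """
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def sarsa(n_states=5, n_actions=2, episodes=500, alpha=0.1, gamma=0.9,
          epsilon=0.1, max_steps=1000, seed=0):
    """Tabular SARSA: the bootstrap target uses the action actually taken next."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(q, state, n_actions, epsilon, rng)
        for _ in range(max_steps):
            next_state, reward, done = chain_step(state, action, n_states)
            # On-policy: choose A' with the same behaviour policy before updating.
            next_action = epsilon_greedy(q, next_state, n_actions, epsilon, rng)
            # No bootstrapping from a terminal state.
            bootstrap = 0.0 if done else gamma * q[next_state][next_action]
            q[state][action] += alpha * (reward + bootstrap - q[state][action])
            state, action = next_state, next_action
            if done:
                break
    return q


if __name__ == "__main__":
    for s, values in enumerate(sarsa()):
        print(s, [round(v, 3) for v in values])
```

The only difference from Q-learning is the target: SARSA bootstraps from `q[next_state][next_action]` for the action the policy actually selects, whereas Q-learning would use `max(q[next_state])`.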