Open Matyyas opened 4 years ago
Hey @Matyyas. Glad to hear you've found the repo useful. To be honest, it's been so long since I touched this repository that I can't recall exactly what my thinking was there. But it's likely that I was not aware of this difference, and had I been, I probably would've implemented it differently 🙂 Nice to hear you caught this though! Let me know what the difference is if you end up trying out both ways.
Haha, 3 years is quite a while 😅
Actually, you did implement the "official" version too; it was an error on my part 🙏
Hi @hartikainen,
Thank you for the super cool repo 👍
I had one question regarding the Sarsa agent implementation. In the official pseudo-algorithm of Sarsa(λ) (slide 29), the Q values and the eligibility traces are updated at each step for every state-action pair of the environment.
If I understood your code correctly, it seems to me that you only update the state-action pair of the current step.
Were you aware of this difference when you wrote your implementation?
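To make the distinction concrete, here is a minimal sketch of the two update styles I have in mind. It is not taken from the repo: the function names, the list-of-lists layout for `Q` and `E`, the accumulating traces, and the default values of `alpha`, `gamma` and `lam` are all my own assumptions for illustration.

```python
def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next,
                      alpha=0.1, gamma=0.9, lam=0.8):
    """One backward-view Sarsa(lambda) step (hypothetical helper).

    Q and E are lists of lists indexed as [state][action] and are
    modified in place. Accumulating traces are assumed.
    """
    # TD error for the transition that was just observed.
    delta = r + gamma * Q[s_next][a_next] - Q[s][a]
    # Increment the trace of the pair that was just visited.
    E[s][a] += 1.0
    # Sweep over every state-action pair, not only (s, a).
    for si in range(len(Q)):
        for ai in range(len(Q[si])):
            Q[si][ai] += alpha * delta * E[si][ai]
            E[si][ai] *= gamma * lam
    return delta


def current_pair_step(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """Update only the current pair, i.e. plain one-step Sarsa (hypothetical helper)."""
    delta = r + gamma * Q[s_next][a_next] - Q[s][a]
    Q[s][a] += alpha * delta
    return delta


# Toy problem with 2 states and 2 actions; the reward arrives on the second step.
Q_full = [[0.0, 0.0], [0.0, 0.0]]
E_full = [[0.0, 0.0], [0.0, 0.0]]
sarsa_lambda_step(Q_full, E_full, s=0, a=0, r=0.0, s_next=1, a_next=0)
sarsa_lambda_step(Q_full, E_full, s=1, a=0, r=1.0, s_next=0, a_next=0)

Q_single = [[0.0, 0.0], [0.0, 0.0]]
current_pair_step(Q_single, s=0, a=0, r=0.0, s_next=1, a_next=0)
current_pair_step(Q_single, s=1, a=0, r=1.0, s_next=0, a_next=0)
```

With the full sweep, the reward on the second step also moves `Q_full[0][0]`, because the pair visited first still has a non-zero trace. With the current-pair update, `Q_single[0][0]` is unchanged and only `Q_single[1][0]` moves.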
Thanks a lot @hartikainen