Sarsa lambda implementation

Matyyas commented 4 years ago

Hi @hartikainen,

Thank you for the super cool repo 👍

I have one question regarding the Sarsa agent implementation. In the official pseudo-algorithm of Sarsa lambda (slide 29), the Q values and the eligibility traces are updated at each step for every state-action pair of the environment.

If I understood your code correctly, it seems to me that you only update the state-action pair of the current step.

```python
N[idx1] += 1
E[idx1] += 1

alpha = 1.0 / N[idx1]
delta = reward + self.gamma * Q2 - Q1
Q += alpha * delta * E
E *= self.gamma * self.lmbd
```

Were you aware of this difference when you wrote your implementation?

Thanks a lot @hartikainen

hartikainen commented 4 years ago

Hey @Matyyas. Glad to hear you've found the repo useful. To be honest, it's been so long since I touched this repository that I can't recall exactly what my thinking there was. But it's likely that I was not aware of such a difference, and had I been, I probably would not have implemented it differently 🙂 Nice to hear you caught this though! Let me know what the difference is if you end up trying out both ways.

Matyyas commented 4 years ago

Haha, 3 years is quite a while 😅

Actually, you did implement the "official" version too; it was an error on my part 🙏
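
For anyone landing here with the same doubt: in the snippet quoted in the first comment, `Q` and `E` appear to be whole NumPy arrays, so `Q += alpha * delta * E` and `E *= self.gamma * self.lmbd` touch every state-action pair at once, while only `N[idx1]` and `E[idx1]` are indexed. Below is a minimal sketch that compares that vectorized step with an explicit loop over all pairs. The array shape `(10, 21, 2)`, the function names, the sample values, and the reading of `Q1` as `Q[idx1]` are assumptions made for illustration, not taken from the repo.

```python
import numpy as np


def vectorized_step(Q, E, N, idx1, reward, gamma, lmbd, Q2):
    """One update written like the quoted snippet (whole-array operations)."""
    Q, E, N = Q.copy(), E.copy(), N.copy()
    Q1 = Q[idx1]                      # assumed: value of the current pair
    N[idx1] += 1
    E[idx1] += 1
    alpha = 1.0 / N[idx1]
    delta = reward + gamma * Q2 - Q1
    Q += alpha * delta * E            # updates every entry of Q
    E *= gamma * lmbd                 # decays every trace
    return Q, E, N


def looped_step(Q, E, N, idx1, reward, gamma, lmbd, Q2):
    """The same update, spelled out as a loop over all state-action pairs."""
    Q, E, N = Q.copy(), E.copy(), N.copy()
    Q1 = Q[idx1]
    N[idx1] += 1
    E[idx1] += 1
    alpha = 1.0 / N[idx1]
    delta = reward + gamma * Q2 - Q1
    for idx in np.ndindex(Q.shape):   # visit each (state, action) index
        Q[idx] += alpha * delta * E[idx]
        E[idx] *= gamma * lmbd
    return Q, E, N


# Hypothetical setup: zero Q values and nonzero traces left by earlier steps.
rng = np.random.default_rng(0)
shape = (10, 21, 2)                   # assumed (dealer, player, action) layout
Q0 = np.zeros(shape)
E0 = rng.random(shape)
N0 = np.zeros(shape)
idx1 = (3, 12, 1)                     # arbitrary current state-action pair

args = dict(idx1=idx1, reward=1.0, gamma=1.0, lmbd=0.5, Q2=0.0)
Q_vec, E_vec, N_vec = vectorized_step(Q0, E0, N0, **args)
Q_loop, E_loop, N_loop = looped_step(Q0, E0, N0, **args)

print(np.allclose(Q_vec, Q_loop), np.allclose(E_vec, E_loop))
print("entries of Q changed:", np.count_nonzero(Q_vec != Q0))
```

Under these assumptions the two versions agree, and the number of changed entries in `Q` is far greater than one, which is consistent with the conclusion above that the code already updates all pairs.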

hartikainen / easy21

Sarsa lambda implementation #2