Peer Review - Thiemen Mussche

I love the snow!

For the TD-update: new Q(S, A) = Q(S, A) + R(S, A) + max Q'(S', A') - Q(S, A). There is a term Q(S, A) and a term -Q(S, A); don't they cancel each other out? Why does this work?
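A minimal sketch of where the cancellation goes away, assuming the notebook follows the standard Q-learning update. That update scales the bracketed TD error by a learning rate alpha and discounts the next-state value by gamma. Neither symbol appears in the formula as quoted, and the function names here are hypothetical.

```python
def td_update(q_sa, reward, max_q_next, alpha, gamma):
    """One Q-learning update of Q(S, A), with learning rate alpha and discount gamma."""
    td_error = reward + gamma * max_q_next - q_sa  # target minus the current estimate
    return q_sa + alpha * td_error                 # move a fraction alpha toward the target


def blended_form(q_sa, reward, max_q_next, alpha, gamma):
    """The same update rewritten as a weighted average of old estimate and target."""
    return (1 - alpha) * q_sa + alpha * (reward + gamma * max_q_next)


old_q, r, best_next = 10.0, 1.0, 20.0
print(td_update(old_q, r, best_next, alpha=1.0, gamma=0.75))  # 16.0: old Q cancels out
print(td_update(old_q, r, best_next, alpha=0.5, gamma=0.75))  # 13.0: halfway from 10 to 16
```

With alpha = 1 the two Q(S, A) terms do cancel, and the old estimate is simply overwritten by the target. With 0 < alpha < 1, only part of the old value is replaced, so repeated updates average over many visits.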

It isn't clear to me what a Q-table should look like or why it works; maybe you could add an example? The same goes for the alpha parameter in the TD-update rule: I'm not sure what it is supposed to do.
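A minimal sketch of what a Q-table could look like, using a hypothetical three-location map. The map, the rewards, GAMMA and the `train` helper are all assumptions, and repeated sweeps over all allowed moves stand in for random training episodes. Each row is a current location and each column a next location. Alpha sets how large a fraction of the gap to the target each update closes.

```python
# Hypothetical toy map: locations 0 - 1 - 2 in a line; 2 is the goal.
# R[s][a] > 0 means "moving from s to location a" is allowed.
R = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 100],   # large reward for staying at the goal (assumed value)
]
GAMMA = 0.75       # discount factor (assumed value)


def train(alpha, sweeps):
    """Fill a Q-table (rows = current location, columns = next location)."""
    n = len(R)
    Q = [[0.0] * n for _ in range(n)]
    for _ in range(sweeps):                # deterministic sweeps instead of random episodes
        for s in range(n):
            for a in range(n):
                if R[s][a] > 0:            # only allowed moves are updated
                    target = R[s][a] + GAMMA * max(Q[a])
                    Q[s][a] += alpha * (target - Q[s][a])  # alpha = fraction of the gap closed
    return Q


slow, fast = train(alpha=0.1, sweeps=10), train(alpha=0.9, sweeps=10)
for name, Q in (("alpha=0.1", slow), ("alpha=0.9", fast)):
    print(name)
    for s, row in enumerate(Q):
        print(f"  L{s}:", [round(v, 1) for v in row])

# Reading the table: in each row, the column with the largest value is the best next location.
best_next = [row.index(max(row)) for row in fast]
print(best_next)
```

After the same number of sweeps, the alpha=0.9 table is much closer to its final values than the alpha=0.1 table. The greedy read-out sends location 0 to 1, and location 1 to the goal 2.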

Why do we assign '999' to the transition from L6 to L6, and not to the transitions from the neighbouring states into L6?
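One way to explore this question, sketched on a hypothetical four-location corridor rather than the notebook's actual map. The reward values, GAMMA, ALPHA and the `train` helper are assumptions. The large reward sits only on the goal's self-transition, yet moves into the goal still end up with large Q-values, because the gamma * max Q(S', A') term carries the goal's value back to its neighbours.

```python
# Hypothetical corridor 0 - 1 - 2 - 3; location 3 is the goal.
# R[s][a] > 0 marks an allowed move s -> a. Every allowed move earns 1,
# except the goal's self-transition, which gets the large "999"-style reward.
R = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 999],
]
GAMMA, ALPHA = 0.75, 0.9   # assumed discount and learning rate


def train(R, sweeps=300):
    """Q-learning updates over all allowed moves, repeated for a fixed number of sweeps."""
    n = len(R)
    Q = [[0.0] * n for _ in range(n)]
    for _ in range(sweeps):
        for s in range(n):
            for a in range(n):
                if R[s][a] > 0:
                    Q[s][a] += ALPHA * (R[s][a] + GAMMA * max(Q[a]) - Q[s][a])
    return Q


Q = train(R)
# Moving 2 -> 3 only earns a reward of 1, but its Q-value is large because
# gamma * max(Q[3]) includes the goal's self-loop value 999 / (1 - gamma).
print(f"Q[3][3] = {Q[3][3]:.1f}, Q[2][3] = {Q[2][3]:.1f}, Q[1][2] = {Q[1][2]:.1f}")
print([row.index(max(row)) for row in Q])   # best next location per row
```

In this sketch, the single self-loop entry is enough to make every move toward the goal the best choice in its row.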

For the implementation, you could put the bits of code you explain inside strings so they don't throw errors.

"Since we do not know the exact number of iterations the robot will take in order to find out the optimal route, we will simply loop the next set of processes until the next location is not equal to the ending location." In the code you use a while loop that runs until the next location is the end location, contradicting your previous statement.
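The difference is easy to check with a small sketch. The Q-table values and the `get_route` name are hypothetical. A `while next_location != end` loop keeps running as long as the end has not been reached and stops as soon as `next_location == end`. In other words, it loops *until the next location is equal to* the ending location.

```python
# Hypothetical, already-trained Q-table for locations 0..3 (rows = current, columns = next).
Q = [
    [0, 5, 0, 0],
    [3, 0, 8, 0],
    [0, 4, 0, 9],
    [0, 0, 6, 9],
]


def get_route(start, end):
    route = [start]
    next_location = start
    # The condition is checked before each step: the loop continues while we have
    # NOT reached the end and stops as soon as next_location == end.
    while next_location != end:
        row = Q[next_location]
        next_location = row.index(max(row))   # greedy step: best-valued next location
        route.append(next_location)
    return route


print(get_route(0, 3))  # [0, 1, 2, 3]
print(get_route(3, 3))  # [3]: start is already the end, so the loop body never runs
```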

Good luck! Thiemen

milanghe / STMOZOO

Peer Review - Thiemen Mussche #4