I think there is a bug in the Monte Carlo Methods Jupyter notebook, in the function "mc_prediction_q". "Algorithm 9" on the provided cheatsheet indicates that the final assignment to Q(s,a) happens outside of the episode loop, but the notebook performs this assignment in the innermost loop.
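To make the two placements concrete, here is a minimal sketch of both variants. It is not the notebook's actual code: the function names `mc_q_inner` and `mc_q_outer`, the helper `discounted_return`, and the toy `episodes` data are all hypothetical, and it assumes every-visit averaging over pre-recorded episodes given as lists of (state, action, reward) tuples.

```python
from collections import defaultdict


def discounted_return(rewards, gamma):
    """Return sum of gamma**k * rewards[k], computed back to front."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def accumulate(episodes, gamma, on_update=None):
    """Accumulate return sums and visit counts per (state, action) pair.

    Hypothetical helper: on_update, if given, is called after every visit,
    which is where the inner-loop variant assigns Q.
    """
    returns_sum = defaultdict(float)
    n = defaultdict(int)
    for episode in episodes:
        rewards = [r for _, _, r in episode]
        for i, (state, action, _) in enumerate(episode):
            key = (state, action)
            returns_sum[key] += discounted_return(rewards[i:], gamma)
            n[key] += 1
            if on_update is not None:
                on_update(key, returns_sum[key], n[key])
    return returns_sum, n


def mc_q_inner(episodes, gamma=1.0):
    """Variant described for the notebook: Q assigned in the innermost loop."""
    q = {}

    def assign(key, total, count):
        q[key] = total / count

    accumulate(episodes, gamma, on_update=assign)
    return q


def mc_q_outer(episodes, gamma=1.0):
    """Variant described for the cheatsheet: Q assigned after all episodes."""
    returns_sum, n = accumulate(episodes, gamma)
    return {key: returns_sum[key] / n[key] for key in returns_sum}


# Toy data (made up for illustration), as (state, action, reward) tuples.
episodes = [
    [("s0", "hit", 0.0), ("s1", "stick", 1.0)],
    [("s0", "hit", 0.0), ("s1", "stick", -1.0)],
    [("s0", "stick", 1.0)],
]

q_inner = mc_q_inner(episodes)
q_outer = mc_q_outer(episodes)
```

In this sketch the two variants differ only in when the division happens. Since nothing reads Q while the episodes are being processed, comparing `q_inner` with `q_outer` is a quick way to check whether the placement affects the final estimates or only the amount of work done per step.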