Error in solution to Exercise 2.2

vojtamolda / reinforcement-learning-an-introduction

Solutions to exercises in Reinforcement Learning: An Introduction (2nd Edition).


Error in solution to Exercise 2.2 #9

Closed earthwuyang closed 2 years ago

earthwuyang commented 2 years ago

I think the reward estimates computed for each arm at each step do not follow the sample-average formula given in the book.

vojtamolda commented 2 years ago

Hello @earthwuyang

Thanks for opening the issue here. It's very possible that my solution is wrong!

Could you please explain in a bit more detail what the correct solution is and where I made a mistake?

earthwuyang commented 2 years ago

[Attached image: 微信图片_20220213151810]

Thank you! I might also be wrong. If the mistake is mine, I would be grateful if you could point it out.

vojtamolda commented 2 years ago

I recalculated my solution from scratch and you're right! I'm not sure how I arrived at my solution...

Anyway, to make the calculation clear, I think one has to keep a vector with the sum of rewards for each action, $S_i$, together with a counter $n_i$. The counter tracks the number of times a particular action has been taken. The action-value estimate is then the ratio of the sum of rewards to the number of times the action was taken: $Q_i = S_i / n_i$.
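For illustration, here is a minimal sketch of that bookkeeping in Python. It is not the code from the repository: the function name `sample_average_estimates` and its parameters are hypothetical, and it assumes the action/reward sequence from the exercise statement in the book (k = 4, all initial estimates 0). An action that has not been taken yet simply keeps its initial estimate.

```python
def sample_average_estimates(actions, rewards, k=4, initial=0.0):
    """Return the Q estimates after each step, using sample averages.

    Hypothetical helper for illustration; actions are numbered from 1.
    """
    sums = [0.0] * k     # S_i: sum of rewards received for action i
    counts = [0] * k     # n_i: number of times action i has been taken
    q = [initial] * k    # Q_i: stays at `initial` until action i is tried
    history = []
    for a, r in zip(actions, rewards):
        i = a - 1        # convert 1-based action label to a list index
        sums[i] += r
        counts[i] += 1
        q[i] = sums[i] / counts[i]   # Q_i = S_i / n_i
        history.append(list(q))      # snapshot of all estimates after this step
    return history


# Sequence assumed from the Exercise 2.2 statement.
actions = [1, 2, 2, 2, 3]
rewards = [-1, 1, -2, 2, 0]

history = sample_average_estimates(actions, rewards)
for t, q in enumerate(history, start=1):
    print(f"after step {t}: {q}")
```

Keeping the sum and the count separately makes each estimate easy to check by hand; an incremental update of the form Q + (R - Q) / n would give the same numbers without storing the sums.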

Here's the full updated solution (it exactly matches yours):