[Exercise 5.14] Errors in updates, time indices and more

Jonathan2021 commented 3 years ago

Hello @vojtamolda, I have found several errors in your solution to exercise 5.14. In the following pictures I followed the structure of your initial solution (skipping some parts, so you can't use it as a standalone solution, but you can plug the different bits into yours) and hopefully provided an understandable correction. In red are corrections of mistakes you made, remarks to make my solution more understandable, or errors I made but couldn't be bothered to rewrite.

[Images: IMG_20210912_211426, IMG_20210912_211658, IMG_20210912_211718]

I hope this is readable (and correct :sweat_smile:). You can ignore the green star at the end of the algorithm (it used to be a question I wanted to discuss with you, but I got my answer while writing it here :laughing:).

I really think this repo has the potential to become the correct centralized solution manual. (I didn't really go through your solutions for previous chapters and I can't be bothered to do so, but you can expect me to raise issues for upcoming exercises if I think I have spotted a mistake.) Keep up the great work :+1:!

vojtamolda commented 3 years ago

Hello @Jonathan2021,

Again, thanks for opening the issue. I’m slow, but I’ll eventually get to it. Could you please try to find and post a link to another independent solution of the exercise? Just so we know somebody else got the same result.

Jonathan2021 commented 3 years ago

I couldn't find any other solution. On the other hand, I came across this. I don't know if it is still the case, but you can try sending your answers to get the actual solutions.

vojtamolda / reinforcement-learning-an-introduction

[Exercise 5.14] Errors in updates, time indices and more #5