philtabor / Youtube-Code-Repository

Repository for most of the code from my YouTube channel
859 stars, 479 forks

Model never learns the game #21

Closed havietisov closed 3 years ago

havietisov commented 3 years ago

Hi. I was following your YouTube tutorial on the actor-critic method in continuous space (Lunar Lander). However, despite having the same code, my model almost never scores higher than zero, never mind reaching anywhere near 200, even after a significant number of episodes. The code is here: https://github.com/6opoDuJIo/RL_Playground/blob/master/lunar_lander.py And part of the log file is:


episode 6265 score -92.32 average score -178.79
episode 6266 score -17.88 average score -177.76
episode 6267 score -119.38 average score -176.25
episode 6268 score -104.23 average score -173.38
episode 6269 score -83.28 average score -172.56
episode 6270 score -146.12 average score -172.68
episode 6271 score -126.20 average score -173.07
episode 6272 score -226.61 average score -172.13
episode 6273 score -245.62 average score -173.37
episode 6274 score -105.59 average score -171.55
episode 6275 score -141.94 average score -173.44
episode 6276 score -301.27 average score -175.40
episode 6277 score -82.96 average score -175.56
episode 6278 score -134.57 average score -175.98
episode 6279 score -51.25 average score -174.88
episode 6280 score -81.76 average score -174.06
episode 6281 score -227.78 average score -173.70
episode 6282 score -386.15 average score -175.98
episode 6283 score -297.21 average score -177.16
episode 6284 score -422.21 average score -180.37
episode 6285 score -140.92 average score -180.87
episode 6286 score -236.97 average score -180.38
episode 6287 score -119.24 average score -179.11
episode 6288 score -76.24 average score -179.02
episode 6289 score -85.39 average score -176.80
episode 6290 score -131.07 average score -178.32
episode 6291 score -110.64 average score -179.56
episode 6292 score -150.60 average score -179.94
episode 6293 score -68.53 average score -179.51
episode 6294 score -184.71 average score -179.02
episode 6295 score -263.88 average score -180.30
episode 6296 score -287.41 average score -182.61
episode 6297 score -98.54 average score -181.06
episode 6298 score -82.03 average score -180.98
episode 6299 score -284.01 average score -181.44
episode 6300 score -88.97 average score -180.43
episode 6301 score -102.73 average score -178.80
episode 6302 score -179.52 average score -180.30
episode 6303 score -222.17 average score -181.51
episode 6304 score -246.87 average score -182.72
episode 6305 score -331.83 average score -184.92
episode 6306 score -361.46 average score -187.54
episode 6307 score -89.69 average score -187.46
episode 6308 score -27.86 average score -187.13
episode 6309 score -135.48 average score -184.37
episode 6310 score -115.25 average score -182.98

philtabor commented 3 years ago

You aren't calling the learn function in the main loop.

I'll be doing a video on actor critic in tensorflow 2 this upcoming week, you should watch it and check it out. It's a much cleaner implementation.
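To illustrate the missing `learn` call, here is a minimal sketch of where it would sit in a training loop. `ToyEnv` and `ToyAgent` are hypothetical stand-ins invented for this example (they are not the Lunar Lander environment or the agent from the linked code), and the `learn` signature is an assumption; the point is only the placement of the call inside the step loop.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a Gym-style environment (not LunarLander)."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        reward = -abs(action)          # toy reward, for illustration only
        done = self.t >= self.horizon  # episode ends after `horizon` steps
        return float(self.t), reward, done, {}


class ToyAgent:
    """Hypothetical agent exposing choose_action and learn."""

    def __init__(self):
        self.learn_calls = 0

    def choose_action(self, observation):
        return random.uniform(-1.0, 1.0)

    def learn(self, state, reward, new_state, done):
        # A real actor-critic agent would update its networks here.
        self.learn_calls += 1


def train(env, agent, n_episodes):
    scores = []
    for _ in range(n_episodes):
        observation = env.reset()
        done = False
        score = 0.0
        while not done:
            action = agent.choose_action(observation)
            new_observation, reward, done, info = env.step(action)
            # Without this call the agent only acts; nothing is ever updated.
            agent.learn(observation, reward, new_observation, done)
            observation = new_observation
            score += reward
        scores.append(score)
    return scores


agent = ToyAgent()
scores = train(ToyEnv(horizon=5), agent, n_episodes=3)
print(agent.learn_calls)  # one learn call per environment step
```

If the counter stays at zero after a run, the loop is only collecting rewards and the scores will look like a random policy no matter how many episodes are played.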

philtabor commented 3 years ago

Didn't mean to close it before giving the OP time to comment.

havietisov commented 3 years ago

Damn, that was embarrassing and stupid of me, I'm sorry. However, it is still failing to improve even after 180+ episodes. Here is the log (the source file has been updated):


episode 167 score -245.29 average score -263.41
episode 168 score -579.41 average score -269.46
episode 169 score -349.60 average score -269.98
episode 170 score -194.49 average score -268.06
episode 171 score -276.85 average score -268.63
episode 172 score -272.52 average score -268.43
episode 173 score -238.40 average score -268.99
episode 174 score -306.81 average score -268.89
episode 175 score -152.71 average score -268.69
episode 176 score -410.33 average score -270.20
episode 177 score -314.56 average score -272.95
episode 178 score -281.38 average score -273.94
episode 179 score -236.19 average score -274.66
episode 180 score -381.23 average score -275.26
episode 181 score -292.67 average score -276.26
episode 182 score -124.96 average score -274.34
episode 183 score -222.39 average score -274.91
episode 184 score -221.86 average score -274.15
episode 185 score -390.36 average score -276.50
episode 186 score -468.62 average score -279.71
episode 187 score -336.55 average score -279.89
episode 188 score -290.84 average score -279.67

havietisov commented 3 years ago

The same happened when I tried some of your code (https://github.com/philtabor/Actor-Critic-Methods-Paper-To-Code/tree/master/ActorCritic).

I literally copied it and had to change nothing. The log follows:


episode  44 score -523.9 average score -231.1
episode  45 score -106.3 average score -228.4
episode  46 score -233.1 average score -228.5
episode  47 score -124.3 average score -226.3
episode  48 score -86.0 average score -223.5
episode  49 score -221.1 average score -223.4
episode  50 score -421.5 average score -227.3
episode  51 score -522.7 average score -233.0
episode  52 score -351.9 average score -235.2
episode  53 score -176.0 average score -234.1
episode  54 score -414.9 average score -237.4
episode  55 score -125.2 average score -235.4
episode  56 score -273.4 average score -236.1
episode  57 score -96.5 average score -233.7
episode  58 score -540.9 average score -238.9
episode  59 score -122.6 average score -236.9
episode  60 score -186.6 average score -236.1
episode  61 score -263.3 average score -236.6
episode  62 score -272.6 average score -237.1
episode  63 score -23.6 average score -233.8
episode  64 score -163.9 average score -232.7
episode  65 score -419.4 average score -235.5
episode  66 score -247.5 average score -235.7
episode  67 score -220.6 average score -235.5
episode  68 score -518.1 average score -239.6
episode  69 score -161.8 average score -238.5
episode  70 score -71.6 average score -236.1
episode  71 score -8.5 average score -233.0
episode  72 score -198.3 average score -232.5
episode  73 score -337.2 average score -233.9
episode  74 score -245.1 average score -234.1
episode  75 score -615.1 average score -239.1
episode  76 score -401.5 average score -241.2
episode  77 score -289.3 average score -241.8
episode  78 score -321.4 average score -242.8
episode  79 score -207.5 average score -242.4
episode  80 score -50.3 average score -240.0
episode  81 score -483.8 average score -243.0
episode  82 score -248.1 average score -243.0
episode  83 score -178.2 average score -242.3
episode  84 score -194.9 average score -241.7
episode  85 score -253.1 average score -241.8
episode  86 score -235.4 average score -241.8
episode  87 score -373.9 average score -243.3
episode  88 score -280.5 average score -243.7
episode  89 score -312.7 average score -244.4
episode  90 score -127.4 average score -243.2
episode  91 score -186.4 average score -242.5
episode  92 score -179.2 average score -241.9
episode  93 score -234.0 average score -241.8
episode  94 score -170.7 average score -241.0
episode  95 score -455.6 average score -243.3
episode  96 score -68.8 average score -241.5
episode  97 score -243.6 average score -241.5
episode  98 score -173.7 average score -240.8
episode  99 score -328.9 average score -241.7
episode  100 score -222.0 average score -242.5
episode  101 score -390.5 average score -242.1
episode  102 score -267.5 average score -243.2
episode  103 score -54.5 average score -243.1
episode  104 score -176.3 average score -243.9
episode  105 score -170.9 average score -242.1
episode  106 score -169.3 average score -242.9
episode  107 score -26.2 average score -239.8
episode  108 score -23.2 average score -237.2
episode  109 score -299.9 average score -237.0
episode  110 score -76.2 average score -235.9
episode  111 score -425.6 average score -239.4
episode  112 score -111.1 average score -238.6
episode  113 score -232.3 average score -237.5
episode  114 score -80.1 average score -237.1
episode  115 score -495.7 average score -240.9
episode  116 score -118.2 average score -238.5
episode  117 score -231.8 average score -238.4
episode  118 score 5.2 average score -237.2
episode  119 score -375.7 average score -238.3
episode  120 score -151.8 average score -239.0
episode  121 score 28.0 average score -235.9
episode  122 score -264.3 average score -237.5
episode  123 score -250.5 average score -236.8
episode  124 score -21.3 average score -234.7
episode  125 score -279.2 average score -235.2
episode  126 score -447.0 average score -235.6
episode  127 score -326.3 average score -234.4
episode  128 score -235.5 average score -236.0
episode  129 score -305.7 average score -237.0
episode  130 score -172.0 average score -237.7
episode  131 score -21.8 average score -237.5

philtabor commented 3 years ago

Do you find that result surprising? If so, why?

havietisov commented 3 years ago

Well, speaking of the last link I referenced, I don't know what to expect; I just assumed that it's supposed to work. The last piece of the log doesn't show significant improvement over 150 or so episodes in this environment, and if I keep it running for longer, I start to see quite the opposite effect, with the score sometimes going below -1000. However, when someone follows a tutorial, I think it's fair for that person to expect that the tutorial will lead to something functional, unless it's a tutorial on "here's why this stuff doesn't work".

philtabor commented 3 years ago

In the video tutorial for this code, right at the very beginning of the video... the first 10 seconds or so, I show a learning plot.

What does this plot show?

https://www.youtube.com/watch?v=2vJtbAha3To

havietisov commented 3 years ago

Well, I removed any parts that used your utility class after copying the code. I suspect you want me to actually put it back and look at the graph.

philtabor commented 3 years ago

No, I want you to actually watch the video and tell me what the learning plot tells you about how long it takes the agent to learn.
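For judging how long learning takes from logs like the ones above, a trailing average of episode scores is a common summary. The sketch below assumes a 100-episode window by default; the thread does not state which window the "average score" column uses, and `trailing_average` is a hypothetical helper name.

```python
def trailing_average(scores, window=100):
    """Mean of the most recent `window` scores at each episode.

    Early episodes average over however many scores exist so far.
    """
    averages = []
    for i in range(len(scores)):
        recent = scores[max(0, i - window + 1): i + 1]
        averages.append(sum(recent) / len(recent))
    return averages


# Tiny made-up example with a window of 2 episodes.
scores = [-300.0, -200.0, -100.0, 0.0]
avgs = trailing_average(scores, window=2)
print(avgs)  # [-300.0, -250.0, -150.0, -50.0]
```

With a wide window and noisy per-episode scores, a few hundred episodes can look flat even when the agent is improving slowly, so the number of games on the x-axis of the plot matters when comparing runs.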

havietisov commented 3 years ago

Yep, you're right, my bad, I didn't pay attention to the actual numbers on the Game axis. Now I need to fix my CUDA installation, because even on a Threadripper each iteration is too slow. Thank you for your feedback and for your patience.