svpino / lunar-lander

OpenAI Gym's LunarLander-v2 Implementation
Migrate code to TensorFlow 2.16 #2

Open svpino opened 4 months ago

eabase commented 4 months ago

I tried to do this. The code runs, but I got blocked by the complexity. Something is not right: it either doesn't learn or learns extremely slowly. (Or maybe that's because I'm running on CPU only?)

Issues:

You can find my code in this Gist. It is meant to be run on Windows from PowerShell. Please feel free to include and modify it.

In PowerShell, run it with:

$env:TF_CPP_MIN_LOG_LEVEL=1; $env:TF_ENABLE_ONEDNN_OPTS='0'; python.exe -qu -X utf8=1 .\lunar-lander.py

eabase commented 4 months ago

@svpino

Did you have a chance to look at this?

(Yeah, I know it's all exciting with GPT-4o, but we already determined it's not as exciting as we were sold.)

svpino commented 4 months ago

Ha ha! Yeah, I haven't had any time for this. The last time I touched this codebase was when I wrote it (and when I asked Devin to fix it.) I'm planning to record a video about this at some point, so I'll have to fix it, I guess.

eabase commented 4 months ago

I can only imagine. I'm looking into that as well, but I wanted this toy training model to work, and it was a bit depressing when it didn't. The issue seems to be that I am not able to run it in training mode. (See my gist.) It just crashes with some dimension error. Also, since it is not running on GPU, it seems very slow, so I can't really tell whether it is learning. Another funny thing is that it seems to always start each loop by firing the same engine, not the other two. I'm not sure how that behavior is randomized. (I have seen it fire two at some point, but that seems to happen randomly.)

PS. Regarding CUDA, I don't roll back, only forward!
I will see if I can get an updated CUDA compiled TF version going...
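
On the question of how the engine choice is randomized: a minimal sketch, assuming the agent is DQN-style and uses epsilon-greedy exploration (I have not confirmed that this is what the repo does). The names `choose_action` and `decay` and the decay constants are hypothetical. With this scheme, a low epsilon means the agent almost always picks the action with the highest predicted Q-value, so an untrained or barely trained network would tend to fire the same engine at the start of every episode.

```python
import random

# LunarLander-v2 has four discrete actions:
# 0 = do nothing, 1 = fire left engine, 2 = fire main engine, 3 = fire right engine
N_ACTIONS = 4


def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decay(epsilon, rate=0.995, floor=0.01):
    """Shrink epsilon after each episode, never going below the floor.

    The rate and floor here are illustrative values, not taken from the repo.
    """
    return max(floor, epsilon * rate)
```

If the script only ever runs in a non-training mode, epsilon may effectively be zero, which would explain seeing the same first action over and over.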

eabase commented 4 months ago

@svpino If you can have a look at my gist (linked above), I can create a PR.

AFAICT, it is not quite working as intended, and I don't know TF well enough to fix it. Also, I have no idea why TRAINING mode doesn't work and complains about a tensor dimension issue.
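
One possible source of a dimension error after a migration like this, offered as a guess rather than a diagnosis: Gym 0.26+ (and Gymnasium) changed `env.reset()` to return an `(observation, info)` tuple instead of the observation alone, and Keras models expect a leading batch dimension on their input. The sketch below uses two hypothetical helpers, `unwrap_reset` and `to_batch`, written in plain Python so it runs without TensorFlow; in real code you would more likely convert the observation to a NumPy array and reshape it.

```python
def unwrap_reset(result):
    """Return only the observation from the value returned by env.reset().

    Older Gym returns the observation alone; Gym >= 0.26 and Gymnasium
    return an (observation, info) tuple where info is a dict.
    """
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        return result[0]
    return result


def to_batch(observation):
    """Wrap a single observation as a batch of size 1: (n,) becomes (1, n)."""
    return [list(observation)]


# Example with a fake 8-value LunarLander state in the newer tuple form.
state = unwrap_reset(([0.0] * 8, {}))
batch = to_batch(state)  # one row of eight features
```

If the training path feeds the raw return value of `reset()` or `step()` into the network while the non-training path does not, that would fit the symptom of only TRAINING mode crashing, but I have not verified this against the gist.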