Closed: schrum2 closed this issue 5 years ago
Attempted to run the FloydHub fix but couldn't get it to work; there were some issues with the way the code was set up in the Jupyter Notebook. For instance, one line of code was `retro.list_games()`, but `list_games` is not defined in retro's `__init__.py` file; it lives in retro's `data` subpackage. Another line called `retro.get_game_path()`, which appears neither in retro's `__init__.py` file nor anywhere else in the retro package.
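To confirm where the function actually lives, a small guarded import can be used. This is a hedged sketch assuming the current gym-retro layout (where `list_games` is defined in `retro/data/__init__.py`); the `resolve_list_games` helper is hypothetical:

```python
import importlib.util

def resolve_list_games():
    """Return retro.data.list_games if gym-retro is installed, else None.

    Sketch only: in gym-retro, list_games is defined in
    retro/data/__init__.py, so it must be imported from retro.data,
    not from the top-level retro module.
    """
    if importlib.util.find_spec("retro") is None:
        return None
    from retro.data import list_games
    return list_games

if __name__ == "__main__":
    fn = resolve_list_games()
    if fn is None:
        print("gym-retro not installed")
    else:
        print(sorted(fn())[:5])  # first few game names, alphabetically
```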
By following the instructions here: https://github.com/MaxKelsen/openai-sonic I was actually able to get a random agent to play Sonic, though it was Sonic and Knuckles rather than Sonic 1. Still, that's pretty cool progress. The next step is to try an actual learning algorithm.
Important reference points: https://arxiv.org/abs/1804.03720 https://openai.com/blog/first-retro-contest-retrospective/
We both have the Sonic code running now, so this issue is well under way. Here is what needs to be done in order to close out this issue.
1) Copy code from one of the top 3 Sonic competitors and/or the Gotta Learn Fast benchmarks into the repo.
2) Run one of those algorithms long enough to noticeably improve behavior.
3) Log what method you used, how long it took to run, and what performance you achieved in this GitHub thread.
4) If the performance isn't as good as it should be (according to the paper or the competition results), try to explain why (not trained long enough?).
Then close this issue.
Something else important to note about this issue. The default Sonic reward seems to be the game score, but that's almost worthless. However, when you go to this site, https://contest.openai.com/2018-1/details/ , and use their retro-contest repo to run Sonic instead, the reward is based on how much progress you make moving to the right of the level. I took their simple-agent, removed the docker communication, and committed it to the repo.
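That progress-based reward is easy to mimic. Here is a minimal sketch of the idea, not the contest's actual code: the class name, the `level_length` parameter, and the roughly-9000-point completion total are assumptions based on the contest description:

```python
class ProgressReward:
    """Reward proportional to *new* rightward progress (hypothetical sketch).

    The contest scales rewards so that fully traversing a level is worth
    about 9000 points (plus a separate time bonus); this sketch covers
    only the progress term.
    """

    def __init__(self, level_length, total=9000.0):
        self.scale = total / level_length
        self.max_x = 0  # farthest x position reached so far

    def step(self, x):
        # Only pay out for progress beyond the previous maximum, so
        # backtracking and re-covering old ground earn nothing.
        gain = max(x - self.max_x, 0)
        self.max_x = max(self.max_x, x)
        return gain * self.scale
```

With `level_length=9000`, moving from x=0 to x=100 yields a reward of 100.0; retreating and returning to x=100 yields 0.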
I feel that it is also worthwhile to include this link in the issue thread: https://openai.com/blog/first-retro-contest-retrospective/
3rd-place winner: PPO with a meta-learner (trained on the given Sonic levels), submitted as a Docker-packaged fast learner: https://github.com/aborghi/retro_contest_agent
We're able to run that agent, but who knows how long it will take, so we'll let it run overnight. However, we need to investigate ways of saving and loading the model so we can train across multiple sessions, and also train with rendering off for better speed (which only makes sense if we can load a saved model later to actually watch it).
Tried to track the rewards/returns the agent was receiving, but this was confusing. The variables `mb_rewards` and `rewards` are lists/vectors containing multiple values, and we couldn't extract the step-by-step reward or even the return of a single episode from them.
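Part of the confusion is that the Baselines-style PPO2 code runs vectorized environments, so `mb_rewards` holds one reward per timestep per parallel environment, shape (nsteps, nenvs). A hedged sketch of how per-episode returns could be recovered from such batches (the function and its done-flag convention are assumptions, not what the 3rd-place code actually logs):

```python
import numpy as np

def episode_returns(mb_rewards, mb_dones, running=None):
    """Accumulate per-environment returns from a (nsteps, nenvs) reward batch.

    mb_rewards, mb_dones: shape (nsteps, nenvs), as a vectorized runner
    might collect them. `running` carries partial returns across batches.
    Returns (completed_episode_returns, running).
    """
    mb_rewards = np.asarray(mb_rewards, dtype=np.float64)
    mb_dones = np.asarray(mb_dones, dtype=bool)
    nsteps, nenvs = mb_rewards.shape
    if running is None:
        running = np.zeros(nenvs)
    completed = []
    for t in range(nsteps):
        running = running + mb_rewards[t]
        for env in np.nonzero(mb_dones[t])[0]:
            completed.append(float(running[env]))  # episode ended: record return
            running[env] = 0.0
    return completed, running
```

Feeding each new batch along with the previous `running` vector yields a stream of completed episode returns, which is exactly the quantity the logging never surfaced.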
Alex and I were both able to run the third-place PPO agent overnight, and it eventually beat the first act of Green Hill Zone. Annoyingly, the logging doesn't seem to track useful information, such as how the rewards/returns change over time, but at least the visualization shows us that we are beating that one level.
Need to try more levels.
Looking closer at the 3rd-place-winning code, I realized that it is almost a one-for-one copy of the Baselines PPO2 implementation: numerous methods and variables from the ppo2.py, model.py, and runner.py files in the `~\baselines\baselines\ppo2` subdirectory are pasted nearly verbatim into the ppo2ttifrutti.py file. This implies that FastLearner should save models and checkpoints in the same way the Baselines PPO2 implementation does. However, running PPO2 does not guarantee that anything gets saved, potentially because the relevant variable is set to None.
Setting the `save_interval` variable, initially 0, to some value n enabled me to save "checkpoints" every n updates. Presuming there have been 100 updates, three files are created on the hundredth update:
00100_tf.data-00000-of-00001
00100_tf.index
00100_tf.meta
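That naming pattern suggests the same guard the Baselines-style code uses: save only when `save_interval` is nonzero and the update count is a multiple of it. A minimal sketch, with a hypothetical `save_fn` callback standing in for the actual TensorFlow saver call:

```python
def maybe_save(update, save_interval, save_fn):
    """Call save_fn with a checkpoint name like '00100_tf' every
    save_interval updates; do nothing when save_interval is 0."""
    if save_interval and update % save_interval == 0:
        name = "%.5i_tf" % update  # zero-padded update count, e.g. 00100_tf
        save_fn(name)
        return name
    return None
```

With `save_interval=0` (the default) the guard is always false, which would explain why nothing was saved until the variable was changed.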
The frequency with which these files are created is determined by n. How are these "checkpoints" relevant, though? One idea Dr. Schrum and I had was to look at the checkpoints.py file, which is rewritten each time a new checkpoint is saved. At the hundredth update, the file's contents would look like:
model_checkpoint_path: "~\\checkpoints\\00100_tf"
all_model_checkpoint_paths: "~\\checkpoints\\00100_tf"
If `model_checkpoint_path` points to a network with updated weights, would it make sense to load this path as a model?
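Presumably yes: that path is what a TensorFlow `Saver` would restore from, e.g. via `tf.train.latest_checkpoint` on the checkpoint directory. As a sketch of what the lookup amounts to, here is a hypothetical hand-rolled parser for that file (real code should prefer the TensorFlow helper):

```python
def parse_model_checkpoint_path(text):
    """Extract the quoted path from the model_checkpoint_path line of a
    TF checkpoint state file; return None if the line is absent."""
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("model_checkpoint_path:"):
            # the value is a double-quoted string after the colon
            return line.split('"')[1]
    return None
```

Given the recovered path, TF1-style code would restore with `saver.restore(sess, path)`.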
Basically, PPO seems to work, but agents still have problems dealing with loops and backtracking. They do sometimes eventually get through these obstacles, but there is definitely room for improvement.
I think that takes care of the specific details relevant to this issue. I created a new issue (#17) for addressing the saving and loading of PPO models, so all future work on that topic should be documented there.
There is a lot of info on this out there. I'm a bit surprised that NEAT has even been applied to Sonic. Here are several helpful links with information:
https://www.freecodecamp.org/news/how-to-use-ai-to-play-sonic-the-hedgehog-its-neat-9d862a2aef98/ https://github.com/Vedant-Gupta523/sonicNEAT
https://github.com/MaxKelsen/openai-sonic https://contest.openai.com/2018-1/details/
It looks as though you have to buy the Sonic games on Steam in order to get the ROM files, but I'll see if I can get them for you. However, in the meantime, you may look at this option:
https://github.com/floydhub/gym-retro-template
You might be able to use this to run code directly on FloydHub, a cloud platform.