nazaruka / gym-http-api

NSGA2-based Sonic agent + experimental code
MIT License

Sonic the Hedgehog #15

Closed schrum2 closed 5 years ago

schrum2 commented 5 years ago

There is a lot of info on this out there. I'm a bit surprised that NEAT has even been applied to Sonic. Here are several helpful links with information:

https://www.freecodecamp.org/news/how-to-use-ai-to-play-sonic-the-hedgehog-its-neat-9d862a2aef98/
https://github.com/Vedant-Gupta523/sonicNEAT
https://github.com/MaxKelsen/openai-sonic
https://contest.openai.com/2018-1/details/

It looks as though you have to buy the Sonic games on Steam in order to get the ROM files, but I'll see if I can get them for you. However, in the meantime, you may look at this option:

https://github.com/floydhub/gym-retro-template

You might be able to use this to run code directly on FloydHub, a cloud platform.

nazaruka commented 5 years ago

Attempted to run the FloydHub fix but couldn't get it to work; there were some issues with how some of the code was set up in the Jupyter Notebook.

For instance, one line of code calls retro.list_games(), but list_games is not defined in retro's __init__.py; it lives in the retro.data submodule instead. Another line calls a method titled retro.get_game_path(), which appears neither in retro's __init__.py nor anywhere else in the retro package.
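For reference, a minimal check of that first failure, assuming a reasonably recent gym-retro where list_games lives under retro.data (the Sonic identifiers below are the standard ones and require an imported ROM):

```python
import retro

# retro.list_games() raises AttributeError on this version; the function lives in retro.data.
print(retro.data.list_games()[:10])  # first few game IDs that gym-retro knows about

# Environment creation itself still goes through retro.make, once a ROM has been imported.
env = retro.make(game="SonicTheHedgehog-Genesis", state="GreenHillZone.Act1")
```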

schrum2 commented 5 years ago

By following the instructions here: https://github.com/MaxKelsen/openai-sonic I was actually able to get a random agent to play Sonic, though not Sonic 1; I got the agent to play Sonic and Knuckles. Still, that's pretty cool progress. The next step is to try an actual learning algorithm.

schrum2 commented 5 years ago

Important reference points:
https://arxiv.org/abs/1804.03720
https://openai.com/blog/first-retro-contest-retrospective/

schrum2 commented 5 years ago

We both have the Sonic code running now, so this issue is well under way. Here is what needs to be done in order to close out this issue.

1. Copy code from one of the top 3 Sonic competitors and/or the Gotta Learn Fast benchmarks into the repo.
2. Run one of those algorithms long enough to noticeably improve behavior.
3. Log information in this GitHub thread about which method you used, how long it took to run, and what performance you achieved.
4. If the performance isn't as good as it should be (according to the paper or the competition results), try to explain why (not trained long enough?).

Then close this issue.

schrum2 commented 5 years ago

Something else important to note about this issue: the default Sonic reward seems to be the game score, which is almost worthless. However, if you go to this site, https://contest.openai.com/2018-1/details/ , and use their retro-contest repo to run Sonic instead, the reward is based on how much progress the agent makes moving rightward through the level. I took their simple-agent, removed the Docker communication, and committed it to the repo.
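For context, here is roughly what the de-Dockered simple-agent reduces to. This is a sketch, assuming the retro-contest support code is installed so that retro_contest.local.make supplies the progress-to-the-right reward; with plain retro.make you would fall back to the score-based reward:

```python
from retro_contest.local import make  # contest wrapper: reward tracks rightward progress


def main():
    # Standard gym-retro identifiers for Sonic 1, Green Hill Zone Act 1.
    env = make(game="SonicTheHedgehog-Genesis", state="GreenHillZone.Act1")
    obs = env.reset()
    while True:
        # Random actions, as in the original simple-agent; no Docker socket involved.
        obs, reward, done, info = env.step(env.action_space.sample())
        env.render()
        if done:
            obs = env.reset()


if __name__ == "__main__":
    main()
```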

schrum2 commented 5 years ago

I feel that it is also worthwhile to include this link in the issue thread: https://openai.com/blog/first-retro-contest-retrospective/

nazaruka commented 5 years ago

The 3rd-place winner used PPO with a meta-learner (it trains on the given Sonic levels) and submitted its FastLearner agent with Docker: https://github.com/aborghi/retro_contest_agent

schrum2 commented 5 years ago

We're able to run that agent, but who knows how long it will take ... we'll let it run overnight. However, we need to investigate ways of saving and loading the model so we can train across multiple sessions, and we should also train with rendering off for better speed (which only makes sense if we can load a saved model later to actually watch it).

schrum2 commented 5 years ago

Tried to track the rewards/returns the agent was receiving, but this was confusing. The variables mb_rewards and rewards are lists/vectors with multiple values, and we can't seem to get the step-by-step reward or even a single episode's return.
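For what it's worth, the confusing shape comes from the vectorized rollout: in baselines-style PPO2 the runner stores one reward array per step across all parallel environments, so mb_rewards is roughly (nsteps, nenvs) after stacking rather than a single scalar stream. One hedged way to recover per-episode returns is to wrap each environment in baselines' Monitor, which reports the finished episode's return through the info dict:

```python
from baselines.bench import Monitor

def wrap_with_monitor(env, log_path=None):
    # Monitor records per-episode return ('r'), length ('l'), and elapsed time ('t'),
    # and exposes them via info['episode'] on the step where the episode ends.
    return Monitor(env, filename=log_path, allow_early_resets=True)

# In any step loop (or inside a custom runner):
#   obs, reward, done, info = env.step(action)
#   if "episode" in info:
#       print("episode return:", info["episode"]["r"])
```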

schrum2 commented 5 years ago

Alex and I were both able to run the third-place PPO agent overnight, and it eventually beat the first act of Green Hill Zone. Annoyingly, the logging doesn't seem to track useful information, such as how the rewards/returns change over time, but at least the visualization shows us that we are beating that one level.

Need to try more levels.

nazaruka commented 5 years ago

Looking closer at the 3rd-place code, I realized that it is almost one-for-one with the Baselines PPO2 implementation: numerous methods and variables from the ppo2.py, model.py, and runner.py files in the ~\baselines\baselines\ppo2 sub-directory are pasted virtually verbatim into ppo2ttifrutti.py. That would imply that FastLearner saves models and checkpoints in much the same way the Baselines PPO2 implementation does. However, running PPO2 does not guarantee that anything gets saved, apparently because the variable controlling checkpointing is left at a default that disables it.
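A hedged sketch of the checkpoint gate as it appears in baselines-style PPO2 code (the function name here is mine; ppo2ttifrutti.py inlines this logic inside its learn() loop, and exact details may differ):

```python
import os
import os.path as osp

def maybe_save_checkpoint(model, update, save_interval, log_dir):
    """Baselines-style gate: with save_interval left at 0, the condition is
    falsy and no checkpoint is ever written, matching what we observed."""
    if save_interval and (update % save_interval == 0 or update == 1) and log_dir:
        checkdir = osp.join(log_dir, "checkpoints")
        os.makedirs(checkdir, exist_ok=True)
        savepath = osp.join(checkdir, "%.5i" % update)  # e.g. ".../checkpoints/00100"
        print("Saving to", savepath)
        model.save(savepath)
```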

Setting the save_interval variable, initially 0, to some value n causes checkpoint files to be written every n updates, which we can then load later. If we presume that there have been 100 updates, three files are created on the hundredth update:

  1. 00100_tf.data-00000-of-00001
  2. 00100_tf.index
  3. 00100_tf.meta

The frequency with which these files are created is determined by n. How would these checkpoints be relevant, though? One idea Dr. Schrum and I had was to look at the plain-text checkpoint file that TensorFlow maintains alongside them, which is rewritten every time a new checkpoint is saved. At the hundredth update, its contents would look like:

model_checkpoint_path: "~\\checkpoints\\00100_tf"
all_model_checkpoint_paths: "~\\checkpoints\\00100_tf"

If a model_checkpoint_path points to a network with updated weights, would it make sense to load this path as a model?
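One way to test that idea, sketched under the assumption that the fork checkpoints with a standard TensorFlow 1.x Saver (the .data/.index/.meta triple suggests it does). The helper name and checkpoint directory are mine, and the PPO model's graph has to be built in the session before restoring:

```python
import tensorflow as tf

def restore_latest(sess, checkpoint_dir="checkpoints"):
    """Restore the newest saved model (e.g. 00100_tf) into an existing session."""
    latest = tf.train.latest_checkpoint(checkpoint_dir)  # parses the 'checkpoint' file
    if latest is None:
        return False  # nothing has been saved yet
    tf.train.Saver().restore(sess, latest)
    return True
```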

schrum2 commented 5 years ago

Basically, PPO seems to work, but agents still have problems dealing with loops and backtracking. They do sometimes eventually get through these obstacles, but there is definitely room for improvement.

I think that takes care of the specific details relevant to this issue. I created a new issue, #17, for addressing the saving and loading of PPO models, so all future work on that topic should be documented there.