Save/Load PPO Models

schrum2 commented 5 years ago

Some of these details are already discussed in #15 but I wanted this to be a separate issue. Basically, we need to be able to both save and load the models trained to play Sonic, though saving/loading models in general should be explored as well.

nazaruka commented 5 years ago

As it turns out, Alexandre Borghi organized the code so that FastLearner and MetaLearner each have an agent file and an algorithm/network creation file. The agent file imports the creation file and calls its learn(~) method, which takes around fifteen parameters. Two of these, save_interval and load_path, are relevant to saving and loading models for further testing.

To reiterate my comment on #15, save_interval is the modulus that determines when "checkpoints" are saved: a checkpoint is written when the update number is divisible by save_interval. A "checkpoint," in this case, is nothing more than a model saved after a given number of updates. First, set save_interval to some large positive integer. Then, run ppo2ttifrutti_agent.py and watch for the following message on your prompt after the first update:

Saving to ~\AppData\Local\Temp\openai-YYYY-MM-DD-hh-mm-ss-######\checkpoints\00001

\checkpoints\ is the directory where the "checkpoints" are saved; its parent directory is named after the date and time at which you launched the FastLearner agent (you may ignore the trailing six-digit string). When you feel that a "checkpoint" is trained well enough to load, take its number, append it to the path of the \checkpoints\ directory, and assign that path to load_path in the FastLearner agent's ppo2ttifrutti.learn(~) method call. Below is how you would set load_path if you wanted to run the model saved at the thousandth update.

# load_path must be a quoted string; replace ~ with your actual home directory.
ppo2ttifrutti.learn(...,
                    load_path="~/AppData/Local/Temp/openai-YYYY-MM-DD-hh-mm-ss-######/checkpoints/01000")

Note that you will have to correct the backslashes to forward slashes; this issue is merely a consequence of our running the code on Windows, not of the code itself.
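As a rough illustration of the path handling described above, here is a minimal sketch that builds a forward-slash load_path from a checkpoints directory and an update number. The helper name checkpoint_load_path is hypothetical and not part of the FastLearner code, and the five-digit zero padding is an assumption based on the 00001 and 01000 examples.

```python
from pathlib import PureWindowsPath


def checkpoint_load_path(checkpoints_dir, update):
    """Return a forward-slash path to the checkpoint saved at `update`.

    Hypothetical helper, not part of the FastLearner code. It assumes
    checkpoint files are named with the update number zero-padded to
    five digits, as in the 00001 and 01000 examples.
    """
    # PureWindowsPath parses backslash paths on any operating system.
    directory = PureWindowsPath(checkpoints_dir)
    # as_posix() renders the path with forward slashes.
    return (directory / f"{update:05d}").as_posix()


example = checkpoint_load_path(
    r"C:\Users\nazaruka\AppData\Local\Temp\openai-2019-06-07-10-06-49-569772\checkpoints",
    300,
)
print(example)
```

The resulting string could then be passed as load_path in the learn(~) call, without editing the slashes by hand.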

Addendum: Glancing over the MetaLearner files, we see that its agent sets save_interval to 25 but does not declare a value for load_path. It would make sense that MetaLearner is saving a model that will eventually be "learned" by FastLearner, since Borghi had to submit to the competition a FastLearner that was already pre-trained on various Sonic levels. FastLearner initially did not save models at all; it did not need to, because it was to run OpenAI's random level only once. However, we are doing away with MetaLearner for now and instead training one level at a time with FastLearner.
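To make the save_interval behavior concrete, here is a small sketch of which updates would produce a checkpoint when save_interval is 25. It is not the actual ppo2ttifrutti code: the function should_save is hypothetical, and the rule it encodes (save on the first update and on every multiple of save_interval, with 0 disabling saving) is an assumption based on the behavior described in this thread.

```python
def should_save(update, save_interval):
    """Return True if a checkpoint would be written at this update.

    Assumption: a checkpoint is saved on the first update and on every
    update divisible by save_interval; a save_interval of 0 disables saving.
    """
    if not save_interval:
        return False
    return update == 1 or update % save_interval == 0


# Updates (out of the first 100) that would be saved with save_interval=25.
saved = [u for u in range(1, 101) if should_save(u, 25)]
print(saved)  # [1, 25, 50, 75, 100]
```

Under these assumptions, a very large save_interval would leave only the checkpoint from the first update until the update count reaches that interval.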

nazaruka commented 5 years ago

After running a FastLearner agent on Sonic 1's notoriously difficult Labyrinth Zone Act 1 for about six hours, I ended up with a model that managed to traverse at most half the map. I took the path of the final checkpoint (C:/Users/nazaruka/AppData/Local/Temp/openai-2019-06-07-10-06-49-569772/checkpoints/00300) and set the FastLearner agent's load_path to it. Running the modified FastLearner agent gave a result that performed identically to the final checkpoint, which showed that the model had been saved and could be loaded successfully.

I think it's safe to say that we can close this issue. Hopefully I can find a similar mechanism in the other algorithms so that I can emulate it in my own code.

nazaruka / gym-http-api

Save/Load PPO Models #17