Training on multiple levels

nazaruka commented 5 years ago

This issue will be two-fold:

Is it possible to run MetaLearner on one of the machines even if it were to load just two, and not all, of the levels?
If we were to run FastLearner on one level at a time, terminating after a set number of checkpoints, would its knowledge be more robust or merely more prone to forgetting?

nazaruka commented 5 years ago

Part 1: Running MetaLearner

At first glance

We examine MetaLearner's ppo2ttifrutti_agent.py file and notice that it declares the env variable as a SubprocVecEnv object that takes an array of several declared methods:

ppo2ttifrutti.learn(policy=policies.CnnPolicy,
                    env=SubprocVecEnv([env.make_train_0, env.make_train_1, ..., env.make_extra_39]),
                    ...)

As it turns out, the agent wasn't just trying to open the 47 levels assigned for model training in the competition at once, it was also attempting to open the 11 non-training levels and the 40 extra Sega Master System and Game Boy Advance levels. Unlike FastLearner's respective file, which possessed only a make_custom method to wrap environments, MetaLearner's ppo2ttifrutti_sonic_env.py bears the following:

make_custom (never called neither in this file nor in the agent file, potentially just a safety measure)
make_train - wrapper method for the 47 training levels
make_val - wrapper method for the 11 unused Genesis levels
make_extra - wrapper method for the 40 extra SMS/GBA levels
98 (!) helper methods to load every individual environment. These methods are defined by their environment's status as a make_val, make_train, or make_extra state, plus an underscore with the environment's index (e.g. Sonic 2's Wing Fortress Zone is instantiated through make_train_25)

Quick fixes

I copied the ppo2ttifrutti_sonic_env.py file into a new Python file titled ppo2ttifrutti_sonic_env_trunc.py. Within this new file, I removed make_val, make_extra, and any helper methods that were not of the make_train category. I followed up on a similar process to create a Python file titled ppo2ttifrutti_agent_trunc.py, which declared env as:

ppo2ttifrutti.learn(policy=policies.CnnPolicy,
                    env=SubprocVecEnv([env.make_train_5, env.make_train_18, ..., env.make_extra_43])

These method calls correspond to three particular environments, all of which contain loops and/or necessary backtracking:

make_train_5 = Starlight Zone Act 1 (from Sonic 1)
make_train_18 = Metropolis Zone Act 2 (from Sonic 2)
make_train_43 = Ice Cap Zone Act 2 (from Sonic and Knuckles 3)

Running the code

These fixes were more than enough to get the truncated agent of three environments to run (without rendering any of them, of course), though I did get a very long console error the first time running the agent. Essentially, once the first update was about to save as a checkpoint, the console printed out this line:

Saving to /tmp\checkpoints\00001

before printing several lines of errors. Immediately, I knew that something was wrong with the way the MetaLearner agent was declaring the save path. After carefully examining MetaLearner's ppo2ttifrutti.py file, I noticed a command logger.configure('/tmp') on Line 196. I commented this command out and ran the agent successfully, during which it was able to save checkpoints to the appropriate location (C:\Users\nazaruka\AppData\Local\Temp\openai-2019-06-10-11-24-03-398215\checkpoints).

I ended up running the agent for just over two hours, with save_interval set to 25 and other hyperparameters kept as they were. For my results, I took the standard FastLearner agent and set the environment to Sonic 1's Green Hill Zone Act 2 and load_path to 'C:/Users/nazaruka/AppData/Local/Temp/openai-2019-06-10-11-24-03-398215/checkpoints/00175', the final checkpoint saved before I terminated the truncated MetaLearner agent.

Results

Overall, the model generated from training on three contemporaneous environments for a little over two hours learned well enough to overcome the obstacle in yellow by backtracking just enough. However, it did not get past the loop in red (though there were several occasions in which it got close before jumping off).

Note that these were the same results as from when I ran a FastLearner agent on Green Hill Zone Act 1 for 17.5 hours and then applied its model to Green Hill Zone Act 2. This stark difference in learning time could be attributed to three potential factors:

Better robustness of knowledge
Better transferring (plausible; will be explored in Part 2)
Rendering (FastLearner rendered the environments, but MetaLearner did not)

nazaruka commented 5 years ago

Part 2: Running FastLearner in succession

Simultaneous vs. successive models

From Part 1, we have seen that a model generated from running three distinct learning environments through MetaLearner will perform just as well on Sonic 1's Green Hill Zone Act 2 as a FastLearner agent that trained on Green Hill Zone Act 2 for eight times as much. What if we were to use a model that trained on several levels in succession—that is, running an agent on one level and using the knowledge from its model to work in another level, and so forth?

Running the code

I did not need to fix any lines to run the FastLearner agent, but I did need to follow this process:

Open ppo2ttifrutti_agent.py and set save_interval to whatever positive integer you prefer.
Open FastLearner's ppo2ttifrutti_sonic_env.py file and set env to whatever level you will be running (e.g. env = make(game='SonicTheHedgehog-Genesis', state='StarLightZone.Act1') for the first level in our sequence). Save the file.
Go back to ppo2ttifrutti_agent.py. For the first level in the sequence, leave load_path as None. In the following cases, set load_path to the path of the final checkpoint from the previous level (don't forget to correct backslashes if you are using Windows). Save the file.
Run ppo2ttifrutti_agent.py, terminating it after n updates.
Repeat steps 2-4 for the next level, ensuring that every level will run for n updates.

I sought to replicate the results in Part 1 and set save_interval to 10, ensuring that there will be a checkpoint for every level's 60th update. Given that I had run MetaLearner for 175 updates and 3 does not go into 175, I resorted to running the successive model for a grand total of 60 * 3 = 180 updates. Overall, with the minimal time it took to edit the aforementioned files between runs, the successive model trained for about five hours.

Results

Unlike the model generated from running three learning environments simultaneously, the successive model could not bypass the obstacle in pink. In every episode, the agent took around fifteen seconds to run into the obstacle; the rest of the episode was spent frantically jumping about the bottom of the slope. As such, one may readily assume that the successive model did not transfer learning well, for it did not make any efforts to backtrack enough to make it over the slope.

Of course, the successive model still delivered more promising results than running a single FastLearner agent on a given level. Nevertheless, even when given around the same aggregate number of updates, it could not perform as well as the MetaLearner agent.

schrum2 commented 5 years ago

I think this issue is complete. Basically, we know how to train across multiple levels if/when we want to. Importantly, I think we might make use of SubprocVecEnv in the future, if we can get success with our other approaches in individual levels.

nazaruka / gym-http-api