ppaquette / gym-super-mario

Gym - 32 levels of original Super Mario Bros

Controlling emulator speed #4

Closed · gabegrand closed this issue 7 years ago

gabegrand commented 7 years ago

Hi Philip, I was wondering whether it's possible to manually set the emulator speed. It'd be nice to further increase the speed, say to 5000%, during training. Additionally, when demoing the RL agent, it'd be great to set it to normal speed, so that the sound works properly.

On a related note, in the interest of increasing training efficiency, is there any way to skip the lengthy intro screens (e.g., "World 1-1") that occur prior to the level? Currently, it takes about 20 mins to do 100 training iterations, which is pretty slow, since our goal is to train for several thousand iterations.

ppaquette commented 7 years ago

Maximum efficiency could be achieved by coding directly in fceux (lua), but that would make it incompatible with gym / python.

Another option is probably to run iterations in parallel and update weights on a central server.
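
A rough sketch of that idea, assuming each worker process can spawn its own emulator (the env id, pool size, and step cap are just placeholders, and the central weight update is only stubbed out):

import multiprocessing as mp

import gym
import ppaquette_gym_super_mario  # importing this package registers the SuperMarioBros envs


def run_episode(worker_id, max_steps=2000):
    """Run one random-action rollout in its own emulator and return the total reward."""
    env = gym.make('ppaquette/SuperMarioBros-1-1-Tiles-v0')
    env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        obs, rew, done, info = env.step(env.action_space.sample())
        total_reward += rew
        if done:
            break
    env.close()
    return total_reward


if __name__ == '__main__':
    with mp.Pool(processes=4) as pool:
        rewards = pool.map(run_episode, range(4))
    # A central learner would aggregate weights/gradients here instead of just printing
    print('Rollout rewards:', rewards)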

gabegrand commented 7 years ago

Hi Philip, we're still having trouble completing enough training iterations in a reasonable amount of time. The RL methods we're using are pretty standard (Q-learning, SARSA, approximate Q-learning), and they would be difficult to parallelize, since each update depends on values computed by the previous updates in serial. Training time is a major bottleneck for us, since we need to test several different algorithm variations and hyperparameter configurations in order to write our final paper for our course at Harvard.
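
To illustrate the serial dependence, here's a minimal tabular Q-learning update sketch (the hyperparameters and the state/action encoding are just placeholders):

# Each update reads Q-values written by earlier updates, which is why the
# learning loop itself is hard to split across workers
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.99      # learning rate and discount factor (placeholders)
Q = defaultdict(float)        # Q[(state, action)] -> estimated value

def q_update(state, action, reward, next_state, actions):
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])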

If the emulator speed is already maxed out, we should look into ways to decrease the amount of time spent on the intro screens. Is there any way to skip all frames before the timer starts? Another approach to consider would be to keep the emulator open for the entire duration of training and manually reset the number of lives to 3 after every death. That way, you would only have to establish the pipe with python once during the whole training sequence. What do you think?

ppaquette commented 7 years ago

I'll try to see if I can skip the intro by saving the memory state.

What kind of % improvement do you need vs the current speed? Should I only optimize the tiles version?

gabegrand commented 7 years ago

Currently, it takes approx. 4500s = 75 mins to train 100 iterations on World 1-3. That particular level has a cliff right at the beginning, so Mario usually dies very quickly, which means that the training speed we achieved of 45s / iteration on that level is probably a best case scenario. In order to make it serviceable, we'd ideally like to see a 10x increase in training speed, which would allow us to get close to 1000 iterations per hour. We would need that kind of speed in order to test out different combinations of hyperparameters of our model.

We're only using the tiles version, so from our perspective, it's fine if you'd like to focus on optimizing that. Thank you again for your efforts.

ppaquette commented 7 years ago

I should have something ready by Tuesday or Wednesday.

ppaquette commented 7 years ago

OpenAI released 'Universe' today, a way to convert any game to a gym env through a docker container (communication is done through VNC).

I'll do a quick patch for you, but I'll probably need to make this env compatible with Universe in the future.

Universe also has an A3C (asynchronous advantage actor-critic) learning algo available that can be run across a cluster (see https://github.com/openai/universe-starter-agent).

ppaquette commented 7 years ago

Pushed the fix to the 'gabegrand' branch. Mario is on steroids.

Here is a quick python script that works for me:

import gym
import ppaquette_gym_super_mario  # importing this package registers the envs with gym

env = gym.make('ppaquette/SuperMarioBros-1-3-Tiles-v0')
env.reset()

curr_iter = 1
max_iter = 2
while curr_iter <= max_iter:
    # Random action; info['iteration'] increments when a new episode starts
    action = env.action_space.sample()
    obs, rew, done, info = env.step(action)
    if info['iteration'] > curr_iter:
        print('Max Distance Achieved', info['distance'])
        curr_iter = info['iteration']

env.close()

gabegrand commented 7 years ago

Hi Philip, thanks for the fix. I see that the number of lives now starts at 9x, and that the info var / iteration key is behaving as expected. However, I'm still not really seeing an increase in the game speed - it seems to be running at roughly the same speed as before. Are you seeing significant speedup in the framerate on your end?

ppaquette commented 7 years ago

I just ran 100 episodes (random actions) on level 1-3, and it took 391.89 seconds (so ~ 900 episodes / hour).

Try running it in a cloud VM and compare it to my benchmark using random actions.
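
For reference, a rough way to reproduce that benchmark (random actions on level 1-3, counting episodes with info['iteration'] as in the script above; timings will vary by machine):

import time

import gym
import ppaquette_gym_super_mario

env = gym.make('ppaquette/SuperMarioBros-1-3-Tiles-v0')
env.reset()

target = 100      # number of episodes to time
curr_iter = 1
start = time.time()
while curr_iter <= target:
    obs, rew, done, info = env.step(env.action_space.sample())
    if info['iteration'] > curr_iter:
        curr_iter = info['iteration']
elapsed = time.time() - start
env.close()
print('%.1fs for %d episodes (~%.0f episodes/hour)' % (elapsed, target, 3600.0 * target / elapsed))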

gabegrand commented 7 years ago

On World 1-3, running on my machine, the script you provided took 910.974s for 100 episodes. That's not quite up to what you recorded, but there is definitely some speedup from the previous version. Also, we no longer have to close and re-open the emulator every time, which is nice.

I have a couple questions / comments about the new code:

ppaquette commented 7 years ago

1) Yes, you should remove the skip actions from your code. If you want to adjust the value, just edit this line: https://github.com/ppaquette/gym-super-mario/blob/gabegrand/ppaquette_gym_super_mario/lua/super-mario-bros.lua#L48
2) done will always be false, since reset() doesn't need to be called. You need to replace all done checks with info['iteration'] > curr_iter.
3) Just kill the fceux process, or press Ctrl+C and close it manually.
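
For point 2, the loop from the script above would become something like this:

import gym
import ppaquette_gym_super_mario

env = gym.make('ppaquette/SuperMarioBros-1-3-Tiles-v0')
env.reset()

# Detect episode boundaries with info['iteration'] instead of done;
# done stays False because reset() never needs to be called
curr_iter, max_iter = 1, 100
while curr_iter <= max_iter:
    obs, rew, done, info = env.step(env.action_space.sample())
    if info['iteration'] > curr_iter:          # previously: if done:
        print('Episode %d max distance: %s' % (curr_iter, info['distance']))
        curr_iter = info['iteration']          # a new episode has started

env.close()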