ppaquette / gym-super-mario

Gym - 32 levels of original Super Mario Bros

Controlling emulator speed #4

Closed · gabegrand closed this issue 7 years ago

gabegrand commented 7 years ago

Hi Philip, I was wondering whether it's possible to manually set the emulator speed. It'd be nice to further increase the speed, say to 5000%, during training. Additionally, when demoing the RL agent, it'd be great to set it to normal speed, so that the sound works properly.

On a related note, in the interest of increasing training efficiency, is there any way to skip the lengthy intro screens (e.g., "World 1-1") that occur prior to the level? Currently, it takes about 20 mins to do 100 training iterations, which is pretty slow, since our goal is to train for several thousand iterations.

ppaquette commented 7 years ago

Maximum efficiency could be achieved by coding directly in fceux (lua), but that would make it incompatible with gym / python.

Another option is probably to run iterations in parallel and update weights on a central server.
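
A rough sketch of that idea, assuming each worker process can spawn its own emulator (the env id, pool size, and step cap are just placeholders, and the central weight update is only stubbed out):

import multiprocessing as mp

import gym
import ppaquette_gym_super_mario  # importing this package registers the SuperMarioBros envs


def run_episode(worker_id, max_steps=2000):
    """Run one random-action rollout in its own emulator and return the total reward."""
    env = gym.make('ppaquette/SuperMarioBros-1-1-Tiles-v0')
    env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        obs, rew, done, info = env.step(env.action_space.sample())
        total_reward += rew
        if done:
            break
    env.close()
    return total_reward


if __name__ == '__main__':
    with mp.Pool(processes=4) as pool:
        rewards = pool.map(run_episode, range(4))
    # A central learner would aggregate weights/gradients here instead of just printing
    print('Rollout rewards:', rewards)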

gabegrand commented 7 years ago

Hi Philip, we're still having trouble completing enough training iterations in a reasonable amount of time. The RL methods we're using are pretty standard (Q-learning, SARSA, approximate Q-learning), and they would be difficult to parallelize, since each update depends on values computed by the previous updates in serial. Training time is a major bottleneck for us, since we need to test several different algorithm variations and hyperparameter configurations in order to write our final paper for our course at Harvard.
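
To illustrate the serial dependence, here's a minimal tabular Q-learning update sketch (the hyperparameters and the state/action encoding are just placeholders):

# Each update reads Q-values written by earlier updates, which is why the
# learning loop itself is hard to split across workers
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.99      # learning rate and discount factor (placeholders)
Q = defaultdict(float)        # Q[(state, action)] -> estimated value

def q_update(state, action, reward, next_state, actions):
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])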

If the emulator speed is already maxed out, we should look into ways to decrease the amount of time spent on the intro screens. Is there any way to skip all frames before the timer starts? Another approach to consider would be to keep the emulator open for the entire duration of training and manually reset the number of lives to 3 after every death. That way, you would only have to establish the pipe with python once during the whole training sequence. What do you think?

ppaquette commented 7 years ago

I'll try to see if I can skip the intro by saving the memory state.

What kind of % improvement do you need vs the current speed? Should I only optimize the tiles version?

gabegrand commented 7 years ago

Currently, it takes approx. 4500s = 75 mins to train 100 iterations on World 1-3. That particular level has a cliff right at the beginning, so Mario usually dies very quickly, which means that the training speed we achieved of 45s / iteration on that level is probably a best case scenario. In order to make it serviceable, we'd ideally like to see a 10x increase in training speed, which would allow us to get close to 1000 iterations per hour. We would need that kind of speed in order to test out different combinations of hyperparameters of our model.

We're only using the tiles version, so from our perspective, it's fine if you'd like to focus on optimizing that. Thank you again for your efforts.

ppaquette commented 7 years ago

I should have something ready by Tuesday or Wednesday.

ppaquette commented 7 years ago

OpenAI released 'Universe' today, a way to convert any game to a gym env through a docker container (communication is done through VNC).

I'll do a quick patch for you, but I'll probably need to make this env compatible with Universe in the future.

Universe also has an A3C (asynchronous advantage actor-critic) learning algo available that can be run across a cluster (see https://github.com/openai/universe-starter-agent).

ppaquette commented 7 years ago

Pushed the fix to the 'gabegrand' branch. Mario is on steroids.

Here is a quick python script that works for me:

import gym
import ppaquette_gym_super_mario  # importing this package registers the envs with gym

env = gym.make('ppaquette/SuperMarioBros-1-3-Tiles-v0')
env.reset()

curr_iter = 1
max_iter = 2
while curr_iter <= max_iter:
    # Random action; info['iteration'] increments when a new episode starts
    action = env.action_space.sample()
    obs, rew, done, info = env.step(action)
    if info['iteration'] > curr_iter:
        print('Max Distance Achieved', info['distance'])
        curr_iter = info['iteration']

env.close()

gabegrand commented 7 years ago

Hi Philip, thanks for the fix. I see that the number of lives now starts at 9x, and that the info var / iteration key is behaving as expected. However, I'm still not really seeing an increase in the game speed - it seems to be running at roughly the same speed as before. Are you seeing significant speedup in the framerate on your end?

ppaquette commented 7 years ago

I just ran 100 episodes (random actions) on level 1-3, and it took 391.89 seconds (so ~ 900 episodes / hour).

Try running it in a cloud VM and compare it to my benchmark using random actions.
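
For reference, a rough way to reproduce that benchmark (random actions on level 1-3, counting episodes with info['iteration'] as in the script above; timings will vary by machine):

import time

import gym
import ppaquette_gym_super_mario

env = gym.make('ppaquette/SuperMarioBros-1-3-Tiles-v0')
env.reset()

target = 100      # number of episodes to time
curr_iter = 1
start = time.time()
while curr_iter <= target:
    obs, rew, done, info = env.step(env.action_space.sample())
    if info['iteration'] > curr_iter:
        curr_iter = info['iteration']
elapsed = time.time() - start
env.close()
print('%.1fs for %d episodes (~%.0f episodes/hour)' % (elapsed, target, 3600.0 * target / elapsed))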

gabegrand commented 7 years ago

On World 1-3, running on my machine, the script you provided took 910.974s for 100 episodes. That's not quite up to what you recorded, but there is definitely some speedup from the previous version. Also, we no longer have to close and re-open the emulator every time, which is nice.

I have a couple questions / comments about the new code:

ppaquette commented 7 years ago

1) Yes, you should remove the skip actions from your code. If you want to adjust the value, just edit this line: https://github.com/ppaquette/gym-super-mario/blob/gabegrand/ppaquette_gym_super_mario/lua/super-mario-bros.lua#L48
2) done will always be false, since reset() doesn't need to be called. You need to replace all done checks with info['iteration'] > curr_iter.
3) Just kill the fceux process, or press Ctrl+C and close it manually.
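
For point 2, the loop from the script above would become something like this:

import gym
import ppaquette_gym_super_mario

env = gym.make('ppaquette/SuperMarioBros-1-3-Tiles-v0')
env.reset()

# Detect episode boundaries with info['iteration'] instead of done;
# done stays False because reset() never needs to be called
curr_iter, max_iter = 1, 100
while curr_iter <= max_iter:
    obs, rew, done, info = env.step(env.action_space.sample())
    if info['iteration'] > curr_iter:          # previously: if done:
        print('Episode %d max distance: %s' % (curr_iter, info['distance']))
        curr_iter = info['iteration']          # a new episode has started

env.close()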