Evolve in multiple levels

This is actually two distinct options.

Option 1: --alternate-levels. If this option is on, every generation evaluates Sonic in a distinct level chosen at random from all available levels. This prevents overfitting to a single level and (hopefully) encourages general behavior without costing extra evaluation time.
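A rough sketch of what Option 1 could look like, not a final implementation. The level names in `LEVELS` are placeholders, and `level_for_generation` / `run_generations` are hypothetical helpers that do not exist in the repo. Each generation draws its level independently, so the same level can come up twice in a row; if "distinct" should mean "never the same as the previous generation", the draw would need to exclude the last level.

```python
import random

# Placeholder level names (assumption); the real list would come from the project.
LEVELS = [
    "GreenHillZone.Act1",
    "MarbleZone.Act1",
    "SpringYardZone.Act1",
    "LabyrinthZone.Act1",
]


def level_for_generation(levels, rng):
    """Pick one level uniformly at random for the upcoming generation."""
    return rng.choice(levels)


def run_generations(num_generations, levels, seed=0):
    """Return the level used in each generation (hypothetical driver loop)."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    schedule = []
    for _ in range(num_generations):
        level = level_for_generation(levels, rng)
        # The whole population would be evaluated in `level` here, so each
        # genome still gets exactly one evaluation per generation.
        schedule.append(level)
    return schedule


if __name__ == "__main__":
    for generation, level in enumerate(run_generations(5, LEVELS)):
        print(generation, level)
```

Because only one level is used per generation, the cost per generation is the same as evolving on a single fixed level.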

Option 2: --multiple-levels. If this option is on, every evaluation is actually an evaluation across multiple levels (perhaps using Joint PPO?). I think the vector environment can support this, but I am not sure. This would certainly make the fitness evaluation more robust and may help learning, but it is probably too costly to be realistic. There is also the question of which levels to use, since we can't evaluate in all of them on every evaluation. Perhaps we need an extra parameter, --num-levels: if --multiple-levels is on, then args.num_levels levels are chosen at random at the start of evolution and re-used in every generation.
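A minimal sketch of how the two flags for Option 2 might fit together, under several assumptions: the level names are placeholders, `choose_fixed_levels` and `multi_level_fitness` are hypothetical helpers, falling back to the first level when --multiple-levels is off is just one possible default, and averaging the per-level scores is only one way to combine them (the proposal does not say how). The actual Sonic evaluation is passed in as a callable, so nothing here depends on the vector environment.

```python
import argparse
import random

# Placeholder level names (assumption); the real list would come from the project.
ALL_LEVELS = [
    "GreenHillZone.Act1",
    "MarbleZone.Act1",
    "SpringYardZone.Act1",
    "LabyrinthZone.Act1",
]


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--multiple-levels", action="store_true",
                        help="evaluate every genome across several levels")
    parser.add_argument("--num-levels", type=int, default=2,
                        help="how many levels to sample when --multiple-levels is on")
    return parser


def choose_fixed_levels(args, all_levels, rng):
    """Sample the level set once, at the start of evolution."""
    if not args.multiple_levels:
        return all_levels[:1]  # assumed default: a single fixed level
    if not 1 <= args.num_levels <= len(all_levels):
        raise ValueError("--num-levels must be between 1 and the number of levels")
    return rng.sample(all_levels, args.num_levels)  # sampled without replacement


def multi_level_fitness(genome, levels, evaluate_in_level):
    """Average the per-level scores (the mean is an assumption)."""
    scores = [evaluate_in_level(genome, level) for level in levels]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    args = build_parser().parse_args()
    levels = choose_fixed_levels(args, ALL_LEVELS, random.Random(0))
    print("Levels re-used in every generation:", levels)
```

With this shape, the cost of one evaluation grows linearly with args.num_levels, which makes the trade-off against robustness explicit.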
