neuroevolution-ai / NeuroEvolution-CTRNN_new


Training on VizDoom #32

Open bjuergens opened 4 years ago

bjuergens commented 4 years ago

If VizDoom doesn't come as an OpenAI Gym environment, we may need to create a new episode runner for it.
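For reference, a minimal sketch of what the core loop of such an episode runner could look like when talking to the `vizdoom` Python API directly (the `brain` interface and the config path are placeholders, not anything from our codebase):

```python
import vizdoom as vzd


def run_episode(brain, config_path="scenarios/basic.cfg", frame_skip=4):
    """Run one VizDoom episode with a single agent and return its total reward.

    `brain` is a placeholder for whatever maps an observation to an action index.
    """
    game = vzd.DoomGame()
    game.load_config(config_path)
    game.set_window_visible(False)  # headless for training
    game.init()

    game.new_episode()
    while not game.is_episode_finished():
        state = game.get_state()
        obs = state.screen_buffer           # raw pixel buffer
        action_idx = brain.step(obs)        # hypothetical agent interface
        action = [0] * game.get_available_buttons_size()
        action[action_idx] = 1
        game.make_action(action, frame_skip)

    total_reward = game.get_total_reward()
    game.close()
    return total_reward
```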

todo:

optional follow-up tasks (create separate issues when needed)

possible alternative:

bjuergens commented 4 years ago

For the record: the relevant discussion on Slack happened on 2020-11-18 and -19.

bjuergens commented 4 years ago

resources:

additional info

DanielZim commented 3 years ago

Training agents on VizDoom is very interesting in the long run, I guess. However, I think we should first master the Procgen environments before we move on to VizDoom. I also think that integrating VizDoom into our framework requires an effort that should not be underestimated. In particular, we currently do not support multi-agent learning, since the OpenAI Gym envs do not support it. Further, dealing with the networked multiplayer in VizDoom is another issue (currently VizDoom uses UDP/P2P), which is a performance bottleneck. Since we have a lot of Procgen environments left that we can use out of the box, I would exclude VizDoom and multi-agent learning (from my dissertation), but I really would like to see our agents training on this in the future.

By the way, there even seems to be a Rust source port of Doom: https://github.com/cristicbz/rust-doom. Since Rust is quite promising for developing new envs, maybe this port might also be interesting in the future (although I guess plenty of work has to be done to integrate it into our platform).

bjuergens commented 3 years ago

> I also think, that integrating VizDoom in our framework requires some effort that should not be underestimated. In particular, we currently do not support multi-agent learning, since the openai gym envs do not support that

I think this can be implemented with minimal changes to the existing architecture.

We just need to pass a list of individuals to the eprunner instead of a single individual, and the eprunner returns a list of fitnesses instead of just one, and we're pretty much done (at least from an architectural point of view).
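A hypothetical sketch of that interface change (class and method names are made up here, and the multi-agent environment interface is assumed):

```python
from typing import List, Sequence


class MultiAgentEpisodeRunner:
    """Hypothetical variant of the episode runner: evaluates a group of
    individuals together and returns one fitness per individual."""

    def __init__(self, env):
        self.env = env  # some multi-agent environment, e.g. a VizDoom match

    def eval_fitness(self, individuals: Sequence) -> List[float]:
        brains = [self._make_brain(genome) for genome in individuals]
        fitnesses = [0.0] * len(brains)
        observations = self.env.reset()          # one observation per agent (assumed interface)
        done = False
        while not done:
            actions = [b.step(obs) for b, obs in zip(brains, observations)]
            observations, rewards, done = self.env.step(actions)
            fitnesses = [f + r for f, r in zip(fitnesses, rewards)]
        return fitnesses

    def _make_brain(self, genome):
        raise NotImplementedError  # build a CTRNN (or other brain) from the genome
```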

From that point on, we "just" need to implement an optimizer that benefits from multiple agents. We could modify the existing optimizers easily enough for this: we add parameters like "group_size" and "in_how_many_groups_will_each_individual_participate?" (which both default to 1), the population is split into groups, and the groups are evaluated together. The fitnesses are then processed just as they are now.
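A rough sketch of the grouping step, just to illustrate the idea (function and parameter names are illustrative, not from our code):

```python
import random


def split_into_groups(population, group_size=1, participations=1, rng=random):
    """Assign each individual to `participations` groups of size `group_size`.

    With the defaults (1, 1) this degenerates to the current single-agent
    evaluation: every individual forms its own "group" exactly once.
    """
    pool = [ind for ind in population for _ in range(participations)]
    rng.shuffle(pool)
    return [pool[i:i + group_size] for i in range(0, len(pool), group_size)]
```

Each group would then be handed to the episode runner as a whole, and an individual's fitness could for example be averaged over all groups it participated in.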

On a side note: MU_ES has a "tournament selection" parameter, which emulates multi-agent training on single-agent fitnesses.

bjuergens commented 3 years ago

> maybe this port might also be interesting in the future (although I guess plenty of work has to be done to integrate this in our platform).

If it's wrapped in a gym with Python bindings, then integrating it into our system would be about as complicated as integrating Procgen, i.e. it would be very simple.

DanielZim commented 3 years ago

> maybe this port might also be interesting in the future (although I guess plenty of work has to be done to integrate this in our platform).
>
> if it's wrapped in a gym with python-bindings, then integrating it in our system would be as complicated as integrating procgen, i.e. it would be very simple

Ok, I meant that wrapping this into a gym env might be complicated (and preparing it for an ML experiment at all, like synchronizing it to max speed instead of 35 fps, handling multiplayer properly, etc.); the integration into our platform afterwards should be no problem.
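For what it's worth, VizDoom itself can already be switched to a synchronous mode that runs as fast as the engine allows instead of ticking in real time; something along these lines (the config path is a placeholder):

```python
import vizdoom as vzd

game = vzd.DoomGame()
game.load_config("scenarios/basic.cfg")

# Synchronous mode: the engine waits for make_action() and runs as fast as
# possible, instead of ticking in real time at 35 fps (ASYNC_PLAYER).
game.set_mode(vzd.Mode.PLAYER)
game.set_window_visible(False)
game.init()
```

The multiplayer synchronization is a separate problem, of course.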

bjuergens commented 3 years ago

ah, ok.

I think we could force it into a gym package somehow, but I think it would be better if we didn't use the gym interface for this and instead just defined a special episode runner for it. Only once we add other multi-agent gyms should we start thinking about adding an interface for multi-agent gyms.

We could force multi-agent stuff into the gym interface by multiplying the input/output spaces, so that the step method takes actions from all agents and returns observations and rewards for each agent. But I don't think there would be any benefit in doing so: others won't be able to use this new gym the way they can use the older gyms, and our own code would have to have special exceptions for this gym in any case.
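To make that trade-off concrete, here is a rough sketch of such a "multiplied" gym interface, assuming a hypothetical `match` object that wraps a VizDoom multiplayer game (none of these names exist in our code):

```python
import gym
import numpy as np
from gym import spaces


class MultiAgentVizDoomEnv(gym.Env):
    """Sketch of forcing N agents into the single-agent gym interface by
    multiplying the spaces: step() takes one action per agent and returns
    one observation and one reward per agent."""

    def __init__(self, match, num_agents, obs_shape=(3, 120, 160), num_buttons=8):
        self.match = match            # hypothetical wrapper around a VizDoom multiplayer game
        self.num_agents = num_agents
        single_obs = spaces.Box(0, 255, shape=obs_shape, dtype=np.uint8)
        single_act = spaces.Discrete(num_buttons)
        self.observation_space = spaces.Tuple([single_obs] * num_agents)
        self.action_space = spaces.Tuple([single_act] * num_agents)

    def reset(self):
        return self.match.reset()     # tuple of per-agent observations

    def step(self, actions):
        # `actions` holds one entry per agent; the usual gym contract expects
        # a single action, which is exactly the mismatch discussed above.
        obs, rewards, done = self.match.step(actions)
        return obs, rewards, done, {}
```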