xbpeng / DeepMimic

Motion imitation with deep reinforcement learning.
https://xbpeng.github.io/projects/DeepMimic/index.html
MIT License

Multi agents interaction. #156

AGPX opened this issue 3 years ago

AGPX commented 3 years ago

Hello,

Congratulations, really a great job. I have a question: how hard would it be to have two opposing agents, in order to model an environment where, for example, one agent tries to take the soccer ball away from its opponent (dribbling)? The ability to have 2 (or more) interacting agents would make this algorithm much more suitable for a video game.

AGPX commented 3 years ago

I may have made some progress (2 agents):

image

image

One issue I had to solve is that the two agents are placed in the same position, so they intersect. However, I'm not sure whether the second agent is able to learn or not; I have to run it for more time (but I suspect not, probably because it is far from its kinematic character). Anyway, I believe the real obstacle to supporting multiple agents is that almost all the data must be decoupled, for example the KinCharacter & KinController for each agent (at the moment we have one for all the agents, am I right?). Also, the motion files must be specified on a per-agent basis (this would also solve the initial intersection issue). Basically, they should be two completely separate instances sharing only a common environment, so that the agents can interact with each other (I would like to build a scene where one agent walks up to the other and punches him in the nose, and maybe the other tries to dodge it :-)).
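Roughly what I have in mind for the decoupling, as a quick Python sketch (AgentData and AGENT_CONFIGS are made-up names, not actual DeepMimic classes, and the motion file paths are just examples):

```python
# Hypothetical sketch of per-agent data, not the actual DeepMimic API.
# Each agent gets its own motion file and spawn offset, so the kinematic
# reference characters are fully decoupled.
AGENT_CONFIGS = [
    {"motion_file": "data/motions/humanoid3d_walk.txt",     "init_pos_x": -1.5},
    {"motion_file": "data/motions/humanoid3d_spinkick.txt", "init_pos_x":  1.5},
]

class AgentData:
    """Bundles the per-agent state that must not be shared across agents."""
    def __init__(self, cfg):
        self.motion_file = cfg["motion_file"]  # per-agent reference motion
        self.init_pos_x = cfg["init_pos_x"]    # spawn offset to avoid intersection
        self.kin_char = None                   # per-agent kinematic character
        self.kin_ctrl = None                   # per-agent kinematic controller

agents = [AgentData(cfg) for cfg in AGENT_CONFIGS]
```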

AGPX commented 3 years ago

Improvements... now I have a kinematic character for each agent, each with its own motion file (walk on the left, spin kick on the right). Note the double curve in the graph.

image

I'm just curious to see if both agents learn... stay tuned...

AGPX commented 3 years ago

Hmm... it looks like (after 5M samples per agent) Agent 1 (spin kick, right agent) has some influence on Agent 0 (walk, left agent):

https://youtu.be/cbuAvkK0KG0

I'm missing something; it's unlikely that after 5M samples Agent 0 would still be so unable to walk... still investigating...

xbpeng commented 3 years ago

Nice! Looks like you are making good progress on this. Unfortunately it will probably require pretty large changes to have the code support multiple characters. It will take a fair bit of work, but should be doable.

AGPX commented 3 years ago

Yes, I also believe it is doable (at least to some extent on my part, since I don't have a deep and complete understanding of your code). However, I made all my changes very quickly (in a couple of hours), and looking back at them I have already discovered some errors (read: lack of decoupling between agents) that are most likely the cause of the bad training results. Stay tuned.

AGPX commented 3 years ago

Ok, it still doesn't work. I have solved many problems and DeepMimicCore now seems correct to me, but the training still does not work as it should. My focus has now shifted from the C++ core to the Python code. In particular, I think MPISolver (and MPIUtil) may be the problem, because they most likely mix the data from Agent 0 and Agent 1. I definitely need two separate channels.
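To make "two separate channels" concrete, here is roughly what I mean using mpi4py directly (just a sketch; the real MPIUtil wraps the communicator differently, and the rank-to-agent assignment here is made up):

```python
from mpi4py import MPI

# Sketch: split the global communicator so that each agent's workers only
# average gradients with other workers training the same agent.
world = MPI.COMM_WORLD
num_agents = 2

# Color each rank by the agent it is assigned to; Split() then builds one
# sub-communicator per color, so Agent 0 and Agent 1 data never mix.
agent_id = world.Get_rank() % num_agents
agent_comm = world.Split(color=agent_id, key=world.Get_rank())

# Gradient averaging now stays inside this agent's own channel.
local_grad = 1.0  # placeholder for this worker's gradient contribution
avg_grad = agent_comm.allreduce(local_grad, op=MPI.SUM) / agent_comm.Get_size()
```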

AGPX commented 3 years ago

Ok, here is the result after separating the MPI communication channels for the agents (25M samples per agent):

https://youtu.be/m8ag-qKAasM

The walking agent always hits the other agent during training and, for this reason, it seems to learn to fall (right before hitting the other agent...). I have to offset it not only on the x axis but also on the z axis (currently we only have 'char_init_pos_xs' in the scene parameters; I have to add the equivalent for z). Aside from that, it looks like training went well this time (obviously the training time is higher, probably doubled!).
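For the offset, the change I have in mind just mirrors the existing x parameter for z, something like this ('char_init_pos_zs' is the name I plan to add, only 'char_init_pos_xs' exists today, and the offset values are arbitrary):

```python
# Sketch of per-agent spawn offsets. 'char_init_pos_zs' is the parameter I
# plan to add; only 'char_init_pos_xs' exists in the current scene parameters.
scene_params = {
    "char_init_pos_xs": [-1.5, 1.5],  # existing per-agent x offsets
    "char_init_pos_zs": [-1.0, 1.0],  # proposed per-agent z offsets
}

def get_init_pos(agent_id, params, num_agents=2):
    """Return the (x, z) spawn offset for one agent, defaulting to the origin."""
    xs = params.get("char_init_pos_xs", [0.0] * num_agents)
    zs = params.get("char_init_pos_zs", [0.0] * num_agents)
    return xs[agent_id], zs[agent_id]

print(get_init_pos(0, scene_params))  # (-1.5, -1.0)
```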

AGPX commented 3 years ago

I've displaced the actors to avoid collisions, but the walking agent (after 15M samples) still falls at some point:

https://youtu.be/KlV2qKU4U18

Perhaps this behavior is due to the fact that the environment is always reset for both agents (that is, when one agent falls, the other also gets reset, even if it was doing well; this is especially a problem when one agent has a goal that is harder to achieve than the other's, like the spin kick). Unfortunately, I don't think there's a workaround for this. Anyway, I'll try to keep the training going (I believe 15M is not enough, especially for 2 agents), but I'm afraid Agent 0 won't recover easily at this point. @xbpeng any suggestions?

P.S.: I had an idea. I will try with Agent 0 (walking) untrained and Agent 1 fully trained, so that the latter cannot affect the former in terms of resets. If Agent 0 continues to fall, the problem lies elsewhere.

xbpeng commented 3 years ago

The intermediate results look promising. 15M samples is not a lot; it usually takes 50-100M samples to learn a motion. So I think if you keep training, it should still get better.

You can probably modify the reset so that it only resets the agent that falls rather than both agents.

AGPX commented 3 years ago

> You can probably modify the reset so that it only resets the agent that falls rather than both agents.

This looks very difficult to implement, especially since the reset usually applies to the world as well (which includes shared things, like forces), and I don't really know how to apply it to a single, isolated agent. I'm starting to think that it would have been easier to have multiple cScene objects (rather than multiple actors) sharing the same cWorld and cGround instances (and also the perturbations), even if the problem of the separate reset remains. Thinking about it: the Python code is completely agnostic to the concept of scenes and therefore would require no modification. The biggest thing to add is a mechanism to pass all the scenes to the reward functions. Maybe that will be the next attempt, if the current one fails.
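Just to pin down what I mean by per-agent resets with multiple scenes over one shared world, here is a rough Python sketch of the structure (all class and method names are made up to mirror the cScene/cWorld idea; the real classes are C++, and the reset plumbing is exactly the hard part):

```python
# Rough structural sketch only: SharedWorld / AgentScene are hypothetical
# stand-ins for several cScene-like objects sharing one cWorld/cGround,
# with per-agent resets.
class SharedWorld:
    """Shared physics state (ground, forces, perturbations)."""
    def __init__(self):
        self.time = 0.0

    def step(self, dt):
        self.time += dt

class AgentScene:
    """One scene per agent; only this agent is re-initialized on a fall."""
    def __init__(self, name, world):
        self.name = name
        self.world = world
        self.episode_time = 0.0
        self.fallen = False

    def step(self, dt):
        self.episode_time += dt

    def reset(self):
        # Reset only this agent's episode; the shared world keeps running.
        self.episode_time = 0.0
        self.fallen = False

world = SharedWorld()
scenes = [AgentScene("walk", world), AgentScene("spinkick", world)]

dt = 1.0 / 30.0
for _ in range(10):
    world.step(dt)
    for scene in scenes:
        scene.step(dt)
        if scene.fallen:
            scene.reset()  # the other agent's episode is untouched
```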