Integration with MADDPG

kumarict commented 5 years ago

I'm trying to integrate Neural MMO with MADDPG (https://github.com/openai/maddpg ; https://github.com/openai/multiagent-particle-envs ) and am facing some issues related to this:

Native Route
- When running Neural MMO from the command line, this does not appear to work when the "-- render" argument is omitted (i.e., just "python Forge.py" is called). It stops after the "Hello from the pygame community" line.
- Is there even a way to do this using Native? I understand for MADDPG, a situation like VecEnv is needed where actions, observations, reward, and world are all compartmentalized. If I decompartmentalize some of them, such as separating out the super() call from NativeRealm from the non-super part, it may work. It appears in general the paradigm is different here, however, I have made some other modifications and they work in Native, so this would be great if it could be used for MADDPG integration.
VecEnv/Gym Route
- When running Neural MMO with vecenv ("python Forge.py --api vecenv --render") and launching the browser graphical display, no agents show up. I understand this works slower than Native, so is it the case that if I wait long enough / get a more powerful machine, then it will work?
- When running "python Forge.py --api vecenv", it hangs at the "Hello from the pygame community" part just like the Native version and then gives an error. I'll include the error at the end of this message.
- This seems like it could interface with MADDPG more easily, however, because of the issues above, I can't do much with it.
In general, what is the best practice for integrating with MADDPG in your view? Has anyone integrated Neural MMO with, say, PPO or DPG or something similar? I am somewhat new to this field, so apologies for any ignorance and thanks in advance for your patience!

Error encountered when running vecenv with no rendering

testlaw128 , NENT: 128 , NPOP: 8 testchaos128 , NENT: 128 , NPOP: 8 sample , NENT: 128 , NPOP: 8 2019-05-10 18:03:39,816 INFO node.py:423 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-05-10_18-03-39_2786/logs. 2019-05-10 18:03:40,032 INFO services.py:363 -- Waiting for redis server at 127.0.0.1:64431 to respond... 2019-05-10 18:03:40,257 INFO services.py:363 -- Waiting for redis server at 127.0.0.1:28843 to respond... 2019-05-10 18:03:40,274 INFO services.py:760 -- Starting Redis shard with 6.86 GB max memory. 2019-05-10 18:03:40,455 INFO services.py:1384 -- Starting the Plasma object store with 10.29 GB memory using /dev/shm. (pid=2818) pygame 1.9.5 (pid=2818) Hello from the pygame community. https://www.pygame.org/contribute.html Traceback (most recent call last): File "Forge.py", line 74, in example.run() File "Forge.py", line 48, in run self.step() File "Forge.py", line 44, in step self.envsObs, rews, dones, infos = self.env.step(actions) File "/home/kumar/Projects/neural-mmo/forge/trinity/smith.py", line 70, in step return self.env.step(actions) File "/home/kumar/Projects/neural-mmo/forge/trinity/smith.py", line 23, in step zip(self.envs, actions)]) File "/home/kumar/.local/lib/python3.6/site-packages/ray/worker.py", line 2307, in get raise value ray.exceptions.RayTaskError: ray_VecEnvRealm:step() (pid=2818, host=dvrs-10) File "/home/kumar/Projects/neural-mmo/forge/blade/core/realm.py", line 176, in step return pickle.dumps((stims, rews, None, None)) File "/home/kumar/Projects/neural-mmo/forge/blade/entity/player.py", line 68, in getattribute return super().getattribute(name) RecursionError: maximum recursion depth exceeded while calling a Python object

kumarict commented 5 years ago

When running Neural MMO from the command line, this does not appear to work when the "-- render" argument is omitted (i.e., just "python Forge.py" is called). It stops after the "Hello from the pygame community" line.

Okay, this is working fine. It just does not display any messages. I added a print statement for every time an entity is spawned and it shows up whether or not --render is applied.

jsuarez5341 commented 5 years ago

Hi! I've been a bit slow on responses for the past couple of weeks -- I'm actually in the middle of a pretty heavy dev phase on the first major patch. It will take a while, but it should make a bunch of the stuff you mentioned a lot easier once it's done. The gist of the issue is that the VecEnv/Gym computation model doesn't really work in this setting -- not only is there too much communication, but the input/output processing on the game state is a bit more complicated than in most RL environment is isn't exactly easy to package into tensors.

The Native model is somewhere in-between Gym and Rapid (the OpenAI infra for DotA -- you can read more about it on their blog). The idea is to have persistent workers, one per CPU core, which each have access to a copy of the environment. The Native API currently does this synchronously -- that is, you make decisions for one agent at a time, The VecEnv API is asynchronous and lets you make all decisions at once (pretty much the same as any other RL environment). This should allow you to centralize decision making across all the agents on that core, if you so desire. The VecEnv example in the repo provides inference code -- you can structure training however you please.

The upcoming patch will be merging the Native and VecEnv APIs in favor of the latter, with the infra on top adopting a variant of the Rapid computation model. Ideally, this should make a bunch more approaches feasible, including centralized training.

A few pointers for the meanwhile if you'd like to hack the VecEnv API. The config and experiments file specify run parameters. You weren't getting any output because default behavior is to load the model and run in evaluation mode -- the print statements only log epoch information, and in test mode, there are none. Feel free to reopen if you run into more issues :)

openai / neural-mmo

Integration with MADDPG #27