An example of an algorithm that needs parallel workers is A3C, implemented in Python with TensorFlow.
In view of #13, I can identify a few different use cases:
Option 1) can be achieved through a proper use of GymFactory::make(), coupled with the multi-world support of IgnitionEnvironment (see the sketch below). Option 2) is way more complicated and will be discussed in its own issue.
[...] (anything else?)
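To make Option 1) more concrete, here is a minimal sketch in Python of what the factory could look like. All names here (GymFactory, IgnitionEnvironment, world_name) are stand-ins for illustration, not the actual gym-ignition API:

```python
# Sketch of "Option 1)": a factory that couples environment creation with
# multi-world support by assigning each instance a uniquely named world.
# GymFactory, IgnitionEnvironment, and world_name are hypothetical names.
import itertools


class IgnitionEnvironment:
    def __init__(self, env_id: str, world_name: str):
        self.env_id = env_id
        # The world name becomes the scope under which the singleton
        # stores and retrieves this environment's model data.
        self.world_name = world_name


class GymFactory:
    _world_counter = itertools.count()

    @classmethod
    def make(cls, env_id: str) -> IgnitionEnvironment:
        # The factory, not the user, assigns a fresh world scope, so
        # parallel instantiation stays transparent to the caller.
        world_name = f"world_{next(cls._world_counter)}"
        return IgnitionEnvironment(env_id, world_name=world_name)


envs = [GymFactory.make("CartPole") for _ in range(4)]
print([env.world_name for env in envs])  # ['world_0', ..., 'world_3']
```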
The problem mentioned in https://bitbucket.org/ignitionrobotics/ign-gazebo/issues/18/support-for-multiple-worlds does not affect only multiple worlds executed on the same server: it would also affect multiple servers running in the same process, since the root cause is the global variables used inside ODE, which is used as DART's collision detector. DART supports several collision detectors (also Bullet and FCL, if I remember correctly), but I think there are reasons why ign-gazebo uses ODE as its collision detector.
> Reading data from a simulated <model> is performed through a plugin that interfaces with a singleton. Then, environment plugins such as the CartPole access the data through the singleton. The model is registered with its name, and if two worlds are simulated, since they operate on the same sdf, they would try to register the same entry inside the singleton. If we want to keep this architecture, we should figure out a workaround to store and retrieve different entries for different environment plugins from the singleton.
For multiple worlds in one server, this can be solved using the scoped name of the model (which, at least in Classic Gazebo, contains the world name). For multiple servers in the same process, we probably just need to go one level deeper and also prefix the scoped name with some kind of unique identifier of the server.
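To illustrate the idea, here is a minimal sketch, assuming a plain map-based singleton with hypothetical names (ModelRegistry, register, get); the real gym-ignition singleton differs:

```python
# Minimal sketch of a registry keyed by scoped names. All names here
# (ModelRegistry, register, get) are hypothetical, not the real singleton.


class ModelRegistry:
    """Process-wide singleton mapping scoped model names to model data."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._models = {}
        return cls._instance

    @staticmethod
    def scoped_name(model: str, world: str, server_id: str = "") -> str:
        # Multiple worlds in one server: prefix the model with the world
        # name. Multiple servers in one process: prefix a server id too.
        return "::".join(part for part in (server_id, world, model) if part)

    def register(self, model: str, world: str, data, server_id: str = ""):
        key = self.scoped_name(model, world, server_id)
        if key in self._models:
            raise KeyError(f"'{key}' is already registered")
        self._models[key] = data

    def get(self, model: str, world: str, server_id: str = ""):
        return self._models[self.scoped_name(model, world, server_id)]


# Two worlds loading the same sdf no longer clash:
registry = ModelRegistry()
registry.register("cartpole", world="world_0", data={})
registry.register("cartpole", world="world_1", data={})
```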
> The problem mentioned in https://bitbucket.org/ignitionrobotics/ign-gazebo/issues/18/support-for-multiple-worlds does not affect only multiple worlds executed on the same server: it would also affect multiple servers running in the same process, since the root cause is the global variables used inside ODE, which is used as DART's collision detector. DART supports several collision detectors (also Bullet and FCL, if I remember correctly), but I think there are reasons why ign-gazebo uses ODE as its collision detector.
Thanks for the explanation, I didn't investigate that issue too much. For reference, this is the discussion about the switch to ODE for collision detection.
Anyway, for the transparent case defined above in https://github.com/robotology/gym-ignition/issues/12#issuecomment-479935574, I would tend to prefer multiple worlds running in the same server rather than multiple servers each running a single world in the same process (which, btw, was not an option I initially considered). Another reason is that, rather than going towards the second option, I would switch to multiprocessing, since it would simplify horizontal scaling.
> For multiple worlds in one server, this can be solved using the scoped name of the model (which, at least in Classic Gazebo, contains the world name).
I like this idea. I don't think there is something similar in Ignition; at least, I never saw it. However, IgnitionEnvironment could parse the sdf and dynamically change the name of the world to create a new scope (the default name is default). This is what I already tried to do in the past in order to avoid the need to pass the name of the controlled model to an environment plugin (refer to these lines). Unfortunately this is not possible due to the separation of the sdf between world and model: there's no simple way to get the context of included models without manually resolving the file (and I didn't want to invest time in it). Instead, since the name of the world is in the world file, changing it should be straightforward.
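As an example, here is a minimal sketch of that renaming step, using Python's standard XML parser on a world file (the tag layout follows the usual sdf structure, but the rename_world helper is hypothetical):

```python
import xml.etree.ElementTree as ET

# Hypothetical helper showing the renaming step that would create a new
# scope for each world loaded from the same file.


def rename_world(sdf_string: str, new_name: str) -> str:
    """Return a copy of the sdf with the world renamed to a new scope."""
    root = ET.fromstring(sdf_string)  # the <sdf> element
    world = root.find("world")
    if world is None:
        raise ValueError("no <world> element found in the sdf")
    world.set("name", new_name)
    return ET.tostring(root, encoding="unicode")


sdf = """<sdf version="1.6">
  <world name="default">
    <include><uri>model://cartpole</uri></include>
  </world>
</sdf>"""

print(rename_world(sdf, "world_0"))  # the world is now scoped as world_0
```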
> Anyway, for the transparent case defined above in #12 (comment), I would tend to prefer multiple worlds running in the same server rather than multiple servers each running a single world in the same process (which, btw, was not an option I initially considered). Another reason is that, rather than going towards the second option, I would switch to multiprocessing, since it would simplify horizontal scaling.
I thought a bit more about this, and the most straightforward parallel implementation is running multiple servers, each containing a single world.
I would like to provide a transparent usage, compatible with a typical parallel algorithm implementation such as A3C. Parallel workers should be able to instantiate a new gym environment and act on it independently of the others. This means that (e.g. from Python) we can implement it with multiple threads within the same process (the Python interpreter). I am working on it now and will provide a draft PR soon. The modifications of the first post still apply.
Note that this outcome would be similar to running multiple worlds in a single Gazebo server, but in that case the control of the threads would pass through Gazebo, and accessing them individually from the outside would be rather complicated.
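A sketch of the intended usage pattern: each worker owns an independent environment and steps it in its own thread, as an A3C implementation would. The environment id is a placeholder for an Ignition-based environment, and the calls follow the classic gym API:

```python
import threading

import gym  # classic gym API; the environment id below is a placeholder


def worker(worker_id: int, episodes: int = 10) -> None:
    # Each worker instantiates its own environment and steps it
    # independently of the others, as an A3C implementation would.
    env = gym.make("CartPole-v1")  # stand-in for an Ignition-based env
    for _ in range(episodes):
        observation = env.reset()
        done = False
        while not done:
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
    env.close()


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```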
At this early stage, only one simulation at a time is possible. However, many RL algorithms achieve their best performance only when many agents / workers are exploited. We should think about how to address this constraint.
From my point of view, for what concerns Ignition Gazebo, this can be addressed in two ways:

1) Creating multiple gazebo::Server instances, each of them loading the same sdf world.
2) Executing multiple worlds in the same gazebo::Server.

The first solution would have problems when the GUI is involved. Which one of the parallel worlds should be rendered? I suspect that if we had this option running, due to the current Ignition architecture (messages exchanged between server and GUI), the rendered world would be the combination of all the running instances.
As for the second solution, instead, I guess that upstream will provide a way to synchronize the GUI with only one of the worlds. However, the multiple world support still has to be finalized.
However, even if we had option 2) running today, gym-ignition has some limitations that prevent concurrent simulations. In particular:
Reading data from a simulated <model> is performed through a plugin that interfaces with a singleton. Then, environment plugins such as the CartPole access the data through the singleton. The model is registered with its name, and if two worlds are simulated, since they operate on the same sdf, they would try to register the same entry inside the singleton. If we want to keep this architecture, we should figure out a workaround to store and retrieve different entries for different environment plugins from the singleton.

cc @traversaro