Closed by janblumenkamp, 1 year ago
To add to this, here is another working example: this project/repository is the result of this thread on my end.
As a working minimal example with a more recent Ray version, I have created this repository. It's a toy problem that serves as a reference implementation for the changes that are due to be done in RLlib. I talked to Sven recently and the plan is to hopefully get this done over the next few weeks :)
EDIT: Just an update regarding my minimal example: it now supports both continuous and discrete action spaces, and I have cleaned up the trainer implementation quite a bit, so it should be much clearer now. Let me know if you have any questions.
Hi @ericl @janblumenkamp. This whole thread was very helpful, thanks for the detailed explanations from both of you!
I am currently in the process of migrating a project to the RLlib framework, and I had some doubts about some of the points in your discussion. Here's some context before I begin:
My doubts revolve around the agent grouping mechanism, specifically this exchange from earlier in the thread:

"...or is the grouped super-agent literally treated as one big agent with a huge observation and action space?"

"It's the latter, it really is one big super-agent. You could potentially still do an architectural decomposition within the super-agent model, though (i.e., to emulate certain multi-agent architectures)."
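To make the "one big super-agent" view concrete, here is a toy sketch (plain NumPy, not RLlib code; the function names are made up for illustration) of how per-agent observations stack into a single super-observation and how the joint action splits back into per-agent actions:

```python
import numpy as np

def group_obs(obs_by_agent, agent_ids):
    """Concatenate per-agent observations into one super-observation."""
    return np.concatenate([obs_by_agent[a] for a in agent_ids])

def split_action(super_action, agent_ids, act_dim):
    """Slice the joint super-agent action back into per-agent actions."""
    return {a: super_action[i * act_dim:(i + 1) * act_dim]
            for i, a in enumerate(agent_ids)}

# Two agents with 3-dim observations and 2-dim actions:
obs = {"agent_0": np.zeros(3), "agent_1": np.ones(3)}
super_obs = group_obs(obs, ["agent_0", "agent_1"])            # shape (6,)
actions = split_action(np.arange(4.0), ["agent_0", "agent_1"], act_dim=2)
```

This is why the super-agent's observation and action spaces grow with the number of agents, as discussed above.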
First, decomposing the actions and observations of a single monolithic agent into multiple simpler agents not only reduces the dimensionality of agent inputs and outputs, but also effectively increases the amount of training data generated per step of the environment.
Thank you so much again! I can't wait to onboard to RLlib!
Hi @Rohanjames1997! Have a look at the discussion further down in this thread. You can't use the MultiAgentEnv, and grouping doesn't help either if you want to run backpropagation through communication. Check out my minimal example: https://github.com/janblumenkamp/rllib_multi_agent_demo It involves many ugly hacks, most notably formulating the multi-agent env as one standard gym super-observation and super-action space that contains the observations and actions for a fixed number of agents (in your case, you can perhaps just mask out the agents you don't need), and also passing per-agent rewards through the info dict to the trainer. I will update it to Ray 1.3.0 soon!
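A minimal sketch of the hack just described, assuming a fixed number of agents with fixed-size observations and actions. The class and attribute names are illustrative, not taken from the linked repository; a real version would subclass gym.Env and declare Box observation/action spaces:

```python
import numpy as np

class SuperAgentEnv:
    """Single-agent-style env whose observation and action arrays stack
    all N agents, with per-agent rewards smuggled to a custom trainer
    through the info dict (illustrative sketch only)."""

    def __init__(self, n_agents=3, obs_dim=4, act_dim=2):
        self.n_agents, self.obs_dim, self.act_dim = n_agents, obs_dim, act_dim

    def reset(self):
        # Super-observation: one row per agent.
        return np.zeros((self.n_agents, self.obs_dim), dtype=np.float32)

    def step(self, action):
        assert action.shape == (self.n_agents, self.act_dim)
        obs = np.zeros((self.n_agents, self.obs_dim), dtype=np.float32)
        per_agent_rewards = [0.0] * self.n_agents
        # RLlib sees a single scalar reward (here, the sum); the custom
        # trainer recovers the per-agent breakdown from info["rewards"].
        return obs, sum(per_agent_rewards), False, {"rewards": per_agent_rewards}
```

Masking out unused agents, as suggested above, would amount to zeroing their rows in the super-observation and ignoring their slots in the joint action.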
Hi @janblumenkamp ! Thank you so much for the link! I had missed that part of the discussion. I shall probably implement something very similar.
Assuming I had no inter-agent communication, could you answer my previous questions?
And an additional one: since #10884 is still in progress, is it right to say that RLlib's MultiAgentEnv class currently does not support graph neural networks? (Since GNNs involve communication by default.)
Thanks again! And congratulations on the paper! It was a great read! 😄
What is your question?
My goal is to learn a single policy that is deployed to multiple agents (i.e., all agents learn the same policy but are able to communicate with each other through a shared neural network). RLlib's multi-agent interface works with a dict that maps each individual agent to its action.
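For reference, the parameter-sharing part by itself is straightforward in RLlib's multi-agent API. A sketch, using the dict-style config of Ray ~1.x (exact keys can differ between versions), where every agent id maps to one shared policy:

```python
# All agent ids resolve to a single policy, so every agent runs (and
# trains) the same network weights.
policy_mapping_fn = lambda agent_id: "shared_policy"

config = {
    "multiagent": {
        # (policy_cls, obs_space, act_space, config); None entries let
        # RLlib infer the policy class and spaces from the registered env.
        "policies": {"shared_policy": (None, None, None, {})},
        "policy_mapping_fn": policy_mapping_fn,
    },
}
```

The open question in this thread is not the sharing itself but the communication: this setup still evaluates each agent's observation independently in the shared model.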
It is not entirely clear to me how my custom model is supposed to obtain the current state after the last time-step for all agents at once. It appears to me that RLlib calls the forward function of my subclass inherited from TorchModelV2 for each agent individually and passes each agent's state into the state argument of the forward function. tl;dr, if this is my custom model:
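The original snippet is not shown here; as a stand-in, here is a dependency-free sketch mirroring the TorchModelV2.forward signature being discussed (the class name and the linear "policy head" are hypothetical, not the author's model; real code would subclass ray.rllib.models.torch.torch_modelv2.TorchModelV2 and use torch tensors):

```python
import numpy as np

class StubCommModel:
    """Plain-Python stand-in for a TorchModelV2 subclass."""

    def __init__(self, obs_dim, num_outputs):
        # Hypothetical linear head standing in for a real network.
        self.w = np.zeros((obs_dim, num_outputs))

    def forward(self, input_dict, state, seq_lens):
        # RLlib batches forward() calls per *policy*, not per env step:
        # with one shared policy, input_dict["obs"] is a batch of
        # individual agents' observations, so each call sees agents
        # independently -- which is why accessing all agents' states at
        # the same timestep needs the super-agent workaround from this
        # thread.
        obs = np.asarray(input_dict["obs"])   # shape (batch, obs_dim)
        logits = obs @ self.w                 # shape (batch, num_outputs)
        return logits, state
```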
Then how do I manage to predict the logits for all of my n agents at once while having access to the current state of all my agents? Am I supposed to use variable-sharing? Is #4748 describing this exact problem? If so, is there any progress?