tkn-tub / veins-gym

Reinforcement Learning-based VANET simulations
https://www2.tkn.tu-berlin.de/software/veins-gym/
GNU General Public License v2.0

veins-gym for multiple processes/agents #5

Closed lionyouko closed 2 years ago

lionyouko commented 2 years ago

Hi, in order to use the veins-gym framework with federated learning, I was thinking about how to make it run with multiple instances of OpenAI Gym environments. I am going to start working on this, but since it is not the major part of my thesis, I may try to find the easiest working solution for now.

What I need are separate, independently working instances of agents, and therefore it seems that I need different GymConnection modules working in parallel, each of them connected to a different port.

I was also thinking about how I could separate the configuration part from the learning part itself, since, for now, the tool receives a certain message that represents configuration.

If you have any suggestions regarding this, I would be very glad to think about how I could implement them. Thank you very much.

lionyouko commented 2 years ago

Yes, the most important aspect is:

I not only need multiple veins-gym agents, but all of them need to be linked to a specific running simulation (a specific process).

But the reset function could kill the simulation process and leave the other agents starving.

I need to find a way to ensure that, before resetting, all agents are actually done with their parts. For example, after an episode finishes, an agent should wait for the other agents to be done before another simulation process is started.

It is important to be able to communicate with the current process, which has ZMQ sockets open on certain ports. That is easier to do. Each GymConnection will open a socket on 127.0.0.1, where the port may be given by 5000 + a number from omnetpp.ini for that GymConnection, or by its position in a vector of connectors, for example. Since I am thinking of having another Python process launch these agents, that is okay.
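
A minimal sketch of that launcher idea (names like `run_agent` and `BASE_PORT` are hypothetical, and which side binds versus connects depends on how the GymConnection sockets are actually set up):

```python
import multiprocessing
import zmq

BASE_PORT = 5000  # assumption: same base port as in the omnetpp.ini scheme above

def run_agent(index: int) -> None:
    """Hypothetical per-agent loop: one ZMQ socket per GymConnection."""
    context = zmq.Context()
    socket = context.socket(zmq.REP)
    socket.bind(f"tcp://127.0.0.1:{BASE_PORT + index}")
    while True:
        request = socket.recv()  # serialized observation/reward from the simulation
        # ... deserialize, query this agent's policy, serialize the action ...
        socket.send(b"")  # placeholder reply; a real agent would send the chosen action

if __name__ == "__main__":
    # master process launching one agent process per GymConnection
    workers = [multiprocessing.Process(target=run_agent, args=(i,)) for i in range(4)]
    for w in workers:
        w.start()
```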

If you have any suggestions, please tell me. Thank you.

dbuse commented 2 years ago

Hi @lionyouko

To clarify your requirements: you want multiple agents connected to the same simulation, and thus the same environment. Did I get that right?

If so, I would suggest leaving the GymConnection and its equivalent in the VeinsGym Python code as-is and only changing the parts that interact with them on the outside. On the Python/agent side, this could be done by wrapping the environment object in a class that is able to talk to a collection of proxy environments. Each agent would get its own proxy environment, which in turn communicates with the wrapper environment. The proxies would behave like normal environments to their agents, delivering agent-specific observations and rewards to them and collecting actions from their agent. The wrapper would receive observations and rewards for all agents and distribute them to the proxies, while also collecting actions from all agents (through the proxies) and feeding them to the simulation through the original VeinsGym environment object. So the synchronization (if needed) between agents could be implemented in Python. This could lead to some blocking time, but only where semantically necessary. If necessary, this could even be distributed across a network.
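
To illustrate, here is a rough sketch of that wrapper/proxy split (illustrative only, not the actual veins-gym API; it assumes each agent runs in its own thread against its proxy, and that every observation from the real env arrives tagged with the id of the agent it belongs to):

```python
import queue

class ProxyEnv:
    """Looks like a plain env to one agent; all traffic goes through the wrapper."""
    def __init__(self, agent_id, wrapper):
        self.agent_id = agent_id
        self.inbox = queue.Queue()  # observations routed to this agent
        self._wrapper = wrapper

    def reset(self):
        return self.inbox.get()  # first observation arrives via the wrapper

    def step(self, action):
        self._wrapper.actions.put((self.agent_id, action))
        # block until the wrapper routes the next observation for this agent
        # (per-agent reward/done routing is elided in this sketch)
        return self.inbox.get()

class MultiAgentWrapper:
    """Owns the single real veins-gym env plus one proxy per agent."""
    def __init__(self, env, num_agents):
        self.env = env  # the original veins-gym environment object, unchanged
        self.actions = queue.Queue()
        self.proxies = [ProxyEnv(i, self) for i in range(num_agents)]

    def run(self):
        # assumption: every observation from the real env is a pair (agent_id, data)
        agent_id, data = self.env.reset()
        done = False
        while not done:
            self.proxies[agent_id].inbox.put(data)  # wake that agent's proxy
            acting_id, action = self.actions.get()  # wait for its action
            obs, reward, done, info = self.env.step(action)
            if not done:
                agent_id, data = obs
```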

On the Simulation/C++ side, you would need to distribute actions from the GymConnection to the Vehicles/Applications/Modules which perform the actions selected by the agents.

I hope this helps.

lionyouko commented 2 years ago

Hi, @dbuse

Yes, the same simulation, but as each agent would be independent, does it need to be the same environment?

Let me try to explain what I understood from what you said, so you can tell me if I got it right:

In the way you proposed, there will be only one GymConnection module, and each RSU must implement a way to inform the other side that it is the n-th RSU making its i-th request for a next action. On the agent side, there will actually be only one wrapping env (or, indirectly, agent) that gets this info from the RSU, looks at the value of n, and delivers the i-th value of the environment in order to produce the next, (i+1)-th, action. So they are all using the same channel, the GymConnection, but on the Python side I am receiving a modified message that contains which agent it needs to be sent to (and therefore which proxy env needs to receive this message).

So I would need to create master-env and proxy-env classes. And, because I am not very good with protobuf, the proxy env would use a modified deserialization that takes only the remainder of the message, without the n of the n-th RSU.

On the RSU side, each RSU would need to check whether an incoming action is meant for it, by inspecting a modified action message containing the RSU to which it is addressed.

That would make the entire thing run in only one process, the master-env.

My issue with this is the following: which type of waiting should I implement in my RSUs, given that they ask for actions asynchronously and without acknowledging each other? They are independent. An RSU may receive m messages before receiving the one it asked for.

Below I will try to explain what I was planning to do:

It is a bit like what you proposed, but I was instead thinking of having a master process launch n Python processes, each of which would be an env (or agent), so each of them would need a GymConnection counterpart. To sort out when to reset, the proxies would inform the master process that they received a shutdown message, and the master would record it in a data structure just meaning "this one has finished". When the data structure was filled with n shutdowns, those processes would be done with their episode, i.e., idle in some way, so the simulation process could be killed by the master, and the reset in each agent/env could happen entirely separately from starting a new simulation process. Since what they need is just the GymConnection sockets, they would hopefully find them there. This reset would need to be communicated, though, so again it would come back to how to wait for the right moment.
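
A minimal sketch of that shutdown bookkeeping (illustrative only; `shutdown_queue` and `n` are hypothetical names, and the actual launching/killing of the simulation process is elided):

```python
import multiprocessing

def master_loop(n: int, shutdown_queue: multiprocessing.Queue) -> None:
    """Wait until all n agent processes report the per-episode shutdown message."""
    finished = set()
    while len(finished) < n:
        agent_id = shutdown_queue.get()  # blocks; each agent puts its id once
        finished.add(agent_id)
    # all agents are idle now: safe to kill the simulation process,
    # reset every agent/env, and only then start a new simulation
```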

Yeah, it is a shame, I know: the only thing I can think of is busy waiting, which is really not good. Mine apparently seems an overcomplication compared to yours.

Please tell me if I understood your proposal correctly, and what you would suggest to implement the waiting correctly.

I am sorry for bothering you so much; my idea is that if I manage to change the tool, I can share it here too. Ultimately, I will share my entire code when I finish the thesis (if I finish it, because I unfortunately mismatched my abilities and responsibilities xD).

dbuse commented 2 years ago

Your approach would be possible as well. Though I think the coordination among the many running instances would be complicated, especially regarding start, reset, and shutdown. That's why I suggested the idea of keeping the communication and start/stop/reset logic as-is.

I think you understood what I was trying to say. However, I was assuming a somewhat more synchronized communication scheme, i.e., multiple RSUs asking for actions at the same time. But from your newest post I understand that these will be independent. If performance and overhead are not too much of a concern, I assume my proposal with the master and proxy Env should be easy to prototype. Just design your observation and action spaces for a single RSU and then add a simple RSU identifier, e.g., an int, to the data. Then the communication pattern would be (assuming a running simulation):

  1. some RSU needs an action. It collects data and sends a request with its ID and observation through the GymConnection.
  2. the GymConnection just forwards the request and waits for the reply (no change here).
  3. the original environment receives and un-serializes the message and returns the observation, which also contains the ID (no change here).
  4. the new wrapper (this can just be some code in the Python main loop) receives the observation (containing the id and data) from the original environment. It looks at the id to fetch the matching agent and queries it for an action.
  5. the agent performs its policy and returns an action (no change here).
  6. the new wrapper adds the agent's ID back to the action and sends it back to the original environment.
  7. the original environment serializes the action (including the ID) and sends it back to the simulation (no change here).
  8. the GymConnection receives and un-serializes the action and returns it to the calling RSU as a reply.
  9. the RSU receives the action and the id (which it can check to be sure the action was meant for it) and executes the action.
  10. the simulation continues until the next RSU needs an action and the cycle starts again with step 1.

This does block the simulation on every action request. And it breaks the direct interface of a gym, so using tools like replay buffers may require a little more work. But it should be easy to set up. Just prepare an array of agents equal in size to the number of RSUs in the simulation. Maybe there is even support for this in ML/RL libraries (picking a certain agent/policy by some ID). If you want to keep the gym interface, you could also wrap the above procedure in the step function of a new gym class. That's what I hinted at in my previous post.
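
A minimal sketch of steps 4-7 as a plain Python main loop (illustrative only: `env` stands for the unchanged veins-gym environment object, `make_agent`, `act`, and `NUM_RSUS` are hypothetical names, and the id-tagged observation/action layout is an assumption about the protobuf design):

```python
# one agent per RSU, indexed by the RSU identifier carried in every message
agents = [make_agent() for _ in range(NUM_RSUS)]

obs = env.reset()
done = False
while not done:
    rsu_id, data = obs                  # step 4: unpack the id-tagged observation
    action = agents[rsu_id].act(data)   # step 5: query the matching agent's policy
    obs, reward, done, info = env.step((rsu_id, action))  # steps 6-7: tag and send back
```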

Best of Luck!

lionyouko commented 2 years ago

You are very kind, Dr. Dbuse. I am taking your ideas into consideration. Thank you very much.

dbuse commented 2 years ago

Great. I'll close this issue for now.