stefanbschneider / mobile-env

An open, minimalist Gymnasium environment for autonomous coordination in wireless mobile networks.
https://mobile-env.readthedocs.io
MIT License

Action space definition #37

Closed fyouly closed 12 months ago

fyouly commented 12 months ago

Hi, I am reading your project and want to ask about the action definitions of the agents and the action space size. I see that you use a MultiDiscrete action space in your environment. Can I customize it to another type, such as Discrete? And what does each dimension in the MultiDiscrete action vector stand for?

BR

stefanbschneider commented 12 months ago

Hi @fyouly ,

I built mobile-env mostly for training DeepCoMP (and its variants), a deep RL approach for multi-cell selection in mobile networks, i.e., deciding which users are served by how many and which cells over time. To that end, I designed the actions such that each user selects at most one cell per time step (or a no-op). The user then connects to the selected cell, or disconnects from it if it was already connected.

This means that with M users and N cells, the action space is M x (N+1) for the centralized DeepCoMP approach (the +1 is for the no-op). This is where I use the MultiDiscrete action space, with one discrete action per user: https://github.com/stefanbschneider/mobile-env/blob/main/mobile_env/handlers/central.py#L20 For the multi-agent approaches (mine are called DD-CoMP and D3-CoMP), each agent has a Discrete action space of size N+1: https://github.com/stefanbschneider/mobile-env/blob/main/mobile_env/handlers/multi_agent.py#L26
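As a quick illustration of these sizes, here is a hypothetical sketch (my own helpers, not code from the repo) for M users and N cells:

```python
# Hypothetical helpers illustrating the action-space sizes described
# above; these are not part of mobile-env itself.

def centralized_action_space(num_users: int, num_cells: int) -> list[int]:
    # Centralized (DeepCoMP-style): a MultiDiscrete space with one
    # discrete choice per user, each of size N + 1
    # (cell IDs 1..N plus the no-op with ID 0).
    return [num_cells + 1] * num_users

def multi_agent_action_space(num_cells: int) -> int:
    # Multi-agent (DD-CoMP / D3-CoMP-style): each agent has a single
    # Discrete space of size N + 1.
    return num_cells + 1

# Small scenario with 5 users and 3 cells:
print(centralized_action_space(5, 3))  # [4, 4, 4, 4, 4]
print(multi_agent_action_space(3))     # 4
```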

More details are in the paper: https://ris.uni-paderborn.de/download/33854/33855/preprint.pdf


You can of course customize the environment and also the action space to your needs. It mostly depends on what you want to do.

Does that answer help you?

fyouly commented 12 months ago

Hi @stefanbschneider, thank you very much for getting back to me so quickly. In your test.ipynb example with a small scenario of 5 users and 3 cells, when I print random_action and get [0 2 1 1 2], does it mean that the 5 users connect to cells 0, 2, 1, 1, 2, respectively? And when we call the step function once, does each user choose to connect or disconnect from one of the 3 cells in that step (which corresponds to the agent's action)? So the agents' actions only change the connections between users and cells and do not change the users' positions; is my understanding correct? BR

stefanbschneider commented 12 months ago

Almost, but not exactly right. The actions refer to cell IDs. ID 0 is a no-op, where nothing happens and the corresponding user does not connect to or disconnect from any cell. If an ID >0 is selected, the user connects/disconnects to/from the cell with the corresponding ID. This means, each user can connect to or disconnect from at most one cell per time step. All other connections stay intact as long as the user does not move too far away.

In the example, [0 2 1 1 2] means that the first user with action 0 keeps all connections as they are, the second user with action 2 connects to/disconnects from cell with ID 2, the third user connects to/disconnects from cell ID 1, ...

In each time step, the users move around and can connect/disconnect to/from at most one cell each.
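To make the toggle semantics concrete, here is a minimal re-implementation sketch (my own hypothetical code, not the actual mobile-env logic) that applies an action vector like the one above:

```python
def apply_action(connections: dict[int, set[int]],
                 action: list[int]) -> dict[int, set[int]]:
    # Hypothetical sketch of the toggle semantics described above;
    # not the actual mobile-env code. `connections` maps a user index
    # to the set of cell IDs that user is connected to.
    new = {user: set(cells) for user, cells in connections.items()}
    for user, cell_id in enumerate(action):
        if cell_id == 0:            # no-op: keep connections as they are
            continue
        if cell_id in new[user]:    # already connected -> disconnect
            new[user].remove(cell_id)
        else:                       # not connected -> connect
            new[user].add(cell_id)
    return new

# Example: 5 users starting with no connections, action [0 2 1 1 2].
conns = apply_action({u: set() for u in range(5)}, [0, 2, 1, 1, 2])
# user 0 is unchanged; users 1 and 4 connect to cell 2; users 2 and 3
# connect to cell 1.
```

Applying the same action again would disconnect those users from the same cells, since the action toggles the connection.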

fyouly commented 12 months ago

> Almost, but not exactly right. The actions refer to cell IDs. ID 0 is a no-op, where nothing happens and the corresponding user does not connect to or disconnect from any cell. If an ID >0 is selected, the user connects/disconnects to/from the cell with the corresponding ID. This means, each user can connect to or disconnect from at most one cell per time step. All other connections stay intact as long as the user does not move too far away.
>
> In the example, [0 2 1 1 2] means that the first user with action 0 keeps all connections as they are, the second user with action 2 connects to/disconnects from cell with ID 2, the third user connects to/disconnects from cell ID 1, ...
>
> In each time step, the users move around and can connect/disconnect to/from at most one cell each.

Hi @stefanbschneider, great, thanks very much. That clears up my confusion with the experiments.

stefanbschneider commented 12 months ago

Happy to help :) Let me know if you have more questions; I'm closing this issue for now.

If you use mobile-env in your work, I'd be happy if you'd reference the paper and repo. The bibtex entry for the paper is in the Readme:

@inproceedings{schneider2022mobileenv,
  author = {Schneider, Stefan and Werner, Stefan and Khalili, Ramin and Hecker, Artur and Karl, Holger},
  title = {mobile-env: An Open Platform for Reinforcement Learning in Wireless Mobile Networks},
  booktitle = {Network Operations and Management Symposium (NOMS)},
  year = {2022},
  publisher = {IEEE/IFIP},
}

fyouly commented 7 months ago

Hi, I want to ask some follow-up questions about the action and state definitions in the environment:

  1. Is the "obs" in the environment setting the same as the "state" we usually use in reinforcement learning problems?
  2. If I want to use mobile-env with my own DQN, which is a centralized controller for several users with a shared replay buffer, should I use "mobile-small-central-v0" or "mobile-small-ma-v0"?
  3. Is the number of actions = number of users + 1?

Thank you. BR

stefanbschneider commented 7 months ago

Hi @fyouly ,

  1. Yes, "obs" or "observations" correspond to the typical "state" in reinforcement learning. I call them "observations" rather than "state" here to clarify that we do not know the entire "state" but only have partial observations of it. This has some theoretical implications (e.g., for convergence guarantees), but in practice you can use the "obs" much as you would use the "state".
  2. If you have a centralized DQN agent that observes and controls all users simultaneously, you should use the -central- environments. These use the central handler for centralized RL. Depending on the interface of your DQN, you might need to slightly adjust the handler to format observations and actions correctly.
  3. No, if I'm not mistaken, the number of actions equals the number of users. Each user takes one action, where the action is a discrete number: 0 means no-op (= do nothing) and any number i in 1 to N means connect to or disconnect from cell i.
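A small sanity-check sketch for point 3 (hypothetical code assuming the centralized action layout described earlier; mobile-env itself is not imported):

```python
import random

def sample_central_action(num_users: int, num_cells: int) -> list[int]:
    # Hypothetical sampler: the centralized action vector has one entry
    # per user (so num_users entries, NOT num_users + 1); each entry is
    # a value in 0..N, where 0 is the no-op and 1..N are cell IDs.
    return [random.randint(0, num_cells) for _ in range(num_users)]

# Small scenario: 5 users, 3 cells.
action = sample_central_action(5, 3)
assert len(action) == 5                  # number of actions = number of users
assert all(0 <= a <= 3 for a in action)  # each entry in 0..N
```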

I hope I could answer your questions. If you use mobile-env in your research, it would be great if you could cite it. I am also happy to link your project/paper from the mobile-env repo if you want to share it.