
Energy-Efficient Collaborative Base Station Control in Massive MIMO Cellular Networks

This repository is associated with the publication "Multi-agent Reinforcement Learning for Energy Saving in Multi-Cell Massive MIMO Systems". It provides a Multi-Agent Reinforcement Learning (MARL) approach that minimizes the total energy consumption of multiple massive MIMO base stations (BSs) in a multi-cell network while maintaining the overall quality-of-service (QoS).

The control strategy makes decisions on multi-level advanced sleep modes, antenna switching, and user association for each base station. By modelling the problem as a decentralized partially observable Markov decision process (DEC-POMDP), we propose a multi-agent proximal policy optimization (MAPPO) algorithm to obtain a collaborative BS control policy.
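
To make the structure concrete, here is a minimal, hypothetical sketch of the MAPPO idea: decentralized per-BS actors acting on local observations, a shared critic that sees the global state during centralized training, and the standard PPO clipped objective. This is not the code used in this repository; all class and variable names are illustrative.

```python
# Hypothetical MAPPO sketch (centralized critic, decentralized actors).
# Names and dimensions are illustrative, not taken from this repository.
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Per-BS policy: maps a local observation to a categorical action distribution."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, n_actions))

    def forward(self, obs: torch.Tensor) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.net(obs))


class CentralCritic(nn.Module):
    """Shared value function: sees the global state during centralized training."""
    def __init__(self, state_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, 1))

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state).squeeze(-1)


def ppo_actor_loss(actor, obs, actions, old_log_probs, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss, applied to each agent's actor."""
    dist = actor(obs)
    log_probs = dist.log_prob(actions)
    ratio = torch.exp(log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```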

Architecture of MAPPO:

[architecture diagram]

Overview

This solution has been shown to significantly improve network energy efficiency, adaptively switch BSs into different depths of sleep, reduce inter-cell interference, and maintain a high QoS.

Explanation of the Project Structure

train.sh is the bash script for training the MAPPO agent. The hyperparameters of the agent can be set in this script. Set the values of wandb_user and wandb_api_key if you want to log your training runs with wandb. In addition, the configuration of the cellular network environment can be found in network/config.py.
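
For orientation, the environment configuration typically covers things such as the network size, antenna counts, and sleep-mode depths. The snippet below is only an illustration of that kind of parameter, with made-up names and values; consult network/config.py for the actual ones.

```python
# Illustrative environment parameters (hypothetical names and values; the real
# configuration lives in network/config.py and may look quite different).
NUM_BS = 7                # number of massive MIMO base stations (cells)
MAX_ANTENNAS = 64         # antennas available at each BS
NUM_SLEEP_LEVELS = 4      # active mode plus three advanced sleep mode depths
TIME_SLOT_MINUTES = 20    # length of each traffic time slot
QOS_DELAY_BUDGET_MS = 50  # delay budget used for the QoS requirement
```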

After training, the model will be stored in the results folder. The simulate.sh script can then be used to run a simulation of the trained agent interacting with the cellular network. The results of the simulation will be stored in the analysis/sim_stats folder. You can also benchmark the performance of several trained models using analysis/benchmark.py, which produces plots in several *_plots/ folders, similar to those in the published paper.
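
If you prefer to inspect the simulation output yourself instead of using analysis/benchmark.py, a rough example is shown below. It assumes the statistics are saved as CSV files containing a power_consumption column, which may not match the repository's actual output format.

```python
# Hedged example: load simulation statistics and plot power consumption over time.
# Assumes CSV output in analysis/sim_stats/ with a "power_consumption" column
# (hypothetical; check the actual files produced by simulate.sh).
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd

stats_dir = Path("analysis/sim_stats")
for csv_file in sorted(stats_dir.glob("*.csv")):
    df = pd.read_csv(csv_file)
    if "power_consumption" in df.columns:
        plt.plot(df["power_consumption"], label=csv_file.stem)

plt.xlabel("simulation step")
plt.ylabel("power consumption (W)")
plt.legend()
plt.show()
```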

Features

The key features of the project include:

- A simulation environment of a multi-cell massive MIMO network driven by real-world traffic data
- A MAPPO agent that collaboratively controls BS sleep modes, antenna switching, and user association
- Scripts for training, simulation, benchmarking, and plotting of results

Environment Configuration

The configuration of the simulation environment is described in the following subsections.

Traffic Model

The traffic generator is driven by real-world data and provides arrival rates for each time slot (of 20 minutes) and each application category.
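
As a rough sketch of how such a generator can be driven, the example below samples Poisson arrival counts from a per-slot, per-category rate table. The categories, rates, and sampling scheme are illustrative assumptions, not the repository's actual data or implementation.

```python
# Hypothetical sketch: sample traffic arrivals from per-slot, per-category rates.
# The real generator in this repository uses real-world data; rates here are made up.
import numpy as np

rng = np.random.default_rng(0)

APP_CATEGORIES = ["video", "web", "iot"]   # illustrative categories
SLOTS_PER_DAY = 24 * 60 // 20              # 20-minute slots -> 72 per day

# Rate table: expected number of session arrivals per slot for each category.
arrival_rates = rng.uniform(1.0, 20.0, size=(SLOTS_PER_DAY, len(APP_CATEGORIES)))

def sample_arrivals(slot: int) -> dict:
    """Draw Poisson arrival counts for each application category in a given slot."""
    counts = rng.poisson(arrival_rates[slot % SLOTS_PER_DAY])
    return dict(zip(APP_CATEGORIES, counts.tolist()))

print(sample_arrivals(slot=30))   # prints a dict of per-category arrival counts
```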

Action Space

The actions of each BS agent cover the three control dimensions introduced above: selecting a multi-level advanced sleep mode, switching antennas on or off, and adjusting its user association preferences; a possible encoding is sketched below.
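
One hedged way to encode such a per-agent action is a Gym-style MultiDiscrete space. The number of choices per dimension below is an illustrative assumption and may differ from the environment's actual action space.

```python
# Hypothetical per-BS action encoding: sleep mode level, antenna adjustment,
# and a user-association/offloading preference. Sizes are illustrative.
from gymnasium.spaces import MultiDiscrete

NUM_SLEEP_LEVELS = 4       # active + three advanced sleep mode depths
NUM_ANTENNA_CHOICES = 3    # e.g. switch some antennas off / keep / switch on
NUM_ASSOC_CHOICES = 3      # e.g. discourage / neutral / encourage new associations

action_space = MultiDiscrete([NUM_SLEEP_LEVELS, NUM_ANTENNA_CHOICES, NUM_ASSOC_CHOICES])
print(action_space.sample())   # a random action vector, one entry per dimension
```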

State Space

The state of the environment is only partially observable: as the problem is modelled as a DEC-POMDP, each BS agent acts on its own local observation of the network.

Reward

The reward function is a combination of a penalty on the total energy consumption of the BSs and terms that reflect the quality-of-service delivered to users.
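
As an illustration only, such a reward could be implemented as a weighted sum of an energy penalty and a QoS penalty. The weights, units, and QoS metric below are assumptions, not the repository's actual reward definition.

```python
# Hypothetical reward combining an energy term and a QoS penalty.
# Weights and the QoS metric are illustrative, not the repository's definition.
def reward(total_power_w: float, qos_violations: int,
           power_weight: float = 1e-3, qos_weight: float = 1.0) -> float:
    """Higher reward for lower energy use, penalized by QoS violations
    (e.g. dropped or excessively delayed sessions)."""
    return -power_weight * total_power_w - qos_weight * qos_violations

print(reward(total_power_w=850.0, qos_violations=2))   # about -2.85
```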

Contributions and Feedback

Feel free to provide feedback or contribute to this project. You can either fork this repository or open a new issue if you find a bug or have a feature request.