zhouyou-gu / drl-5g-scheduler

Codes for paper "Knowledge-Assisted Deep Reinforcement Learning in 5G Scheduler Design: From Theoretical Framework to Implementation"
GNU Affero General Public License v3.0

#code structure issue #4

Open xiaohui7 opened 6 months ago

xiaohui7 commented 6 months ago

Hello, I'd like to reproduce the simulation results in Part Seven of the paper, but I'm not sure which part of the code corresponds to it. According to the instructions in the readme, may I ask whether, for the offline part of the code, I only need to run this line (PYTHONPATH=./ python3 ./sim_script_example/ka.py) to see all the results?

zhouyou-gu commented 6 months ago

Hi @xiaohui7 , yes, for the offline codes you only need to run that command under "controller_src". The results will be recorded in tensorboard. To see the results, open the tensorboard interface as instructed in the readme.

xiaohui7 commented 6 months ago

[screenshot: IMG_20240312_173554]

After running it, the interface looks like this. I feel it may not be the final result. Is there anything wrong with it, and how can I solve it?

xiaohui7 commented 6 months ago

Can I get these results through the offline codes? [screenshots of figures from the paper: Screenshot_20240312_174949, Screenshot_20240312_174921]

zhouyou-gu commented 6 months ago

Hi @xiaohui7 , these pictures in the paper are generated by MATLAB using the data recorded in tensorboard. Here, tensorboard is used as a data storage system with a visualizer, which is very convenient for testing. Please check the tensorboard documentation to see how to extract the data saved in a tensorboard log file (or you can have a look at the TBScalarToCSV class in controller_src/sim_src/tb_logger.py in this project).
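
As a rough starting point, something along these lines should pull the scalars out of an event file; this is a minimal sketch using TensorBoard's EventAccumulator, not the project's TBScalarToCSV class, and the log-dir path is a placeholder:

import csv
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

acc = EventAccumulator("path/to/your/tb/log/dir")  # placeholder: point this at the recorded log folder
acc.Reload()
for tag in acc.Tags()["scalars"]:                  # e.g. 'TX_DELAY_0', 'N_CH_TX_OK_0', ...
    with open(tag.replace("/", "_") + ".csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["step", "value"])
        for event in acc.Scalars(tag):
            writer.writerow([event.step, event.value])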

xiaohui7 commented 6 months ago

And may I ask which scalar in tensorboard refers to the packet loss probabilities? Does it only record the packet loss probabilities of the K-DDPG algorithm inside? If so, how can I get the data of other methods?

zhouyou-gu commented 6 months ago

Hi @xiaohui7 , the MATLAB codes that plot the results are not shared. You may have a look at MATLAB's documentation on the functions "plot", "bar", and "image" for the figures in the above screenshots.

"And may I ask which scalar in tensorboard refers to the packet loss probabilities?" For packet loss probabilities, please have a look at lines 82 - 94 in controller_src/sim_src/sim_helper/csv_to_result.py. It computes 1-(successful packets)/(total packets) as the loss rate.

"Does it only record the packet loss probabilities of the K-DDPG algorithm inside?" Yes, it only runs the complete K-DDPG as proposed in the paper.

"If so, how can I get the data of other methods?" For variants of our methods, you can try different classes in controller_src/sim_src/model/ddpg.py in the script.

For heuristic scheduling functions, e.g., rr, edf, you can implement them by using the functions in /controller_src/sim_src/regression/function.py and by inserting them into the SimAgent class as "scheduler_function". This will force the agent to use the heuristic functions.

For other compared methods in referenced papers, we do not provide their implementation.

xiaohui7 commented 5 months ago

Thank you very much for your response, it helps me a lot to understand the code. May I ask whether the DDPG class in controller_src/sim_src/model/ddpg.py is the original DDPG method? And what is the difference between the MultiheadcriticDDPG class and MultiheadcriticDDPG_NEW_PER?

zhouyou-gu commented 5 months ago

Hi @xiaohui7

The DDPG class in that file uses multi-head, which should be exactly the same as MultiheadcriticDDPG (here, I made a new class just for differentiation, if I remember this correctly; it has been a long time since these codes were developed :) ).

You should try the SingleHeadCriticDDPG class for the original DDPG, as the original one has no multi-head.

For MultiheadcriticDDPG and MultiheadcriticDDPG_NEW_PER: MultiheadcriticDDPG_NEW_PER includes a _per_w_multiplier function to take the packet delay into account in the importance sampling, while MultiheadcriticDDPG (which is the same as the DDPG class) only uses the critic's approximation errors.
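
To make the distinction concrete, below is a purely illustrative sketch of what a delay-aware priority multiplier could look like; the function body and names here are only my illustration, not the repo's actual _per_w_multiplier:

import numpy as np

def per_w_multiplier(hol_delay, d_max):
    # illustrative only: boost the replay priority of samples whose head-of-line (HoL)
    # delay is close to the delay bound, so near-deadline experiences are replayed more often
    return 1.0 + np.clip(hol_delay / d_max, 0.0, 1.0)

td_error = 0.3                                                 # critic's approximation error for one sample
priority_plain = abs(td_error)                                 # what a MultiheadcriticDDPG-style priority uses
priority_delay = abs(td_error) * per_w_multiplier(8.0, 10.0)   # a delay-aware, MultiheadcriticDDPG_NEW_PER-style priority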

I hope the above could help you understand the classes.

xiaohui7 commented 5 months ago

Thank you a lot, it helps me a lot. And I'd like to ask what the meanings of the three parameters in_dim, out_dim, d_min_pct are in the EDFschedulerFunction class?

xiaohui7 commented 5 months ago

And even if I use the EDFschedulerFunction, do I still need to train a neural network?

xiaohui7 commented 5 months ago

And in the EDFschedulerFunction class, which part shows the idea of the dynamic priority of the EDF algorithm?

zhouyou-gu commented 5 months ago

Hi @xiaohui7 , try the below script (put it under the same folder as ka.py). It has been a long time, so I am not sure whether the script will run directly or not. Anyway, you can see below how to configure the simulation with only scheduling functions (no NN).

"what are the meanings of the three parameters in_dim, out_dim, d_min_pct in the EDFschedulerFunction class?" in_dim: the state size (channel and HoL, i.e., 2 × the number of users); out_dim: the action size (a binary action {+1, -1} per user, i.e., the number of users); d_min_pct: min_delay/max_delay (see our paper for the definition).

"And even if I use the EDFschedulerFunction, do I still need to train a neural network?" No, you do not need to. It only runs the EDF function (see the below script), where no NN is needed.

"And in the EDFschedulerFunction class, which part shows the idea of the dynamic priority of the EDF algorithm?" Lines 64-76: they select the users with the highest HoL until there are not enough RBs.

#
# Created on 13/feb/20.
# Author: Zhouyou Gu <guzhouyou@gmail.com>.
#
import os

from sim_src.config_helper.ddpg_config import *
from sim_src.config_helper.env_config import *
from sim_src.controller import PySimController
from sim_src.model.model import MultiHeadCriticDDPG
from sim_src.regression.function import EDFSchedulerFuncion
from sim_src.replay_memory.per_proportional import PER_PROPORTIONAL_REPLAY_MEMORY_CONFIG, PERProportional
from sim_src.replay_memory.replay_memory import SimReplayMemory, ReplayMemory
from sim_src.sim_env.sim_agent import SimAgent
from sim_src.sim_env.sim_env import SimEnvTxBinary, SimEnvTxBinary_RewardShaping
from sim_src.sim_env.action_noise import *

N_UE = 5
TOTAL_N_RB = 50
TRAINING_EPISODE = 200
EVALUATION_EPISODE = 100

env_c = env_config_helper()
env_c.N_UE = N_UE
env_c.TOTAL_N_RB = TOTAL_N_RB
env_c.N_EPISODE = TRAINING_EPISODE

drl_c = ddpg_config_helper(env_c.N_UE, env_c.N_STEP * env_c.N_EPISODE)

env_c.reload_config()
drl_c.reload_config()
###########################################################################
#eval
###########################################################################

import os
log_path = os.path.abspath(os.path.dirname(os.path.realpath(__file__)))
folder_name = "eval-edf-" + str(env_c.TOTAL_N_RB)
experiment_name = "ka-new-per"
GLOBAL_LOGGER.set_log_path(log_path,folder_name,experiment_name)
scalar_list = []
scalar = 'TX_DELAY_'
scalar_list.append(scalar)

scalar = 'N_RLCTX_'
scalar_list.append(scalar)

scalar = 'N_DISCARD_'
scalar_list.append(scalar)

scalar = 'RLC_REWARD_'
scalar_list.append(scalar)

scalar = 'N_CH_TX_OK_'
scalar_list.append(scalar)

scalar = 'UE_REWARD_'
scalar_list.append(scalar)
GLOBAL_LOGGER.get_tb_logger().set_scalar_filter(scalar_list)

D_MIN_to_D_MAX_pct = float(env_c.D_MIN)/float(env_c.D_MAX)
sf = EDFSchedulerFuncion(env_c.N_UE*drl_c.N_UE_INPUT,env_c.N_UE,D_MIN_to_D_MAX_pct)

agent = SimAgent(0, env_c.agent_config, ReplayMemory(), scheduler_function=sf)
agent.action_noise = None

env = SimEnvTxBinary(0, env_c.sim_env_config, agent)

env.start()

env.join()

GLOBAL_LOGGER.close_logger()
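
As a side note on the dynamic-priority question above, the selection logic in those lines is roughly the following (a purely illustrative sketch of the idea, not the actual EDFSchedulerFuncion code):

def edf_select(hol_delays, rb_demand, total_rb):
    # pick the users with the largest head-of-line (HoL) delay first,
    # until there are not enough RBs left to serve the next one
    order = sorted(range(len(hol_delays)), key=lambda u: hol_delays[u], reverse=True)
    selected, used = [], 0
    for u in order:
        if used + rb_demand[u] > total_rb:
            break
        selected.append(u)
        used += rb_demand[u]
    return selected

print(edf_select([3, 9, 1], [20, 20, 20], 50))  # -> [1, 0]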
xiaohui7 commented 5 months ago

I'm deeply grateful for your answer. Is it normal for the code to run much longer using only scheduling functions than using neural networks? And why does this phenomenon occur?

xiaohui7 commented 5 months ago

> I'm deeply grateful for your answer. Is it normal for the code to run much longer using only scheduling functions than using neural networks? And why does this phenomenon occur?

Oh, it should be a problem with my computer. I tried again later, it was normal and the running time was shorter than with the NN.

xiaohui7 commented 5 months ago

I'd like to ask how the packet loss rate of K-DDPG is measured under different RB numbers. Considering that the K-DDPG algorithm does not converge at the beginning and has a very high packet loss rate, should we remove this part of the data when comparing the packet loss rate with other algorithms (EDF, etc.)?

xiaohui7 commented 5 months ago

Another issue is that if the training time of the KA algorithm is too long, it easily overfits and the packet loss rate increases. So, how can I reasonably select data to measure the packet loss rate of the KA algorithm? Unfortunately, the average packet loss rate of all users obtained under the KA algorithm reached 0.2, which is much higher than the 0.0015 of EDF and the others.

xiaohui7 commented 5 months ago

And what's the meaning of self.C_step in the ddpg_config_helper? What's the difference between it and self.N_step?

xiaohui7 commented 5 months ago

And in which function is EVALUATION_EPISODE used? [screenshot: Screenshot_20240411_111229] Are the 2000 episodes in the paper EVALUATION_EPISODE rather than TRAINING_EPISODE?

zhouyou-gu commented 5 months ago

Hi @xiaohui7 .

For the overfitting issue, yes, it will overfit after a long training time. You should select the NN trained in the middle of training as the final NN. Please note that we cannot ensure when "a good NN" is trained due to the randomness of the training algorithm. Note that the ddpg class should automatically save the actor and critic every few steps (check the codes for details). To select a well-trained NN, you can have a look at the transmission time in tensorboard; my experience (if I remember it correctly) is that when all the packets are transmitted within the delay bounds, then they are good NNs.

For reliability evaluation, the loss rate should not be evaluated over the training results, because they contain those "bad" results obtained when the NNs are not yet well trained. The NN should be saved (as mentioned above) when it is well trained, and it should be loaded separately for evaluation. You can call ddpg.start_eval() to stop all the training updates (double-check whether the NNs are still updated or not with ddpg.start_eval(); I cannot be sure about it as it has been a long time), and you only need to load the actor for evaluation. Please have a look at the attached script for more details. Note that you need to change the load path, since the path shown below is what I saved on my PC.

For the packet loss comparison with EDF, the high reliability of the NN is due to the importance sampling (IS), as discussed in the paper. A good approach to obtain a highly reliable actor is to load the NNs that have been trained well, and then further fine-tune them using the algorithm (with IS) with a lower learning rate and with the exploration disabled, e.g., drl_c.actor_lr = 1e-4, drl_c.critic_lr = 1e-4, drl_c.tau = 1e-4, drl_c.BATCH_SIZE = 100, rm.beta = 1, agent.action_noise = None (place these settings at the corresponding locations in the training script, with slow learning rates).
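
For instance, placed right after the config helpers in the training script (indicative only; rm and agent refer to whatever replay memory and SimAgent objects that script already constructs):

drl_c = ddpg_config_helper(env_c.N_UE, env_c.N_STEP * env_c.N_EPISODE)

# conservative fine-tuning of an already well-trained actor/critic
drl_c.actor_lr = 1e-4
drl_c.critic_lr = 1e-4
drl_c.tau = 1e-4
drl_c.BATCH_SIZE = 100

rm.beta = 1                  # full importance-sampling correction in the PER memory
agent.action_noise = None    # disable exploration noise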

For those unknown parameters, you can search the codes to see whether they are used or not. Some parameters (or some mechanisms) are not used because they were only implemented for testing purposes (but I did not delete them).

For EVALUATION_EPISODE, yes, it means evaluation without NN weight updates (in the bar figure of the paper). See the above comments for more details.

#
# Created on 13/feb/20.
# Author: Zhouyou Gu <guzhouyou@gmail.com>.
#
import os

from sim_src.config_helper.ddpg_config import *
from sim_src.config_helper.env_config import *
from sim_src.controller import PySimController
from sim_src.model.model import MultiHeadCriticDDPG
from sim_src.replay_memory.per_proportional import PER_PROPORTIONAL_REPLAY_MEMORY_CONFIG, PERProportional
from sim_src.replay_memory.replay_memory import SimReplayMemory, ReplayMemory
from sim_src.sim_env.sim_agent import SimAgent
from sim_src.sim_env.sim_env import SimEnvTxBinary, SimEnvTxBinary_RewardShaping
from sim_src.sim_env.action_noise import *

nn_load_path = '/home/deep/ddrl/controller_src/sim_script_jsac/train_eval/train/ka-new-per-2020-June-27-21-36-52'

N_UE = 5
TOTAL_N_RB = 50
TRAINING_EPISODE = 200
EVALUATION_EPISODE = 100

env_c = env_config_helper()
env_c.N_UE = N_UE
env_c.TOTAL_N_RB = TOTAL_N_RB
env_c.N_EPISODE = TRAINING_EPISODE

###########################################################################
#eval
###########################################################################

env_c = env_config_helper()
env_c.N_UE = N_UE
env_c.TOTAL_N_RB = TOTAL_N_RB
env_c.N_EPISODE = EVALUATION_EPISODE

drl_c = ddpg_config_helper(env_c.N_UE, env_c.N_STEP * env_c.N_EPISODE)

drl_c.actor_lr = 0.
drl_c.critic_lr = 0.
# drl_c.actor_load_path = os.path.join(nn_load_path,'actor_target_65000.pt')
drl_c.actor_load_path = '/home/deep/ddrl/controller_src/sim_script_jsac/train_eval/train/ka-new-per-2020-June-28-00-53-05/actor_target_65000.pt'

env_c.reload_config()
drl_c.reload_config()

assert env_c.N_UE == 5
assert isinstance(drl_c.actor_config.af_config[-1], nn.modules.Tanh)

import os
log_path = os.path.abspath(os.path.dirname(os.path.realpath(__file__)))
folder_name = "eval-new-per-" + str(env_c.TOTAL_N_RB)
experiment_name = "ka-new-per"
GLOBAL_LOGGER.set_log_path(log_path,folder_name,experiment_name)
scalar_list = []
scalar = 'TX_DELAY_'
scalar_list.append(scalar)

scalar = 'N_RLCTX_'
scalar_list.append(scalar)

scalar = 'N_DISCARD_'
scalar_list.append(scalar)

scalar = 'RLC_REWARD_'
scalar_list.append(scalar)

scalar = 'N_CH_TX_OK_'
scalar_list.append(scalar)

scalar = 'UE_REWARD_'
scalar_list.append(scalar)
GLOBAL_LOGGER.get_tb_logger().set_scalar_filter(scalar_list)
agent = SimAgent(0, env_c.agent_config, ReplayMemory())
agent.action_noise = None

env = SimEnvTxBinary(0, env_c.sim_env_config, agent)

ddpg = MultiHeadCriticDDPG(0, drl_c.ddpg_config)
ddpg.start_eval()

controller = PySimController(0, drl_c.controller_config, agent, ddpg, ReplayMemory())

env.start()
env.join()
GLOBAL_LOGGER.close_logger()
xiaohui7 commented 4 months ago

I have another question now: how can I judge the convergence of the algorithm? Are there any related metrics for that?

xiaohui7 commented 3 months ago

[screenshot: TensorBoard loss curves] The actor and critic loss functions I obtained from training are shown in this graph. Why are their values negative? Does this mean the algorithm hasn't converged? Why is this happening? Additionally, there is a significant difference between the Wall Time in TensorBoard and the actual running time. Can I directly use Wall Time for comparative analysis?

xiaohui7 commented 3 months ago

For the channel model section, the paper mentions that the ratio of the average power of the line-of-sight path to the average power of the non-line-of-sight path is set to 0.6. Which parameter in this code corresponds to that? Should it be (self.scale/self.shape)^2 = 0.6?

import random
from collections import namedtuple

from scipy.stats import rice

from sim_src.sim_env import EnvObject, StatusObject
from sim_src.sim_env.math_models import *
from sim_src.tb_logger import GLOBAL_LOGGER
from sim_src.util import *

CHANNEL_CONFIG = namedtuple("CHANNEL_CONFIG", ['max_dis', 'step_dis', 'move_p', 'tx_power', 'noise_power', 'T_f', 'rb_bw', 'total_n_rb'])
CHANNEL_STATE = namedtuple("CHANNEL_STATE", ['snr_db'])


class Channel(EnvObject):
    def __init__(self, id, config):
        self.id = id
        self.config = config
        self.dis = 0
        self.init_distance()

        self.scale = 0.559
        # self.shape = 0.612 / self.scale
        self.shape = 0.433
        self.small_scale_gain = rice.rvs(self.shape, scale=self.scale)

    def get_state(self):
        return CHANNEL_STATE(self.get_snr_db())

    def step(self, action):
        pass

    def change_position(self):
        if p_true(self.config.move_p):
            if p_true(0.5):
                self.increase_distance()
            else:
                self.decrease_distance()
        # small scale channel gain. 20% to be changed
        if p_true(0.2):
            self.small_scale_gain = rice.rvs(self.shape, scale=self.scale)

    def get_snr_db(self) -> float:
        # large scale channel gain
        snr = distance_to_snr(self.dis, self.config.tx_power, self.config.noise_power)

        snr += dec_to_db(self.small_scale_gain)

        if snr > 20.:
            return 20.
        # TODO: handling out-of-range
        # elif snr < -5.:
        #     return -5.
        else:
            return snr
        # return distance_to_snr(self.dis,self.config.tx_power,self.config.noise_power)

    def init_distance(self):
        initial_steps = random.randint(0, int(self.config.max_dis / self.config.step_dis))
        for x in range(initial_steps):
            self.increase_distance()

    def increase_distance(self):
        if self.dis + self.config.step_dis <= self.config.max_dis:
            self.dis += self.config.step_dis

    def decrease_distance(self):
        if self.dis - self.config.step_dis >= 0:
            self.dis -= self.config.step_dis


CHANNEL_UNKNOWN_ERROR_ACTION = namedtuple("CHANNEL_UNKNOWN_ERROR_ACTION", ['n_rb', 'n_byte'])


class ChannelUnknownErr(Channel):
    '''
    A channel object
    Action: the number of RB, the number of bytes
    Reward: 1 - tx error rate
    '''

    def __init__(self, id, config):
        super(ChannelUnknownErr, self).__init__(id, config)
        StatusObject.__init__(self)

    @counted
    def step(self, action):
        err = 0.
        if action.n_rb > 0:
            err = tx_error_rate_for_n_bytes(action.n_byte, action.n_rb, db_to_dec(self.get_snr_db()), self.config.T_f,
                                            self.config.rb_bw)

            if action.n_rb >= self.config.total_n_rb and err < 1e-5:
                err = 1e-5
            if err < 1e-5:
                ret = 5.
            else:
                ret = - math.log10(err)
        else:
            ret = 0.

        n_successful_tx = 1
        if p_true(err):
            n_successful_tx = 0

        GLOBAL_LOGGER.get_tb_logger().add_scalar('NRB_' + str(self.id), action.n_rb, self.n_step)
        GLOBAL_LOGGER.get_tb_logger().add_scalar('SNR_' + str(self.id), self.get_snr_db(), self.n_step)
        GLOBAL_LOGGER.get_tb_logger().add_scalar('E_' + str(self.id), err, self.n_step)
        GLOBAL_LOGGER.get_tb_logger().add_scalar('DIS_' + str(self.id), self.dis, self.n_step)
        GLOBAL_LOGGER.get_tb_logger().add_scalar('CH_REWARD_' + str(self.id), ret, self.n_step)
        GLOBAL_LOGGER.get_tb_logger().add_scalar('N_CH_TX_OK_' + str(self.id), n_successful_tx, self.n_step)
        self.change_position()

        return ret


class ChannelUnknownErrBinaryReward(Channel):
    '''
    reward is a binary indicator to show whether the packet is transmitted or not
    '''

    def __init__(self, id, config):
        super(ChannelUnknownErrBinaryReward, self).__init__(id, config)
        StatusObject.__init__(self)

    @counted
    def step(self, action):
        err = 0.
        if action.n_rb > 0:
            err = tx_error_rate_for_n_bytes(action.n_byte, action.n_rb, db_to_dec(self.get_snr_db()), self.config.T_f,
                                            self.config.rb_bw)

            if action.n_rb >= self.config.total_n_rb and err < 1e-5:
                err = 1e-5
            if err < 1e-5:
                ret = 5.
            else:
                ret = - math.log10(err)
        else:
            ret = 0.

        n_successful_tx = 1
        if p_true(err):
            n_successful_tx = 0

        GLOBAL_LOGGER.get_tb_logger().add_scalar('NRB_' + str(self.id), action.n_rb, self.n_step)
        GLOBAL_LOGGER.get_tb_logger().add_scalar('SNR_' + str(self.id), self.get_snr_db(), self.n_step)
        GLOBAL_LOGGER.get_tb_logger().add_scalar('E_' + str(self.id), err, self.n_step)
        GLOBAL_LOGGER.get_tb_logger().add_scalar('DIS_' + str(self.id), self.dis, self.n_step)
        GLOBAL_LOGGER.get_tb_logger().add_scalar('CH_REWARD_' + str(self.id), ret, self.n_step)
        GLOBAL_LOGGER.get_tb_logger().add_scalar('N_CH_TX_OK_' + str(self.id), n_successful_tx, self.n_step)
        self.change_position()

        return float(n_successful_tx)


if __name__ == '__main__':

    err = tx_error_rate_for_n_bytes(32, 6, 1.0221785259170593, 0.000125, 180000.0)

    # print(err)

    for x in range(50):
        err = tx_error_rate_for_n_bytes(50., x + 1, db_to_dec(0), 1e-4, 180e3)
        print(err, x)

    from scipy.stats import expon
    import matplotlib.pyplot as plt

    scale = 0.559
    shape = 0.612 / scale
    print(rice.rvs(shape, scale=scale))
    fig, ax = plt.subplots(1, 1)
    x = np.linspace(rice.ppf(0.0001, shape, scale=scale), rice.ppf(0.9999, shape, scale=scale), 10000)
    ax.plot(x, rice.pdf(x, shape, scale=scale), 'r-', label='rice pdf')
    x = np.linspace(expon.ppf(0.01),
                    expon.ppf(0.99), 100)
    ax.plot(x, expon.pdf(x),
            'r-', lw=5, alpha=0.6, label='expon pdf')
    print(rice.rvs(shape, scale=scale))
    # add the legend
    ax.legend()

    # add the title and axis labels
    ax.set_title('Probability Density Functions')
    ax.set_xlabel('Value')
    ax.set_ylabel('Probability Density')

    # show the figure
    plt.show()
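
For reference, my understanding of scipy.stats.rice (please correct me if I'm wrong) is that with shape b and scale sigma the line-of-sight amplitude is b*sigma and the scattered power is 2*sigma^2, so the LOS-to-NLOS power ratio would be shape**2 / 2 rather than (scale/shape)^2; with the commented-out shape = 0.612 / scale this comes out to about 0.6:

scale = 0.559
shape = 0.612 / scale   # the commented-out setting in the code above
K = shape ** 2 / 2      # LOS power (shape*scale)**2 over NLOS power 2*scale**2
print(K)                # ~0.60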
zhouyou-gu commented 3 months ago

Hi @xiaohui7 , for the convergence: this work does not study how to determine whether the NNs converge to good weights. I determine the NNs' convergence based on the TX timing, i.e., if the users are transmitting within the delay bound, the NNs are assumed to have converged, as mentioned in the previous comments in this thread.
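
For example, one rough way to automate that check on the tensorboard log (untested; the log path, delay bound, and window size below are placeholders you need to set yourself):

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

D_MAX = 10.0     # delay bound, in whatever unit TX_DELAY_ is logged in
WINDOW = 1000    # number of most recent logged points to inspect

acc = EventAccumulator("path/to/your/tb/log/dir")
acc.Reload()
delay_tags = [t for t in acc.Tags()["scalars"] if t.startswith("TX_DELAY_")]
converged = all(
    all(e.value <= D_MAX for e in acc.Scalars(t)[-WINDOW:]) for t in delay_tags
)
print("all users within the delay bound:", converged)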