Open vincent-NJW opened 5 months ago
Hello, I am a graduate student in China. My research direction is the application of reinforcement learning in edge computing and federated learning. Recently, I carefully read your article "Exploring Deep Reinforcement Learning Assisted Federated Learning for Online Resource Allocation in Privacy Resilient EdgeIoT" and found the idea excellent and forward-thinking. I then ran the project code you uploaded to GitHub, but encountered a problem: after 1000 episodes, the reward for each round is None. May I ask what the specific reason might be? Thank you, and I hope to hear from you.
Hi, I have never encountered this problem. Given what you are facing, I recommend focusing on a single episode first: make sure you have input data and that your actions fall between the lower bound and the upper bound.
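For example, a minimal single-episode sanity check could look like the sketch below (the object names and method signatures are placeholders, not this repository's exact classes):

```python
import numpy as np

def sanity_check_one_episode(env, agent, action_low, action_high, max_steps=1000):
    """Run one episode and verify every reward is a real number.

    `env` and `agent` stand in for the environment and TD3 agent objects;
    the method names (reset, choose_action, step) are assumptions and may
    differ from the actual code.
    """
    state = env.reset()
    episode_reward = []
    for step in range(max_steps):
        action = agent.choose_action(state)
        # Clip to the bounds the environment expects; out-of-range actions
        # are a common cause of invalid (None) rewards.
        action = np.clip(action, action_low, action_high)
        state, reward, done = env.step(action)
        if reward is None:
            raise ValueError(f"reward is None at step {step}: "
                             "check the input .mat data and the action bounds")
        episode_reward.append(reward)
        if done:
            break
    return float(np.mean(episode_reward))
```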
Good luck, Jingjing
Thank you very much for your reply. I ran the jingjing_td3_lstm_datasize.py file and, when creating the environment, passed in the original_data_1000_20_6_mormal.mat file you provided. I think it may be an environment issue. Is the procedure I described above correct?
Hi, I strongly recommend that you run the jingjing_td3_lstm_v8.py file first. jingjing_td3_lstm_datasize.py is not the main file.
Best, Jingjing
Sure, thank you very much for your suggestion. I will continue to delve into the research over the next few days. If I have any questions later, may I consult you? Additionally, can I contact you through the email address provided in your paper?
No problem. Feel free to contact me if you have any questions.
Good luck, Jingjing
Hi there,
I ran the jingjing_td3_lstm_v8.py file from your repository. After 1000 episodes, I checked the rewards and saw that the reward curve didn't go up; it actually went down a bit. Is this normal? I didn't change any parameters and just used the default settings. Also, can I contact you through your email?
Thanks!
Oh, sorry, I just checked the simulation section of your paper, and it indeed shows this phenomenon: your AE gain remains at a relatively stable value. When you mention AE gain, are you referring to the reward value in the reinforcement learning algorithm? This is the first time I've encountered a situation where the reward value doesn't increase significantly.
I modified the generate_data_10.py file to generate an environment data file with 5000 steps for 60 clients. Then, I changed max_episode=5000 in jingjing_td3_lstm_v8.py to see if the reward increases. It is currently running.
Hi, the displayed AE gain is the gain in each communication round, not the reward accumulated over the communication rounds.
Best, Jingjing
Hello, thank you very much for your response. I saw in the step function that you use the quantity shown in this image as the reward. I trained for 5000 steps and found that the reward value remained almost unchanged. Is this the AE gain you mentioned in your paper?
Thanks, Vincent
Hi, as I mentioned above, the reward value is the AE gain in each communication round, not the reward accumulated over the communication rounds, so it is reasonable for the AE gain to remain stable. If you want the accumulated reward, where the current value includes all rewards previously obtained, just sum them up; this accumulated reward will increase over the communication rounds.
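For instance (a toy illustration only, with made-up values, not numbers from the code):

```python
import numpy as np

# Per-round AE gain as logged during training may stay roughly flat...
per_round_gain = np.array([0.52, 0.49, 0.51, 0.50, 0.53])  # made-up values
# ...but the accumulated gain still grows over the communication rounds.
cumulative_gain = np.cumsum(per_round_gain)
print(cumulative_gain)  # [0.52 1.01 1.52 2.02 2.55]
```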
Best, Jingjing
Hi Jingjing, sorry for the late response; I've had some things to deal with lately. In your code, you store the reward for each step in a list called episode_reward, and at the end of an episode you average episode_reward and append the result to rewards, so rewards holds the average reward for each episode.
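Roughly the pattern I mean (paraphrased with hypothetical names, not the repository's exact code):

```python
import random

max_episode, steps_per_episode = 3, 5
rewards = []                                   # one entry per episode
for episode in range(max_episode):
    episode_reward = []                        # per-step rewards in this episode
    for step in range(steps_per_episode):
        reward = random.random()               # stand-in for the value returned by env.step
        episode_reward.append(reward)
    rewards.append(sum(episode_reward) / len(episode_reward))
print(rewards)                                 # average reward per episode
```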
In reinforcement learning training, isn't the average reward for each episode supposed to increase over time? What you said about cumulative rewards is correct: in theory, the cumulative reward should increase with more communication rounds. However, that doesn't really show that the RL agent is learning or gaining experience; it is just simple accumulation.
Hi, the average reward for each episode is the AE gain in each communication round, and the AE gain in each communication round is independent. The RL agent learns in each communication round. In fact, the average reward in a communication round does not necessarily increase over time, as it depends on the environment and the client selection.
Best, Jingjing
Hello, Jingjing. Given that, how can we show that the agent has actually learned anything? If the reward for each episode is similar, how can we be sure that the proposed algorithm converges to an optimal solution? And if the reward remains unchanged, how do I determine how many episodes to train? It would mean that training for one episode and for 1000 episodes have the same effect.
Hi, as mentioned in the paper, this optimization problem has been proven to be NP-hard, so it is difficult to guarantee an optimal solution. We compare the obtained objective function value with the benchmark FedAECS, which is our previous work, and the proposed method shows superiority.
Best, Jingjing