iQua / flsim

A simulation framework for Federated Learning written in PyTorch
Apache License 2.0

Some doubts and questions #7

Open fvarno opened 3 years ago

fvarno commented 3 years ago

I understand that for some reason you might not have been able to release your complete code, but I would highly appreciate it if you could help me answer some questions about your implementation.

  1. The validation set on the server: how much data does it contain, and is it taken from the original training set (before partitioning) or from the test set?
  2. Do you train your DQN network with one optimization step after each communication round (after pushing the latest experience into the replay memory), or with multiple steps? Do you wait for the memory to collect some experience, or do you train the DQN even with a single entry? What is the DQN training batch size?
  3. What optimization algorithm and learning rate are used to train the DQN network?
  4. How frequently is the target network updated (from the learning DQN)?
  5. Do you use learning rate decay as in FedAvg? Does it match their numbers?
  6. Do you use a discounting factor for the reward (\gamma in your paper)?

Thank you in advance!
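
To make these questions concrete, here is a minimal sketch of the kind of per-round DQN update I have in mind. Every value in it (optimizer, learning rate, batch size, warm-up size, target-update period, \gamma, and the network sizes) is a placeholder I made up, not a value from your paper or this repository:

```python
# Minimal sketch of a per-round DQN update; all hyperparameters are
# placeholders, NOT values from the paper or this repository.
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, NUM_CLIENTS = 100, 100        # assumed state/action sizes
BATCH_SIZE = 32                          # question 2: training batch size?
WARMUP = 500                             # question 2: wait for memory to fill?
GAMMA = 0.99                             # question 6: discount factor?
TARGET_UPDATE_EVERY = 10                 # question 4: target-network update period?

q_net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(), nn.Linear(128, NUM_CLIENTS))
target_net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(), nn.Linear(128, NUM_CLIENTS))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)  # question 3: optimizer / lr?
replay = deque(maxlen=10_000)            # stores (state, action, reward, next_state) tensors,
                                         # with action as a long scalar tensor

def dqn_step(round_idx):
    """One optimization step after a communication round (question 2)."""
    if len(replay) < max(BATCH_SIZE, WARMUP):
        return
    batch = random.sample(replay, BATCH_SIZE)
    s, a, r, s2 = (torch.stack(x) for x in zip(*batch))
    q = q_net(s).gather(1, a.view(-1, 1)).squeeze(1)
    with torch.no_grad():
        target = r + GAMMA * target_net(s2).max(dim=1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if round_idx % TARGET_UPDATE_EVERY == 0:
        target_net.load_state_dict(q_net.state_dict())
```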

tian1327 commented 1 year ago

Hi, did you get any answers?

fvarno commented 1 year ago

I haven't, unfortunately.

firewood1996 commented 1 year ago

Hi, based on my understanding of the paper, I have reproduced its full code. Although there are still a few small bugs, the final results can be obtained. I will release the complete code in the future; after that, we can discuss it.

fvarno commented 1 year ago

Cool! If you ever decide to implement it using FedSim, I can help with the setup.

firewood1996 commented 1 year ago

Thanks! I have developed the code based on OpenAI's Gym and this repository. I will try my best to achieve that! :)

tian1327 commented 1 year ago

@firewood1996 Hi, thanks for letting us know! Would you mind sharing your OpenAI Gym implementation so far, so that maybe I can help debug it?

firewood1996 commented 1 year ago

Thanks! There are just a few problems left; I think I can handle them :). I will try my best!

tian1327 commented 1 year ago

Cool! Looking forward to seeing your code!

tian1327 commented 1 year ago

@firewood1996 Hi there, I am wondering: in your implementation, during DDQN training, did you select only one client in each communication round? If so, only one device would report its local weights to the server, and thus there would be no FedAvg on the server. I have implemented this scheme, but the results were poor. I doubt the accuracy would improve under this scheme, because in each round the single selected device cannot benefit from the weight updates of the other devices. Would you mind sharing your experience? Thank you!
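
To spell out why I think single-client selection removes the averaging step, here is a minimal FedAvg aggregation sketch (assumed names and shapes, not flsim's actual API): with only one reporting client, the weighted average simply returns that client's weights unchanged.

```python
# Minimal FedAvg aggregation sketch (assumed names, not flsim's actual API).
import torch

def fedavg(client_weights, client_sizes):
    """Average client state_dicts, weighted by local dataset size."""
    total = sum(client_sizes)
    return {
        key: sum(w[key] * (n / total) for w, n in zip(client_weights, client_sizes))
        for key in client_weights[0]
    }

# With a single selected client, the "average" is just that client's weights:
w1 = {"layer.weight": torch.randn(3, 3)}
assert torch.allclose(fedavg([w1], [100])["layer.weight"], w1["layer.weight"])
```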

firewood1996 commented 1 year ago

According to the paper, they sort the Q-values of all 100 clients and then select the 10 clients with the largest Q-values. When training the Q-network, they only use the largest Q-value. I have tried many approaches, but the performance still does not reach the level reported in the paper. Actually, I strongly doubt that DDQN can work here:

  1. Using DDQN makes training very time-consuming. If you have only one GPU, you must train the model for more than two days. Moreover, if you use PCA, it costs even more time.
  2. The federated learning model converges with or without the DQN. Therefore, the early and late rewards during DQN training are unlikely to show very large jumps. The environments in which traditional reinforcement learning operates, such as Gym, do not converge at all early in training, which is different from federated learning.

I have given up on continuing to optimize with reinforcement learning, as none of the many methods I tried achieve the results reported in the paper. The fact that the authors still have not open-sourced the code suggests there are many problems.

I am very sorry that I really do not have a way to achieve it. If you have new ideas, feel free to discuss them with me.
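
For reference, this is roughly how I implemented the selection rule described above (a minimal sketch under my own assumptions about the Q-network, the state, and k = 10; it is not the authors' released code):

```python
# Rough sketch of the selection rule as I understand it (my assumptions,
# not the authors' code): score all 100 clients with the Q-network, pick
# the 10 with the largest Q-values, and use the maximum Q-value as the
# bootstrap term in the TD target when training the network.
import torch
import torch.nn as nn

NUM_CLIENTS, STATE_DIM, K = 100, 100, 10  # assumed sizes

q_net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(), nn.Linear(128, NUM_CLIENTS))

def select_clients(state):
    """Return the indices of the K clients with the largest Q-values."""
    with torch.no_grad():
        q_values = q_net(state)          # shape: (NUM_CLIENTS,)
    top = torch.topk(q_values, K)
    return top.indices.tolist(), q_values.max().item()

selected, max_q = select_clients(torch.randn(STATE_DIM))
```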

tian1327 commented 1 year ago

@firewood1996 Hi, thanks a lot for sharing your experiences! I have implemented the DQN based on flsim, but I still cannot reproduce the reported training performance. Based on my experiments, I strongly agree with you that the FL model will converge with or without the DQN. In other words, during DQN training, even when selecting the same device every round, the test accuracy improves as more communication rounds pass. I also believe that DQN is not suitable for this type of device-selection problem because of the strong dependency between actions.

In case you are interested, you can find our implementation, short presentation, slides, and report here: https://github.com/tian1327/flsim_dqn

firewood1996 commented 1 year ago

I have read your work; it is excellent!!! I strongly agree with your opinion that DQN is not suitable and that the reward setting does not reveal the intrinsic connections between different clients. By the way, would you mind adding my QQ (2497978657) so we can discuss FL further?

tian1327 commented 1 year ago

@firewood1996 Sure. I don't use QQ; would you please send your WeChat ID to my email address, skytamu6@gmail.com? Thanks.

fvarno commented 1 year ago

Dear @tian1327 and @firewood1996, I hope you find clear answers to our questions regarding this project very soon 🚀. I'm interested in hearing about your attempts; if you have any updates, I would be thrilled to learn about them.

Best of luck!

tian1327 commented 1 year ago

Feel free to check our implementation and results (code, report, slides, YouTube) here: https://github.com/tian1327/flsim_dqn