Suliucuc opened this issue 5 years ago
Did you see the reward improving on the validation dataset? Large variance in td_loss (the policy gradient loss) can be due to many reasons; one important source is the stochasticity of the input process (the network trace in this case). This paper, and its accompanying website, provide more details: https://openreview.net/forum?id=Hyg1G2AqtQ, https://people.csail.mit.edu/hongzi/var-website/index.html
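For concreteness, here is a rough sketch (not code from this repo) of what I mean by tracking the validation reward: compare checkpoints by their mean total reward over a fixed set of validation traces instead of by the raw td_loss curve. The `rewards_per_trace` dict and the numbers in it are hypothetical placeholders.

```python
import numpy as np

def validation_reward(rewards_per_trace):
    """Mean and std of the total episode reward across validation traces."""
    totals = [float(np.sum(r)) for r in rewards_per_trace.values()]
    return float(np.mean(totals)), float(np.std(totals))

# Made-up numbers, only to show the comparison: the policy is improving if
# the mean validation reward goes up, even while td_loss stays noisy.
early_ckpt = {"trace_0": [0.5, 0.7, 0.6], "trace_1": [0.4, 0.5, 0.6]}
later_ckpt = {"trace_0": [0.8, 0.9, 0.7], "trace_1": [0.6, 0.7, 0.8]}
print(validation_reward(early_ckpt))
print(validation_reward(later_ckpt))
```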
Thanks for the reply; the paper should give me good guidance. I have read https://github.com/hongzimao/pensieve/issues/11, and I think the poor convergence may come from the fact that I did not reduce the entropy weight to a small value after a certain number of iterations but kept it at 5. I will try decaying it (a sketch of the schedule I plan to try is at the end of this comment), and I will also study the paper.

There is another question, about the datasets. Which trace dataset was the pretrained model in the code trained on? Can I use synthetic traces as the pre-training dataset? I know the data in the Dropbox link is a subset, but I find I can't access it. I also can't download the FCC broadband dataset, the Norway HSDPA bandwidth logs, the Belgium 4G/LTE bandwidth logs (bonus), or the home WiFi dataset. Are they inaccessible now?
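For reference, this is a minimal sketch of the linear decay schedule I plan to try. The start value of 5, the target of 0.1, and the number of decay iterations are my own assumptions, not values from the Pensieve code.

```python
def entropy_weight(iteration, start=5.0, end=0.1, decay_iters=50000):
    """Linearly anneal the entropy regularization weight, then hold it."""
    if iteration >= decay_iters:
        return end
    frac = iteration / float(decay_iters)
    return start + frac * (end - start)

# The weight shrinks from 5 toward 0.1 as training progresses.
for it in (0, 10_000, 25_000, 50_000, 100_000):
    print(it, round(entropy_weight(it), 3))
```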
I think you can still download the data from that Dropbox link. Synthetic data can be generated using https://github.com/hongzimao/pensieve/blob/master/sim/synthetic_traces.py
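If it helps, below is only a toy illustration of what a synthetic bandwidth trace generator can look like (a clipped random walk). It is not the logic in sim/synthetic_traces.py, and the two-column output format is an assumption you should check against the repo's trace loader.

```python
import numpy as np

def generate_trace(length=1000, seed=0, min_bw=0.2, max_bw=4.0):
    """Toy synthetic trace: (timestamps in s, bandwidth in Mbit/s)."""
    rng = np.random.default_rng(seed)
    bw = np.empty(length)
    bw[0] = rng.uniform(min_bw, max_bw)
    for t in range(1, length):
        # Clipped random-walk step; the step size is an arbitrary choice.
        bw[t] = np.clip(bw[t - 1] + rng.normal(0.0, 0.3), min_bw, max_bw)
    return np.arange(length, dtype=float), bw

ts, bw = generate_trace()
with open("synthetic_trace_0.txt", "w") as f:
    for t, b in zip(ts, bw):
        f.write(f"{t}\t{b}\n")  # assumed "<time> <throughput>" format
```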
I have run multi_agent.py: I set the number of central_agent iterations to 10^5 and set the learning rates as you proposed, and I use TensorBoard to check the td_loss curve. But the curve does not converge; its range is very wide. Could you give me some guidance on this convergence issue? I would really appreciate your help.
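In case it is useful, here is a small sketch of how I look at the exported td_loss values: I smooth them with an exponential moving average before plotting, so the long-term trend is visible despite the per-iteration variance. The alpha value and the random data below are placeholders, not my actual logs.

```python
import numpy as np

def ema(values, alpha=0.01):
    """Exponential moving average of a 1-D sequence."""
    out = np.empty(len(values))
    acc = values[0]
    for i, v in enumerate(values):
        acc = alpha * v + (1.0 - alpha) * acc
        out[i] = acc
    return out

# Placeholder data standing in for td_loss values exported from TensorBoard.
raw = np.random.default_rng(0).normal(loc=1.0, scale=0.5, size=1000)
print(ema(raw)[-5:])
```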