Hi. As you said, M = 2 because the algorithm calls the replay() function and performs a training step every M episodes. We use this so that the replay buffer of 5K samples incorporates more new experiences between training steps than it would if we trained after every episode. In our experiments we observed that this helped the learning converge, but other setups could work as well as ours. DQN also uses the trick of copying the weights from the primary network to the target network for better stabilization; we set this hyperparameter to 50 episodes.
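For concreteness, here is a minimal sketch of the schedule described above: calling replay() every M = 2 episodes and syncing the target network every 50 episodes. The `agent` and `env` objects and their methods are hypothetical placeholders for illustration, not the paper's actual code.

```python
M = 2             # call replay() every M episodes
TARGET_SYNC = 50  # copy primary -> target weights every 50 episodes

def run_training(agent, env, num_episodes):
    """Episode loop; `agent` and `env` are hypothetical stand-ins for the
    paper's DQN+GNN agent and the network environment."""
    for episode in range(1, num_episodes + 1):
        state, done = env.reset(), False
        while not done:
            action = agent.act(state)                    # epsilon-greedy on the primary net
            next_state, reward, done = env.step(action)
            agent.buffer.append((state, action, reward, next_state, done))
            state = next_state

        if episode % M == 0:
            agent.replay()                               # gradient steps on sampled batches

        if episode % TARGET_SYNC == 0:
            # Keras-style weight copy; the stabilization trick mentioned above
            agent.target_net.set_weights(agent.primary_net.get_weights())
```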
Thank you very much for your reply. One more question: so training does not happen at every step, but every 2 episodes (using 5 × 32 samples)? In addition, I've read almost all of your team's papers, and using GNNs in computer networks is really enlightening, especially since both are naturally graphs. I understand that the inputs and outputs of a GNN are graph representations, whereas in computer networks the graph structure is not very complex, so a future direction seems to be combining GNNs with other methods; perhaps well-designed representations of the nodes and links in the graph is what matters? But when I look at new papers on graph neural networks, this aspect does not seem to be a hot topic.
Hi, yes, but the mini-batches are sampled from all the samples in your experience replay buffer.
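In other words, each call to replay() draws its mini-batches at random from the entire buffer. A sketch of that sampling, assuming a plain Python deque as the buffer and a hypothetical `train_step` callable wrapping the DQN loss and optimizer:

```python
import random
from collections import deque

BUFFER_SIZE = 5_000  # buffer capacity mentioned in the thread
NUM_BATCHES = 5      # batches per replay() call
BATCH_SIZE = 32      # samples per batch

buffer = deque(maxlen=BUFFER_SIZE)  # (state, action, reward, next_state, done) tuples

def replay(train_step):
    """Sample NUM_BATCHES random mini-batches from the whole buffer and run
    one gradient step per batch; `train_step` is a hypothetical callable."""
    if len(buffer) < BATCH_SIZE:
        return
    for _ in range(NUM_BATCHES):
        batch = random.sample(buffer, BATCH_SIZE)
        train_step(batch)
```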
Copy that, thanks a lot!!!
Hello, I am trying to reproduce the experiment in this paper. When reading the paper, I could not find the value of the parameter M. In addition, Section 5.1 describes the experimental process as "...To stabilize the learning process, we re-train the weights of the GNN model every 2 episodes and every time we use 5 batches (with 32 samples) from the experience buffer.", but in traditional DQN, stability is addressed by using a TargetNet. My understanding is that the paper uses two GNNs, as EvaluateNet and TargetNet respectively. Is there anything wrong with my understanding? I did not find the details of the TargetNet update in the paper.
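For reference, the stabilizing role of the TargetNet is that the Bellman bootstrap term is computed with a frozen copy of the weights, which (per the reply above) is refreshed every 50 episodes. A minimal sketch of the target computation, with a hypothetical `target_net` callable and an illustrative discount factor:

```python
import numpy as np

GAMMA = 0.95  # illustrative discount factor, not taken from the paper

def td_targets(batch, target_net):
    """DQN regression targets for one mini-batch; `target_net` is a
    hypothetical callable mapping a batch of states to Q-value vectors.
    Bootstrapping from the frozen target copy keeps the targets stable
    between the 50-episode weight syncs."""
    _states, _actions, rewards, next_states, dones = map(np.asarray, zip(*batch))
    next_q = target_net(next_states).max(axis=1)  # max-Q from the TargetNet
    return rewards + GAMMA * next_q * (1.0 - dones.astype(np.float32))
```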