knowledgedefinednetworking / DRL-GNN

BSD 3-Clause "New" or "Revised" License

Some details of chapter 5 experiment #5

Closed Alilalily closed 2 years ago

Alilalily commented 2 years ago

Hello, I am trying to reproduce the experiment in this paper. When reading the paper, I could not find the value of the parameter M. In addition, in Section 5.1 the experimental process is described as "...To stabilize the learning process, we re-train the weights of the GNN model every 2 episodes and every time we use 5 batches (with 32 samples) from the experience buffer.", but in traditional DQN, stability is achieved by using a target network. My understanding is that the paper uses two GNNs, one as the evaluation network and one as the target network. Is there anything wrong with my understanding? I could not find the details of the target network update in the paper.

paulalmasan commented 2 years ago

Hi. As you said, M = 2: every M episodes the algorithm calls the replay() function and performs a training step. We used this to let the replay buffer of 5K samples incorporate more experiences than if we trained after every episode. In our experiments, we observed that it helped learning convergence, but other setups could work as well as ours. In DQN there is also the trick of copying the weights from the primary network to the target network for better stabilization; we set this hyperparameter to 50 episodes.
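Putting the numbers from this thread together (train every M = 2 episodes with 5 batches of 32 samples, 5K replay buffer, target-network copy every 50 episodes), the cadence might look like the sketch below. This is an illustrative reconstruction, not the repository's actual code; the names `ReplayBuffer` and `training_schedule` are made up for this example.

```python
import random
from collections import deque

# Hyperparameters as discussed in this thread (values from the paper/answers above)
M = 2                  # re-train the GNN every M episodes
BATCHES_PER_TRAIN = 5  # mini-batches used per training step
BATCH_SIZE = 32        # samples per mini-batch
BUFFER_SIZE = 5000     # replay buffer holds up to 5K experiences
TARGET_UPDATE = 50     # copy primary -> target network every 50 episodes


class ReplayBuffer:
    """Minimal FIFO experience buffer (illustrative only)."""

    def __init__(self, capacity=BUFFER_SIZE):
        self.buffer = deque(maxlen=capacity)

    def add(self, sample):
        self.buffer.append(sample)

    def sample_batch(self, batch_size=BATCH_SIZE):
        # Uniformly sample a mini-batch (capped at the buffer's current size)
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))


def training_schedule(num_episodes):
    """Return (episode, do_train, do_target_copy) flags matching the cadence
    described above: train every M episodes, copy weights every 50 episodes."""
    schedule = []
    for episode in range(1, num_episodes + 1):
        do_train = (episode % M == 0)          # replay() + 5 batches of 32
        do_target_copy = (episode % TARGET_UPDATE == 0)
        schedule.append((episode, do_train, do_target_copy))
    return schedule
```

Over 100 episodes this schedule triggers 50 training steps and 2 target-network copies, which matches the hyperparameters quoted in the answer above.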

Alilalily commented 2 years ago

Thank you very much for your reply. One more question: so training does not happen at every step, but every 2 episodes (using 5 x 32 samples)? In addition, I've read almost all of your team's papers, and using GNNs in computer networks is really enlightening, even if the connection is natural since both are graphs. I understand that the inputs and outputs of a GNN are graph representations, whereas in computer networks the graph structure is not complex, so a future direction seems to be combining GNNs with other methods; maybe well-designed representations of nodes and links in graphs are important? But when I look at new papers on graph neural networks, this part does not seem to be a hot topic.

paulalmasan commented 2 years ago

Hi, yes, but you use samples from the whole experience replay buffer.

Alilalily commented 2 years ago

Copy that, thanks a lot !!!