knowledgedefinednetworking / DRL-GNN

BSD 3-Clause "New" or "Revised" License

In your paper, there is no explanation of how you train the 'm' and 'u' functions. Could you please explain a bit? #2

Closed Matten95 closed 3 years ago

paulalmasan commented 3 years ago

Figure 4 of the paper tries to illustrate how the message and update (an RNN) NNs are trained. The experience replay buffer stores samples of (s, a, r, s'). DQN then computes the q-value Q(s, a) and uses equation (3) from the paper to compute a target value. Finally, DQN computes the error between these two values and backpropagates it all the way back to the link features to update the weights of the NNs.
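To make the flow above concrete, here is a rough NumPy sketch, not the code from this repository. It makes several assumptions: equation (3) is taken to be the standard DQN target r + gamma * max Q(s', a'), the update function is a plain tanh RNN cell, messages are summed over neighbouring links, and the action is encoded in the link features of the state. All names (`q_value`, `td_loss`, `W_m`, `W_u`, `w_r`) are hypothetical. Finite differences stand in for backpropagation so the example needs nothing beyond NumPy; the point is only that a single TD error yields gradients for both the message and the update weights.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4  # hidden size per link (hypothetical value)

def init_params():
    return {
        "W_m": rng.normal(0, 0.5, (D, 2 * D)),  # message function m
        "W_u": rng.normal(0, 0.5, (D, 2 * D)),  # update function u (plain RNN cell, an assumption)
        "w_r": rng.normal(0, 0.5, D),           # readout producing the q-value
    }

def q_value(params, link_feats, adjacency, steps=3):
    """Run message passing over the links, then read out one q-value."""
    h = link_feats.copy()
    for _ in range(steps):
        new_h = np.empty_like(h)
        for i in range(len(h)):
            agg = np.zeros(D)
            for j in np.flatnonzero(adjacency[i]):
                # message from neighbouring link j to link i
                agg += np.tanh(params["W_m"] @ np.concatenate([h[i], h[j]]))
            # update link i from its old state and the aggregated messages
            new_h[i] = np.tanh(params["W_u"] @ np.concatenate([h[i], agg]))
        h = new_h
    return float(params["w_r"] @ h.sum(axis=0))

def td_loss(params, target_params, sample, adjacency, gamma=0.95):
    """Squared error between Q(s, a) and the assumed DQN target."""
    s_a, r, next_candidates = sample
    q_sa = q_value(params, s_a, adjacency)
    y = r + gamma * max(q_value(target_params, c, adjacency) for c in next_candidates)
    return (q_sa - y) ** 2

def numerical_grads(params, loss_fn, eps=1e-5):
    """Central-difference gradients; a stand-in for backpropagation."""
    grads = {}
    for name, w in params.items():
        g = np.zeros_like(w)
        for idx in np.ndindex(w.shape):
            old = w[idx]
            w[idx] = old + eps
            up = loss_fn(params)
            w[idx] = old - eps
            down = loss_fn(params)
            w[idx] = old  # restore the weight
            g[idx] = (up - down) / (2 * eps)
        grads[name] = g
    return grads

# Three links in a line: 0 - 1 - 2 (toy topology)
adjacency = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
params = init_params()
target_params = {k: v.copy() for k, v in params.items()}  # frozen copy for the target

# One replay sample: features for (s, a), reward, features for each candidate (s', a')
sample = (rng.normal(size=(3, D)), 1.0, [rng.normal(size=(3, D)) for _ in range(2)])

loss_fn = lambda p: td_loss(p, target_params, sample, adjacency)
loss_before = loss_fn(params)
grads = numerical_grads(params, loss_fn)
for name in params:
    params[name] -= 1e-3 * grads[name]  # one gradient step on m, u and the readout
loss_after = loss_fn(params)
```

Since `W_m` and `W_u` both sit inside `q_value`, the error between Q(s, a) and the target produces a gradient for each of them, and the same step updates message, update and readout weights together. In a real setup an autodiff framework would compute these gradients instead of finite differences.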