Figure 4 in the paper illustrates how the message and update (an RNN) NNs are trained. The experience replay buffer stores samples of (s, a, r, s'). The DQN then computes the q-value Q(s, a) and uses equation (3) from the paper to compute a target value. Finally, the DQN computes the error between these two values and backpropagates it all the way back to the link features to update the weights of the NNs.
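As a rough sketch of that training step, here is the standard DQN target-and-error computation for one replay sample. This is generic DQN, not the paper's exact code: the discount factor `gamma` and the toy q-values are made-up illustration values, and the GNN that actually produces Q(s, a) from link features is abstracted away.

```python
import numpy as np

def td_target(reward, gamma, q_next):
    # Bellman target in the spirit of equation (3): r + gamma * max_a' Q(s', a')
    # q_next holds the q-values of all actions in the next state s'
    return reward + gamma * np.max(q_next)

def td_error(q_sa, target):
    # error between the predicted Q(s, a) and the target value;
    # backprop would push its gradient through the message/update NNs
    return q_sa - target

# toy replay sample (s, a, r, s'): hypothetical numbers for illustration
q_sa = 1.0                            # predicted Q(s, a)
q_next = np.array([0.5, 2.0, 1.5])    # Q(s', a') for each next action
target = td_target(reward=0.2, gamma=0.9, q_next=q_next)   # 0.2 + 0.9*2.0 = 2.0
loss = 0.5 * td_error(q_sa, target) ** 2
```

In practice this loss is averaged over a minibatch drawn from the replay buffer before the gradient step.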