Figure 4 in the paper illustrates how the message and update (an RNN) NNs are trained. The experience replay buffer stores samples of (s, a, r, s'). The DQN then computes the q-value Q(s, a) and uses equation (3) from the paper to compute a target value. Finally, the DQN computes the error between these two values and backpropagates it all the way back to the link features to update the weights of the NNs.
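As a rough sketch of that training step, here is the standard DQN target-and-error computation for one replay sample. This is generic DQN, not the paper's exact code: the discount factor `gamma` and the toy q-values are made-up illustration values, and the GNN that actually produces Q(s, a) from link features is abstracted away.

```python
import numpy as np

def td_target(reward, gamma, q_next):
    # Bellman target in the spirit of equation (3): r + gamma * max_a' Q(s', a')
    # q_next holds the q-values of all actions in the next state s'
    return reward + gamma * np.max(q_next)

def td_error(q_sa, target):
    # error between the predicted Q(s, a) and the target value;
    # backprop would push its gradient through the message/update NNs
    return q_sa - target

# toy replay sample (s, a, r, s'): hypothetical numbers for illustration
q_sa = 1.0                            # predicted Q(s, a)
q_next = np.array([0.5, 2.0, 1.5])    # Q(s', a') for each next action
target = td_target(reward=0.2, gamma=0.9, q_next=q_next)   # 0.2 + 0.9*2.0 = 2.0
loss = 0.5 * td_error(q_sa, target) ** 2
```

In practice this loss is averaged over a minibatch drawn from the replay buffer before the gradient step.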