Hi Phil,
I implemented your DuelDDQN architecture for myself, and was curious as to the following snippet of the learning function, as my question wasn't detailed in the course.
Why is it that only Q_pred is multiplied by the action matrix? Is it because it represents the actions we have just taken in the current state? Are all of these q_value matrices of the same dimensions?
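
To make the shape question concrete, here is a minimal NumPy sketch. It is not the course code: the names `q_pred`, `q_next`, `q_eval`, and `action_matrix` are hypothetical, the Q-values are random placeholders rather than network outputs, and it assumes the actions are stored one-hot. Under those assumptions, every Q-value matrix has shape `(batch_size, n_actions)`, and multiplying by the one-hot matrix keeps only the Q-value of the action taken in the current state.

```python
import numpy as np

batch_size, n_actions = 4, 3
rng = np.random.default_rng(0)

# Placeholder Q-values (assumption: real code would get these from the networks).
q_pred = rng.normal(size=(batch_size, n_actions))  # online net, current states
q_next = rng.normal(size=(batch_size, n_actions))  # target net, next states
q_eval = rng.normal(size=(batch_size, n_actions))  # online net, next states

# Actions taken in the current states, stored one-hot (assumed storage format).
actions = np.array([0, 2, 1, 2])
action_matrix = np.eye(n_actions)[actions]          # shape (batch_size, n_actions)

# Masking with the one-hot matrix and summing keeps only Q(s, a_taken).
q_taken = (q_pred * action_matrix).sum(axis=1)      # shape (batch_size,)

# Double DQN: the online net picks the next action, the target net evaluates it.
max_actions = q_eval.argmax(axis=1)
q_next_selected = q_next[np.arange(batch_size), max_actions]
```

In this sketch only `q_pred` is masked because the stored actions belong to the current states; the next-state values are indexed by the greedy action instead.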