Dear author, thanks for open-sourcing your code.

I noticed that there is a function `_update_normal()` at line 522 of gasil.py, which uses the normal replay buffer to run `q_train()` and `p_train()`. However, this step does not seem to appear in Algorithm 1 of your AAMAS 2019 paper; Algorithm 1 seems to update only the imitation part.

Am I right, or am I missing something? Could you explain why `_update_normal()` is run before the imitation update? Thank you very much.