First of all, thank you so much for making this work open source!
According to the paper, the online network should be followed by two MLPs, and the target network by one MLP.
In the code, however, the online network has only one MLP and the target network has none. In addition, there are parameter updates between the MLP of the online network and the target network. Is this an oversight, or did I miss something?