In SPEDERSACAgent.feature_step in spedersac_agent.py, the formula "model_loss_pt1 = -2 z_phi @ z_mu_next.T" should instead be written as "model_loss_pt1 = -2 torch.diag(z_phi @ z_mu_next.T)". In the formula of the original paper, this term involves s' sampled from P(s'|s,a). Since z_phi and z_mu_next are matrices computed from (s,a,s') tuples drawn from the replay buffer, only the i-th row of z_phi and the i-th row of z_mu_next should be multiplied together. In other words, only the diagonal entries of z_phi @ z_mu_next.T (whose sum is the trace) should contribute to this term, rather than all elements as in your implementation.
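To illustrate the difference, here is a minimal sketch using random tensors in place of the real z_phi and z_mu_next. The batch size, feature dimension, and values are made up for illustration and are not taken from the repo; only the names z_phi, z_mu_next, and model_loss_pt1 come from the code quoted above.

```python
import torch

torch.manual_seed(0)

# Hypothetical shapes: a batch of 4 transitions with 3-dimensional features.
batch_size, feature_dim = 4, 3
z_phi = torch.randn(batch_size, feature_dim)      # stands in for phi(s_i, a_i)
z_mu_next = torch.randn(batch_size, feature_dim)  # stands in for mu(s'_i)

# Full product: entry (i, j) pairs phi(s_i, a_i) with mu(s'_j),
# so off-diagonal entries mix transitions that were never sampled together.
full = z_phi @ z_mu_next.T                        # shape (batch_size, batch_size)

# Diagonal only: entry i pairs phi(s_i, a_i) with its own sampled s'_i.
paired = torch.diag(full)                         # shape (batch_size,)

# Equivalent row-wise dot product that avoids building the full matrix.
paired_rowwise = (z_phi * z_mu_next).sum(dim=1)

model_loss_pt1 = -2 * paired
```

The row-wise form gives the same values as torch.diag(...) without forming the batch_size x batch_size matrix, which may matter for large batches.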