A bug in the implementation of a mathematical formula

G1NO3 commented 5 days ago

In SPEDERSACAgent.feature_step in spedersac_agent.py, the line "model_loss_pt1 = -2 * z_phi @ z_mu_next.T" should instead be "model_loss_pt1 = -2 * torch.diag(z_phi @ z_mu_next.T)". In the original paper's formula, this term is an expectation over s' sampled from P(s'|s,a). Since z_phi and z_mu_next are matrices computed from (s,a,s') tuples drawn from the replay buffer, only the i-th row of z_phi and the i-th row of z_mu_next should be multiplied together. In other words, only the diagonal elements of z_phi @ z_mu_next.T (whose sum is the trace) should contribute to this term, whereas the current implementation includes every element of the matrix.
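To illustrate the difference, here is a minimal sketch using random tensors. The batch size, feature dimension, and random inputs are assumptions for illustration only; in the repository, z_phi and z_mu_next would come from the agent's networks. It shows that the diagonal of z_phi @ z_mu_next.T contains exactly the row-wise inner products of matched (s,a,s') samples, while the off-diagonal entries pair each (s_i, a_i) with some other sample's s'_j.

```python
import torch

torch.manual_seed(0)
batch, dim = 4, 3  # hypothetical sizes, not taken from the repository

# Random stand-ins for the feature matrices; row i corresponds to the
# i-th (s, a, s') tuple sampled from the replay buffer.
z_phi = torch.randn(batch, dim)
z_mu_next = torch.randn(batch, dim)

# Full product: entry (i, j) pairs (s_i, a_i) with s'_j, including mismatched pairs.
full = z_phi @ z_mu_next.T  # shape (batch, batch)

# Only matched pairs (i, i), as proposed in this issue.
paired = torch.diag(full)  # shape (batch,)

# Equivalent row-wise inner product that avoids building the batch x batch matrix.
paired_rowwise = (z_phi * z_mu_next).sum(dim=-1)

model_loss_pt1_all = -2 * full     # behaviour described as the bug
model_loss_pt1_diag = -2 * paired  # proposed fix
```

The row-wise form gives the same values as torch.diag(...) but uses memory linear in the batch size, which may be worth considering for large batches.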

haotiansun14 commented 4 days ago

Hi @dmitryshribak, could you take a look at this issue? Thanks.

dmitryshribak commented 1 hour ago

Created pull request to fix.

haotiansun14 / rl-rep

A bug in the implementation of a mathematical formula #8