sebascuri / qreps

Confused about the Implementation of QREPS Agent #2

Open sebimarkgraf opened 3 years ago

sebimarkgraf commented 3 years ago

Hey, I am confused about the dual implementation in the QREPS agent. The code I am talking about is in qreps_algorithm in the QREPS class. To be specific:

```python
# Calculate weights.
weights_td = self.eta() * td  # type: torch.Tensor
if weights_td.ndim == 1:
    weights_td = weights_td.unsqueeze(-1)
dual = 1 / self.eta() * torch.logsumexp(weights_td, dim=-1)
dual += (1 - self.gamma) * value.squeeze(-1)
return Loss(dual_loss=dual.mean(), td_error=td)
```

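As far as I understand, `weights_td` is always 1-D at this point, so `unsqueeze(-1)` always adds a trailing dimension of size 1. `torch.logsumexp` over a size-1 dimension just returns its input, so the dual reduces to `td + (1 - gamma) * value` averaged over the batch, with no soft-max over the TD errors at all. Could you help me understand this? Or does the implementation here differ from the version in the paper?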
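To make the point concrete, here is a minimal stdlib-only sketch (no PyTorch) that mirrors the quoted lines. `per_sample_dual` reproduces what the code computes when every TD error sits in its own size-1 row. `batch_dual` is my assumption of the intended form, where the log-sum-exp runs across the batch as in the paper's empirical logistic Bellman error; it is not code from the repository, and all function names here are hypothetical.

```python
import math
from typing import List


def logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x))) over a list of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def per_sample_dual(td: List[float], values: List[float], eta: float, gamma: float) -> float:
    """Mirrors the quoted code: each TD error is its own size-1 row,
    so (1 / eta) * logsumexp([eta * d]) is just d."""
    duals = [
        (1.0 / eta) * logsumexp([eta * d]) + (1.0 - gamma) * v
        for d, v in zip(td, values)
    ]
    return sum(duals) / len(duals)


def batch_dual(td: List[float], values: List[float], eta: float, gamma: float) -> float:
    """Hypothetical batch-level dual (assumption, not the repo's code):
    (1 / eta) * log(mean(exp(eta * td))) + (1 - gamma) * mean(V)."""
    n = len(td)
    soft_td = (1.0 / eta) * (logsumexp([eta * d for d in td]) - math.log(n))
    return soft_td + (1.0 - gamma) * sum(values) / n


td = [0.5, -1.0, 2.0]
values = [1.0, 1.0, 1.0]
print(per_sample_dual(td, values, eta=2.0, gamma=0.99))  # equals mean(td) + (1 - gamma) * mean(V)
print(batch_dual(td, values, eta=2.0, gamma=0.99))       # larger: a soft maximum of the TD errors
```

With the size-1 dimension, `eta` cancels out completely and the "dual" is a plain average of the TD errors. Reducing over the batch instead makes large TD errors dominate, and `eta` then controls how close the soft-max gets to a hard max.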

The current implementation only seems to perform well with the fixed seed 0. With any other seed, learning breaks down completely.

I hope you can guide me in understanding this.

sebascuri commented 3 years ago

Oh yeah, that is a big mistake. I will fix it right away.