Closed clvoloshin closed 5 years ago
On second thought, Algorithm 1 in the appendix actually takes the form D(w) = x^T K x, but that isn't what the code is doing: the code computes self.loss_xx = D(w) = x^T K y. Please advise.
Good point, and sorry for the confusion here. Yes, you're right: we implemented a more general framework so that we could also try double-sampling methods, which may produce a negative loss. Typically (and this is also how the results in our paper were produced) we use the V-statistic, i.e., the same batch of samples is used for both factors when estimating the quadratic loss. You can change the train function to feed in the same subsample for both.
Right -- so I just change loss_xx to x^T K x, where K = [K(s'_i, s'_j)] and s'_i, s'_j come from the same batch of next states. Great, thank you!
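For anyone else hitting this: here is a minimal NumPy sketch of the distinction being discussed. All names are illustrative (not from the sumo repo), and the RBF kernel and Gaussian data are stand-in assumptions. With a PSD kernel matrix K built on a single batch, the V-statistic x^T K x is guaranteed nonnegative, while the double-sampling form x^T K y over two independent batches can come out negative.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(a, b, bandwidth=1.0):
    # K[i, j] = exp(-||a_i - b_j||^2 / (2 * bandwidth^2)); PSD when a is b
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

n = 64
s_next = rng.normal(size=n)           # one batch of next states s'_i
K = rbf_kernel(s_next, s_next)        # Gram matrix on the SAME batch -> PSD

x = rng.normal(size=n)                # residuals d(w, s, a, s') on batch 1
y = rng.normal(size=n)                # residuals from an independent batch

v_stat = x @ K @ x / n**2             # V-statistic estimate: always >= 0
double = x @ K @ y / n**2             # double-sampling estimate: sign varies

print(v_stat >= 0)                    # True for any x, since K is PSD
print(double)                         # can be negative on some draws
```

Both estimators target the same quadratic loss in expectation; only the V-statistic is pointwise nonnegative, which is why the logged loss_xx can dip below zero when the code feeds in two different subsamples.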
Hi! I have a question about how loss is defined.
In the paper, the loss takes the form D(w) = L^2 = E[ d(w,s,a,s') d(w1,s1,a1,s1') k(s,s') ]. In other words, it has the form E[x^T K y] with x = d(w,s,a,s') and y = d(w1,s1,a1,s1'). Since E[x^T K y] = L^2 > 0, I would expect the loss to be positive. However, when running the sumo code empirically, I'm seeing negative values for loss_xx. I'm very confused by this. Is this a bug, or is a negative loss allowed?
^ The screenshot above shows loss_xx and self.loss over a few epochs of training. Notice that the loss is negative in some cases.