Closed mfe7 closed 4 years ago
Doing a little more digging, the issue seems to be a result of some episodes ending earlier than others in my environment. I added a few lines to fit to 1) remove the zero observations/returns from featmat/returns before creating the XTX, XTy matrices:
flat_mask = episodes.mask.flatten()
featmat = featmat[torch.nonzero(flat_mask)].view(-1, self.feature_size)
returns = returns[torch.nonzero(flat_mask)].view(-1, 1)
and 2) increase the reg_coeff when lstsq returns coeffs containing either nan or inf:
if torch.isnan(coeffs).any() or torch.isinf(coeffs).any():
    raise RuntimeError
If this seems like a reasonable way of doing this, I can submit a pull request -- otherwise I'm open to other, smarter ways of solving this.
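For anyone following along, here is a rough, self-contained sketch of how the two changes could fit together. It is not the repository's actual fit method: the function name fit_masked_baseline, its arguments, and the retry schedule (multiplying reg_coeff by 10) are assumptions made for illustration, and it swaps lstsq for torch.linalg.solve on the regularized normal equations.

```python
import torch

def fit_masked_baseline(featmat, returns, mask, reg_coeff=1e-5, max_tries=5):
    """Hypothetical helper: ridge-regress returns on features, skipping padded rows."""
    # 1) Keep only the time steps where the mask is nonzero.
    flat_mask = mask.flatten().bool()
    feats = featmat[flat_mask]                  # (n_valid, feature_size)
    rets = returns[flat_mask].reshape(-1, 1)    # (n_valid, 1)

    XTX = feats.t() @ feats
    XTy = feats.t() @ rets
    eye = torch.eye(feats.shape[1], dtype=feats.dtype)

    # 2) Retry with a larger regularizer if the solve fails or is not finite.
    for _ in range(max_tries):
        try:
            coeffs = torch.linalg.solve(XTX + reg_coeff * eye, XTy)
        except RuntimeError:  # singular-matrix errors derive from RuntimeError
            coeffs = None
        if coeffs is not None and torch.isfinite(coeffs).all():
            return coeffs.flatten()
        reg_coeff *= 10  # assumed schedule, not taken from the original code
    raise RuntimeError("baseline fit failed; coefficients were nan or inf")

# Usage: six flattened time steps, the last two are zero padding.
featmat = torch.tensor([[1., 0.], [0., 1.], [1., 1.], [2., 1.], [0., 0.], [0., 0.]])
returns = featmat @ torch.tensor([2., -1.])
mask = torch.tensor([[1., 1.], [1., 1.], [0., 0.]])
coeffs = fit_masked_baseline(featmat, returns, mask)
```

Boolean indexing with flat_mask does the same job as indexing with torch.nonzero(flat_mask) followed by view, just without the extra reshape.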
Thank you for the bug report! I think removing the masked entries in observations and returns is the correct way of doing it, and your changes look reasonable. A PR would be very appreciated, thank you!
Hi -- I have a custom gym environment that outputs observations that (should) never contain all zeros, but sometimes when I print out episodes.observations, the first several rows contain reasonable observation vectors and the last several rows contain zero vectors. I am guessing the mask attribute is related to this? The issue is that having a bunch of zero rows seems to make the matrix inversion in baseline.fit difficult, and it returns an error. I'm wondering if you have any advice on where the zero vectors might be coming from and what to do to make the fit function work in their presence (maybe ignoring those rows?). Thanks!
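A small sketch of one plausible source of the zero rows, consistent with the diagnosis that some episodes end earlier than others. It assumes the batch pads shorter episodes with zeros up to the longest episode and records the valid steps in a mask. The helper pad_episodes is hypothetical and only mimics that behaviour; it is not the library's code.

```python
import torch

def pad_episodes(episode_obs):
    """Hypothetical helper: stack variable-length episodes into (T_max, batch, obs_dim)."""
    max_len = max(len(ep) for ep in episode_obs)
    obs_dim = episode_obs[0].shape[1]
    observations = torch.zeros(max_len, len(episode_obs), obs_dim)
    mask = torch.zeros(max_len, len(episode_obs))
    for i, ep in enumerate(episode_obs):
        observations[:len(ep), i] = ep   # real observations
        mask[:len(ep), i] = 1.0          # 1 where a real step exists, 0 on padding
    return observations, mask

# Two episodes: one lasts 3 steps, the other ends after 1 step.
episodes_list = [torch.ones(3, 2), torch.ones(1, 2)]
observations, mask = pad_episodes(episodes_list)

# Rows past the end of the short episode are all zeros.
print(observations[:, 1])

# Selecting with the mask drops the padded rows.
valid_obs = observations[mask.bool()]
print(valid_obs.shape)
```

Under that assumption the zero vectors are padding rather than anything the environment emitted, so filtering them out with the mask before fitting is safe.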