tristandeleu / pytorch-maml-rl

Reinforcement Learning with Model-Agnostic Meta-Learning in Pytorch
MIT License

Custom environment and baseline.fit(episodes) error #48

Closed mfe7 closed 4 years ago

mfe7 commented 4 years ago

Hi -- I have a custom gym environment that outputs observations that (should) never contain all zeros, but sometimes when I print out episodes.observations the first several rows contain reasonable observation vectors and the last several rows contain zero vectors. I am guessing the mask attribute is related to this?

The issue is that having a bunch of zero rows seems to make the matrix inversion in baseline.fit difficult and it returns an error. I'm wondering if you have any advice on where the zero vectors might be coming from and what to do to make the fit function work in their presence (maybe ignoring those rows?). Thanks!

mfe7 commented 4 years ago

Doing a little more digging, the issue seems to result from some episodes ending earlier than others in my environment. I added a few lines to fit to 1) remove the zero observations/returns from featmat/returns before creating the XTX, XTy matrices:

flat_mask = episodes.mask.flatten()
featmat = featmat[torch.nonzero(flat_mask)].view(-1, self.feature_size)
returns = returns[torch.nonzero(flat_mask)].view(-1, 1)

and 2) increase the reg_coeff when lstsq returns coeffs containing either NaN or inf:

if torch.isnan(coeffs).any() or torch.isinf(coeffs).any():
    raise RuntimeError

If this seems like a reasonable way of doing this, I can submit a pull request -- otherwise open to other ways of solving this more intelligently
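
For readers following along, the two changes described above (dropping padded time steps, then retrying with stronger regularization when the solution is not finite) can be sketched as one standalone function. This is only a minimal illustration under stated assumptions, not the repository's actual fit method: the function name fit_masked_ridge, its arguments, the default reg_coeff, the tenfold growth schedule, and the use of torch.linalg.solve in place of lstsq are all hypothetical choices made for the example.

```python
import torch


def fit_masked_ridge(featmat, returns, mask, reg_coeff=1e-5, max_tries=5):
    """Ridge regression on valid time steps only (illustrative sketch).

    featmat: (T * B, F) features, returns: (T * B, 1) returns,
    mask: (T, B) with 1 for valid steps and 0 for padding.
    Names, defaults, and the retry schedule are assumptions.
    """
    feature_size = featmat.shape[-1]
    # Boolean mask over the flattened (time, batch) axis.
    valid = mask.reshape(-1) > 0
    X = featmat.reshape(-1, feature_size)[valid]
    y = returns.reshape(-1, 1)[valid]

    eye = torch.eye(feature_size, dtype=X.dtype, device=X.device)
    XTX = X.t() @ X
    XTy = X.t() @ y
    for _ in range(max_tries):
        try:
            coeffs = torch.linalg.solve(XTX + reg_coeff * eye, XTy)
            # Treat NaN/inf coefficients like a solver failure.
            if not torch.isfinite(coeffs).all():
                raise RuntimeError("non-finite coefficients")
            return coeffs, reg_coeff
        except RuntimeError:
            # Assumed schedule: strengthen the regularization and retry.
            reg_coeff *= 10
    raise RuntimeError("Unable to solve the normal equations")


# Toy batch: T=3 time steps, B=2 episodes, F=2 features. The second
# episode ends after one step, so its later rows are zero padding.
mask = torch.tensor([[1, 1], [1, 0], [1, 0]], dtype=torch.float64)
featmat = torch.tensor(
    [[1, 0], [0, 1], [1, 1], [0, 0], [2, 1], [0, 0]], dtype=torch.float64
)
true_w = torch.tensor([[2.0], [-1.0]], dtype=torch.float64)
returns = featmat @ true_w
coeffs, used_reg = fit_masked_ridge(featmat, returns, mask)
print(coeffs.flatten())  # approximately the generating weights [2, -1]
```

Indexing with a boolean mask keeps the feature matrix two-dimensional, so no extra view call is needed after the selection. Because the padded rows are dropped before XTX and XTy are built, whatever values sit in those rows cannot affect the fitted coefficients.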

tristandeleu commented 4 years ago

Thank you for the bug report! I think removing the masked entries in observations and returns is the correct way of doing it, and your changes look reasonable. A PR would be very appreciated, thank you!