Open · roggirg opened this issue 3 years ago
Hi,
I believe the reward loss should be based on `rewards[1:]` instead of `rewards[:-1]`: https://github.com/yusukeurakami/dreamer-pytorch/blob/7e9050e8c454309de40bd0d1a4ec0256ef600147/main.py#L209
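To make the off-by-one concrete, here is a minimal index-bookkeeping sketch. It rests on my reading of the code (not verified against the buffer implementation): `rewards[t]` is the reward returned by `env.step(actions[t])`, and the posterior states are inferred from `observations[1:]`, so posterior state `i` corresponds to time step `i + 1`:

```python
# Minimal sketch of the alignment I have in mind. Assumptions (my reading of
# the repo, not confirmed): rewards[t] is the reward for actions[t], and the
# transition model consumes observations[1:], so posterior_states[i]
# corresponds to observation o_{i+1}.

T = 5  # chunk length, arbitrary for illustration

# time indices of each tensor along the sequence axis
obs_t     = list(range(T))   # o_0 .. o_{T-1}
rewards_t = list(range(T))   # r_0 .. r_{T-1}, where r_t is the reward for a_t

# posterior states are built from observations[1:], i.e. steps 1 .. T-1
posterior_t = obs_t[1:]      # s_1 .. s_{T-1}

# the reward model predicts r_t from s_t, so the targets should carry the
# same time indices as the posterior states:
assert posterior_t == rewards_t[1:]   # rewards[1:] lines up
assert posterior_t != rewards_t[:-1]  # rewards[:-1] is shifted by one step
```

Under that convention, the reward model `p(r_t | s_t)` should be trained against `rewards[1:]`. If the buffer instead stores `r_t` as the reward received upon *arriving at* `o_t`, then `rewards[:-1]` would be the correct target, which is why I'm asking about the intended convention.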
If not, can you please explain your reasoning? Thanks!