Hi, what do you think about reward smoothing?
The collected rewards have high variance. To show the trend of the reward curve, should we apply some reward smoothing, similar to TensorBoard's smoothing?
If so, which smoothing method should I choose: exponential smoothing or moving-average smoothing?
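For reference, here is a minimal sketch of the two options being compared, so it is clear what each one computes. The function names `ema_smooth` and `moving_average` and the default `weight` and `window` values are hypothetical, chosen only for illustration. The exponential version uses a bias-corrected moving average, which as far as I understand is similar in spirit to TensorBoard's smoothing slider, but it is not copied from TensorBoard's code.

```python
def ema_smooth(rewards, weight=0.9):
    """Exponential moving average with bias correction.

    `weight` is assumed to be in [0, 1); higher values smooth more.
    The correction term keeps the first few values from being pulled
    toward the zero initialisation.
    """
    smoothed = []
    last = 0.0
    for i, r in enumerate(rewards, start=1):
        last = weight * last + (1.0 - weight) * r
        smoothed.append(last / (1.0 - weight ** i))
    return smoothed


def moving_average(rewards, window=3):
    """Trailing mean over the last `window` rewards.

    At the start, where fewer than `window` values exist, the mean is
    taken over the values seen so far.
    """
    smoothed = []
    running_sum = 0.0
    for i, r in enumerate(rewards):
        running_sum += r
        if i >= window:
            # Drop the value that just left the window.
            running_sum -= rewards[i - window]
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed


if __name__ == "__main__":
    episode_rewards = [1.0, 5.0, 2.0, 8.0, 3.0, 9.0]  # made-up example data
    print(ema_smooth(episode_rewards, weight=0.6))
    print(moving_average(episode_rewards, window=3))
```

Both functions return a list of the same length as the input, so the smoothed curve can be plotted on the same x-axis as the raw rewards. The exponential version weights recent episodes more heavily, while the moving average treats every episode inside the window equally.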