ikostrikov / pytorch-a3c

PyTorch implementation of Asynchronous Advantage Actor Critic (A3C) from "Asynchronous Methods for Deep Reinforcement Learning".
MIT License
1.23k stars 279 forks

Reward Smoothing #63

Closed WangChen100 closed 5 years ago

WangChen100 commented 5 years ago

Hi, what do you think about reward smoothing? The collected rewards have high variance. To show the trend of the reward curve, should we apply some reward smoothing, the same way TensorBoard smooths its plots? If so, which smoothing method should I choose: exponential smoothing or average smoothing?
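
For concreteness, here is a minimal sketch of the two options being asked about, applied to a list of per-episode rewards for plotting only. The function names, the `weight` and `window` parameters, and their defaults are assumptions made for illustration; nothing here comes from this repository. The exponential version is in the spirit of TensorBoard's smoothing slider and is not a reproduction of its exact implementation.

```python
from collections import deque


def exponential_smoothing(rewards, weight=0.6):
    """Exponential moving average: s_t = weight * s_{t-1} + (1 - weight) * r_t.

    `weight` (hypothetical parameter) is in [0, 1); higher means smoother.
    The first smoothed value is simply the first reward.
    """
    smoothed = []
    last = None
    for r in rewards:
        last = r if last is None else weight * last + (1.0 - weight) * r
        smoothed.append(last)
    return smoothed


def moving_average(rewards, window=3):
    """Trailing mean over the last `window` rewards (fewer at the start)."""
    recent = deque(maxlen=window)  # automatically drops the oldest reward
    smoothed = []
    for r in rewards:
        recent.append(r)
        smoothed.append(sum(recent) / len(recent))
    return smoothed


if __name__ == "__main__":
    episode_rewards = [1.0, 3.0, 2.0, 6.0, 4.0]  # made-up values
    print(exponential_smoothing(episode_rewards, weight=0.5))
    print(moving_average(episode_rewards, window=2))
```

Both functions only change how the curve is displayed; the raw rewards used for training are left untouched. The exponential version needs no buffer and reacts to every new point, while the moving average weights the last `window` episodes equally.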