aemior closed this issue 3 years ago
I noticed that the reward saved to the replay buffer (https://github.com/megvii-research/ICCV2019-LearningToPaint/blob/24e317ba1d7c88435677fc77cb2ded6d03b2a914/baseline/env.py#L105) differs from the reward calculated during training (https://github.com/megvii-research/ICCV2019-LearningToPaint/blob/24e317ba1d7c88435677fc77cb2ded6d03b2a914/baseline/DRL/ddpg.py#L102): one is divided by the initial distance and the other is not. Is this a bug, or is it intentional?
Hi! During model training, the reward from the environment is not used. This "reward" is only used for monitoring the training process.
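For anyone else reading this thread, the distinction can be sketched as follows. This is a minimal illustration, not the repository's actual code; the function names and the `prev_dist`/`cur_dist`/`init_dist` variables are hypothetical stand-ins for the distance between the canvas and the target image:

```python
# Sketch of the two reward styles discussed above (hypothetical, not the
# repository's actual code). prev_dist / cur_dist are distances between
# canvas and target before and after a stroke; init_dist is the distance
# of the blank canvas to the target.

def logged_reward(prev_dist, cur_dist, init_dist):
    # env.py style: improvement normalized by the initial distance, so
    # values are comparable across images -- handy for logging progress.
    return (prev_dist - cur_dist) / init_dist

def training_reward(prev_dist, cur_dist):
    # ddpg.py style: the raw improvement is recomputed inside the update
    # step, so the normalized value stored in the replay buffer never
    # feeds the gradient.
    return prev_dist - cur_dist

print(logged_reward(10.0, 6.0, 20.0))   # 0.2
print(training_reward(10.0, 6.0))       # 4.0
```

Because the training loop recomputes its own reward, the two formulas never need to agree, which is why the mismatch is harmless.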
Thanks, I got it.