hzwer / ICCV2019-LearningToPaint

ICCV2019 - Learning to Paint With Model-based Deep Reinforcement Learning
MIT License

Question about reward #55

Closed by aemior 3 years ago

aemior commented 3 years ago

The reward saved to the replay buffer (https://github.com/megvii-research/ICCV2019-LearningToPaint/blob/24e317ba1d7c88435677fc77cb2ded6d03b2a914/baseline/env.py#L105) is different from the reward calculated during training (https://github.com/megvii-research/ICCV2019-LearningToPaint/blob/24e317ba1d7c88435677fc77cb2ded6d03b2a914/baseline/DRL/ddpg.py#L102). One is divided by the initial distance and the other is not. Is this a bug, or is it fine as it is?

hzwer commented 3 years ago

Hi! During model training, the reward from the environment is not used. That "reward" is only used for monitoring the training process.
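
For readers following along, here is a minimal NumPy sketch of the two quantities discussed in this thread. It is not the repository's code. The function names (`l2_distance`, `logged_reward`, `training_reward`), the `(N, C, H, W)` batch layout, and the use of a plain L2 distance are all assumptions made for illustration. It only shows how a reward divided by the initial distance differs from one that is not.

```python
import numpy as np

def l2_distance(canvas, target):
    """Per-image mean squared error for batches shaped (N, C, H, W)."""
    return ((canvas - target) ** 2).mean(axis=(1, 2, 3))

def logged_reward(prev_canvas, next_canvas, target, initial_canvas, eps=1e-8):
    """Hypothetical monitoring reward: improvement divided by the initial distance."""
    initial_dis = l2_distance(initial_canvas, target)
    improvement = l2_distance(prev_canvas, target) - l2_distance(next_canvas, target)
    return improvement / (initial_dis + eps)

def training_reward(prev_canvas, next_canvas, target):
    """Hypothetical training reward: raw improvement, with no normalization."""
    return l2_distance(prev_canvas, target) - l2_distance(next_canvas, target)

# Toy batch: one 3-channel 4x4 image with values in [0, 1].
target = np.full((1, 3, 4, 4), 0.5)
blank = np.zeros((1, 3, 4, 4))         # initial (and previous) canvas
painted = np.full((1, 3, 4, 4), 0.25)  # canvas after one stroke

print(training_reward(blank, painted, target))       # raw improvement
print(logged_reward(blank, painted, target, blank))  # fraction of the initial distance removed
```

In this sketch the normalized value reads as the fraction of the initial distance removed by the step, which makes it easier to compare across target images when watching a training run.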

aemior commented 3 years ago

Thanks, I got it.