yuanming-hu / exposure

Learning infinite-resolution image processing with GAN and RL from unpaired image datasets, using a differentiable photo editing model.
MIT License

Question about value network #45

Open yuke93 opened 5 years ago

yuke93 commented 5 years ago

Hi Yuanming,

Thanks for releasing the code for this wonderful project!

I have a question about the value network. In net.py, new_value is predicted from fake_output and new_states. Let s_t denote fake_input, so that fake_output is s_{t+1}. new_states contains the action a_t that transforms s_t into s_{t+1}. It therefore seems that the code predicts Q(s_t, a_{t-1}) and Q(s_{t+1}, a_t) rather than Q(s_t, a_t) and Q(s_{t+1}, a_{t+1}). If so, I am confused about how the policy gradients are calculated (e.g., Eqn. (7) in the paper). I may have misunderstood something, and I'd appreciate your help in clarifying this. Thanks!
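To make the indexing concrete, here is a minimal sketch of the two pairings I am comparing. It is not taken from net.py: the function `rollout_pairs` and both list names are hypothetical, and it only enumerates time indices under the assumption that the value network's input is the post-action image together with a state vector recording the action just applied.

```python
def rollout_pairs(num_steps):
    """Enumerate (state index, action index) pairs under two readings.

    Hypothetical illustration only; it does not call any code from the repo.
    """
    # Reading 1 (assumed): the value net sees the image after the action,
    # s_{t+1}, together with the action a_t that produced it.
    observed = []
    # Reading 2: the usual Q(s_t, a_t) pairing of a state with the action
    # taken from that state.
    textbook = []
    for t in range(num_steps):
        observed.append((t + 1, t))
        textbook.append((t, t))
    return observed, textbook


observed, textbook = rollout_pairs(3)
for (s_obs, a_obs), (s_txt, a_txt) in zip(observed, textbook):
    print(f"observed: (s_{s_obs}, a_{a_obs})    textbook: (s_{s_txt}, a_{a_txt})")
```

Under the first reading the action index always lags the state index by one step, which is the mismatch I am asking about.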

Yu Ke