Please refer to Figure 4 in the paper. In the model-based version, the critic's input includes the rendered image and the target image. In the model-free version, the critic's input includes the current canvas, the target image, and the action.
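To make the two variants concrete, here is a minimal sketch of how the critic inputs could be assembled. The tensor names, shapes, and the way the action is broadcast in the model-free case are illustrative assumptions, not the exact code in this repo:

```python
import torch

# Illustrative shapes (assumptions): batch B, 3-channel 128x128 images,
# and a 65-dimensional action (13 parameters x 5 strokes).
B = 8
canvas = torch.rand(B, 3, 128, 128)   # current canvas c_t
target = torch.rand(B, 3, 128, 128)   # target image I
action = torch.rand(B, 65)            # stroke parameters a_t

# Model-based critic: render the action first, then judge the resulting image.
# `render` stands in for the differentiable neural renderer.
def render(canvas, action):
    return canvas  # placeholder for the real renderer

next_canvas = render(canvas, action)                             # c_{t+1}
critic_in_model_based = torch.cat([next_canvas, target], dim=1)  # (B, 6, H, W)

# Model-free critic: the action cannot be rendered, so it is fed in directly,
# e.g. broadcast onto extra channels alongside the canvas and target.
action_planes = action.view(B, -1, 1, 1).expand(-1, -1, 128, 128)
critic_in_model_free = torch.cat([canvas, target, action_planes], dim=1)
```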
And if the reward is given by the discriminator, what is the critic's output? What is the difference between the critic's output and the discriminator's output if both are rewards? I am looking at Figure 2 in the paper, but I don't understand. Sorry, I am a beginner in Deep Reinforcement Learning. Thank you so much again.
The critic predicts the expected accumulated reward after this step. Note that the discriminator only gives the reward for this single step. Feel free to ask me questions :)
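In other words, the discriminator scores one transition, while the critic estimates the discounted sum of all remaining rewards. A rough sketch (the function names and the discount value are illustrative, not the exact code in this repo):

```python
# Single-step reward from the discriminator: how much closer the canvas
# got to the target after this action (score difference).
def step_reward(D, canvas_t, canvas_t1, target):
    return D(canvas_t1, target) - D(canvas_t, target)

# The critic, in contrast, is trained to predict the discounted sum of all
# such rewards from this point on, e.g. with a one-step TD target:
#   V(s_t) ~ r_t + gamma * V(s_{t+1})
def td_target(reward, value_next, gamma=0.95, done=False):
    return reward + (0.0 if done else gamma * value_next)
```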
Thank you very much for your help again. Also, the actor produces five strokes in each step. The discriminator gives a reward for each stroke generated, and after these five strokes in this step, the critic gives the accumulated reward. Am I understanding the algorithm right? Thank you :)
A little detail: the critic predicts the expected reward in the future (until this episode is over, including this single step's reward).
Great! Finally, what is Q(s, a) in Figure 2 (b)? Thanks for your help!
The expected reward after performing the action a in the state s (including this single step's reward). :)
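Written out in standard actor-critic notation (the discount factor γ is a general RL quantity, not something specific to this repo):

```latex
% Expected discounted return after taking action a_t in state s_t,
% including this step's reward r_t from the discriminator.
Q(s_t, a_t) = \mathbb{E}\left[ r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots \mid s_t, a_t \right]
            = \mathbb{E}\left[ r_t + \gamma V(s_{t+1}) \right]
```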
Thanks for everything! In the code, in update_gan, why canvas = state[:, :3] and gt = state[:, 3:6]? What exactly does the variable state contain?
https://github.com/megvii-research/ICCV2019-LearningToPaint/blob/83dc2b6129feeeb56a0cb1fd91d3ffdb9d288616/baseline/env.py#L87 Canvas, Target Image and #Step.
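So the observation is the channel-wise concatenation of those three parts, which is why update_gan slices it. A minimal sketch of the layout (the exact shape of the step channel is an assumption here, not copied from env.py):

```python
import torch

# Illustrative only: the channel layout follows the reply above
# (canvas, target image, step count); the 7-channel shape is an assumption.
state = torch.rand(4, 7, 128, 128)

canvas = state[:, :3]    # current canvas, the "fake" sample for the discriminator
gt     = state[:, 3:6]   # target image, the "real" sample
step   = state[:, 6:7]   # normalized step-count channel, not needed by update_gan
```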
Hi! I am trying to understand the Deep Reinforcement Learning part. I know that the actor outputs a set of stroke parameters based on the canvas state and the target image, and that the discriminator gives the actor a reward at each step. But what about the critic? What are the input and the output of the critic? I am reading the paper, but I do not understand this part. Thank you so much.