hzwer / ICCV2019-LearningToPaint

ICCV2019 - Learning to Paint With Model-based Deep Reinforcement Learning
MIT License

Critic and discriminator #40

Closed: patrisaru closed this issue 4 years ago.

patrisaru commented 4 years ago

Hi! I am trying to understand the deep reinforcement learning part. I know that the actor outputs a set of stroke parameters based on the canvas state and the target image, and that the discriminator gives the actor a reward at each step. But what about the critic? What are the input and the output of the critic? I am reading the paper, but I do not understand this part. Thank you so much!

hzwer commented 4 years ago

Please refer to Figure 4 of the paper. In the model-based version, the critic's input includes the rendered image and the target image. In the model-free version, the critic's input includes the current canvas, the target image, and the action.
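A minimal sketch of how the two critic inputs described above could be assembled, assuming images are stored as channel lists (channels × height × width) and that image inputs are stacked along the channel axis. The rendered image in the model-based case is assumed to be the canvas after the current action has been drawn. The helper names `model_based_critic_input` and `model_free_critic_input` are hypothetical and do not come from the repository.

```python
from typing import List, Tuple

# One image = list of channels; one channel = list of rows of floats.
Image = List[List[List[float]]]


def model_based_critic_input(rendered: Image, target: Image) -> Image:
    """Hypothetical helper: stack the rendered canvas and the target image."""
    # List concatenation stacks the channels: rendered first, then target.
    return rendered + target


def model_free_critic_input(
    canvas: Image, target: Image, action: List[float]
) -> Tuple[Image, List[float]]:
    """Hypothetical helper: images stacked, action kept as a separate vector.

    Keeping the flat stroke-parameter vector apart from the image channels
    is an assumption of this sketch, not necessarily what the paper does.
    """
    return canvas + target, list(action)


# Tiny 1x1 RGB example images.
rendered = [[[0.1]], [[0.2]], [[0.3]]]
target = [[[0.9]], [[0.8]], [[0.7]]]
print(len(model_based_critic_input(rendered, target)))  # 6 channels
```

In the model-based case the action is not passed to the critic directly: its effect is already visible in the rendered image.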

patrisaru commented 4 years ago

And if the reward is given by the discriminator, what does the critic output? What is the difference between the critic output and the discriminator output if both are rewards? I am looking at Figure 2 of the paper, but I don't understand it. Sorry, I am a beginner in deep reinforcement learning. Thank you so much again.

hzwer commented 4 years ago

The critic predicts the expected accumulated reward after this step. Note that the discriminator only gives the reward for this single step. Feel free to ask me questions :)
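To make the difference concrete, here is a minimal sketch that assumes the discriminator-based rewards of one episode are already available as a list of per-step numbers, and assumes a discount factor `gamma` (its value is not given in this thread). Each entry of the result is the accumulated reward from that step to the end of the episode, which is the quantity the critic learns to predict. The function name `discounted_returns` is hypothetical.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Accumulated (discounted) reward from each step to the end of the episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the total already computed for the next one.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Per-step rewards, as a discriminator might give them (made-up numbers).
print(discounted_returns([1.0, 0.5, 0.25], gamma=0.5))  # → [1.3125, 0.625, 0.25]
```

The discriminator supplies the individual entries of `rewards`; the critic estimates the corresponding entries of the returned list without waiting for the episode to finish.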

patrisaru commented 4 years ago

Thank you very much for your help again. Also, the actor produces five strokes in each step. The discriminator gives a reward for each generated stroke, and after the five strokes of this step, the critic gives the accumulated reward. Am I understanding the algorithm correctly? Thank you :)

hzwer commented 4 years ago

A little detail: the critic predicts the expected future reward until the episode is over, including this single step's reward.
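Written as a formula, and assuming a discount factor $\gamma$ and an episode that ends at step $T$ (neither is spelled out in this thread), the quantity the critic estimates for state $s_t$ is:

```latex
V(s_t) = \mathbb{E}\left[\sum_{k=t}^{T} \gamma^{\,k-t}\, r(s_k, a_k)\right]
       = \mathbb{E}\left[\, r(s_t, a_t) + \gamma\, V(s_{t+1}) \,\right],
\qquad V(s_{T+1}) = 0
```

The $k = t$ term is this step's reward from the discriminator, which is why the prediction "includes this single step's reward"; the remaining terms are the rewards still to come.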

patrisaru commented 4 years ago

Great! Finally, what is Q(s, a) in Figure 2(b)? Thanks for your help!

hzwer commented 4 years ago

The expected reward after performing action a in state s (including this single step's reward). :)
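That description matches the standard one-step relation Q(s, a) = r(s, a) + γ·V(s'), where s' is the state reached after drawing action a. A minimal sketch of the resulting critic training target, assuming a scalar reward, a critic estimate for the next state, and a discount `gamma`; `td_target` and `critic_loss` are hypothetical names, not functions from the repository.

```python
def td_target(reward: float, next_value: float, gamma: float, done: bool) -> float:
    """One-step target: this step's reward plus the discounted future estimate."""
    # At the end of the episode there is no future reward to bootstrap from.
    bootstrap = 0.0 if done else gamma * next_value
    return reward + bootstrap


def critic_loss(prediction: float, target: float) -> float:
    """Squared error between the critic's prediction and the target."""
    return (prediction - target) ** 2


target = td_target(reward=1.0, next_value=2.0, gamma=0.5, done=False)
print(target)  # 2.0
print(critic_loss(prediction=1.5, target=target))  # 0.25
```

The single-step reward enters the target directly, and everything after that step is summarized by the critic's own estimate of the next state.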

patrisaru commented 4 years ago

Thanks for everything! In the code, in update_gan, why are canvas = state[:, :3] and gt = state[:, 3 : 6]? What exactly does the variable state contain?

hzwer commented 4 years ago

https://github.com/megvii-research/ICCV2019-LearningToPaint/blob/83dc2b6129feeeb56a0cb1fd91d3ffdb9d288616/baseline/env.py#L87 It contains the canvas, the target image, and #Step.
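A minimal sketch of what that layout implies for the slicing in `update_gan`, assuming a batched state with channels on the second axis (batch × channels × height × width) and seven channels per item: three for the RGB canvas, three for the RGB target, and one for the step counter. Plain Python lists stand in for the PyTorch tensors used in the repository, and `split_state` is a hypothetical helper; how the step channel is scaled is not covered here.

```python
from typing import List, Tuple

Channel = List[List[float]]   # height x width
Item = List[Channel]          # channels x height x width
Batch = List[Item]            # batch x channels x height x width


def split_state(state: Batch) -> Tuple[Batch, Batch, Batch]:
    """Hypothetical helper mirroring tensor slicing on the channel axis."""
    canvas = [item[:3] for item in state]   # like state[:, :3]
    gt = [item[3:6] for item in state]      # like state[:, 3:6]
    step = [item[6:7] for item in state]    # like state[:, 6:7]
    return canvas, gt, step


# Batch of 2 items, 7 channels each, 1x1 pixels; each pixel stores its channel index.
state = [[[[float(c)]] for c in range(7)] for _ in range(2)]
canvas, gt, step = split_state(state)
print(len(canvas[0]), len(gt[0]), len(step[0]))  # 3 3 1
print(gt[0][0][0][0])  # 3.0
```

The `:` in `state[:, :3]` keeps every item in the batch, and the second index picks channels, which is why the canvas and target come out as separate 3-channel images.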