Hello, when using an action bundle with, for example, k=5, we treat five strokes as one action. The critic also regards the five strokes as a single action when predicting Q.
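To make the bundling concrete, here is a minimal sketch, not the repository's code. It assumes the actor emits one flat vector holding the parameters of all k strokes. `STROKE_DIM`, `split_bundle`, and `toy_critic` are hypothetical names, and the critic is a stand-in that only shows the shape of the call: one scalar per bundle, not one per stroke.

```python
from typing import List, Sequence

STROKE_DIM = 13  # hypothetical number of parameters per stroke
K = 5            # bundle size, as in the k=5 example


def split_bundle(action: Sequence[float], k: int = K) -> List[List[float]]:
    """Split one flat bundled action into k per-stroke parameter lists."""
    if len(action) % k != 0:
        raise ValueError("action length must be divisible by k")
    dim = len(action) // k
    return [list(action[i * dim:(i + 1) * dim]) for i in range(k)]


def toy_critic(state: Sequence[float], action: Sequence[float]) -> float:
    """Stand-in critic: returns a single scalar for the whole bundle."""
    return float(sum(state) + sum(action))


state = [0.0] * 4                    # placeholder state features
action = [0.1] * (K * STROKE_DIM)    # one bundled action from the actor
strokes = split_bundle(action)       # five strokes, e.g. for a renderer
q = toy_critic(state, action)        # one Q value for the whole bundle
```

The point of the sketch is that the per-stroke split only matters for drawing; the critic sees the bundle as a single action.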
Thanks so much! And what about the divide parameter? :)
@psalomonr We did not mention the divide parameter in this paper. It is just used to make the demo look better. Dividing the canvas is an artificial strategy introduced during the inference stage.
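As a rough illustration of what dividing the canvas at inference time could look like (a sketch only, not the repository's implementation), the canvas can be cut into `divide * divide` tiles, each tile painted independently, and the tiles stitched back together. `split_canvas` and `merge_tiles` are hypothetical helpers, and the canvas is a plain nested list to keep the example dependency-free.

```python
from typing import List

Grid = List[List[int]]


def split_canvas(canvas: Grid, divide: int) -> List[Grid]:
    """Cut an H x W grid into divide * divide equal tiles, row-major order."""
    h, w = len(canvas), len(canvas[0])
    if h % divide or w % divide:
        raise ValueError("canvas size must be divisible by divide")
    th, tw = h // divide, w // divide
    return [
        [row[c * tw:(c + 1) * tw] for row in canvas[r * th:(r + 1) * th]]
        for r in range(divide)
        for c in range(divide)
    ]


def merge_tiles(tiles: List[Grid], divide: int) -> Grid:
    """Inverse of split_canvas: stitch the tiles back into one grid."""
    tile_height = len(tiles[0])
    merged: Grid = []
    for r in range(divide):
        row_tiles = tiles[r * divide:(r + 1) * divide]
        for y in range(tile_height):
            merged.append([px for tile in row_tiles for px in tile[y]])
    return merged


canvas = [[y * 4 + x for x in range(4)] for y in range(4)]
tiles = split_canvas(canvas, divide=2)
# In a real pipeline, the agent would paint each tile here.
restored = merge_tiles(tiles, divide=2)
```

Since nothing is painted in this toy run, `restored` equals the original `canvas`, which checks that the split and merge are consistent.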
Thanks so much for your answers, @hzwer!
Hello :)
I have some doubts...
I have seen that the algorithm defines a "divide" parameter, which splits the canvas into mini-canvases in order to improve the agent's accuracy. But I would like to understand when this is performed during training (what are the steps?). When the actor is about to make a stroke, is the canvas divided and then reconstructed?
Also, I have seen that for each state the actor performs 5 actions (brush strokes), and I understand that the discriminator gives the reward to the actor. But what about the critic? Does it update Q for each of the five actions?
Thank you very much in advance