I noticed that your paper reports results for predicting future motions given multiple action labels (e.g., Figure 4, 2nd row). How was this result produced? I could not find the corresponding code.
Is it achieved by simply generating the sequence in two parts, using the last few frames of the first part together with a different action label to generate the second part?
Something like this:
action label 1, observed sequence ===[VAE Decoder]===> Prediction 1
action label 2, Prediction 1 (maybe only its last few frames) ===[VAE Decoder]===> Prediction 2
Is the final result then Prediction 1 followed by Prediction 2?
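To make the question concrete, here is a minimal Python sketch of the chaining I have in mind. It is only my guess and not taken from your code: `toy_decoder` is a hypothetical stand-in for the conditional VAE decoder (it just produces a label-dependent random walk), and `chain_predictions`, `horizon`, and `context_len` are names I made up for illustration.

```python
import numpy as np

def toy_decoder(action_label, context, horizon, rng):
    """Hypothetical stand-in for the conditional VAE decoder (NOT the repo's API).

    Continues from the last context frame with a label-dependent drift plus noise.
    """
    z = rng.standard_normal((horizon, context.shape[1]))  # sampled latent noise
    steps = 0.01 * z + 0.1 * action_label                 # per-frame displacement
    return context[-1] + np.cumsum(steps, axis=0)         # shape: (horizon, dims)

def chain_predictions(observed, action_labels, horizon, context_len, decoder, seed=0):
    """Generate one segment per action label, feeding the tail of each
    segment back in as the conditioning context for the next one."""
    rng = np.random.default_rng(seed)
    context = observed[-context_len:]       # start from the observed frames
    parts = []
    for label in action_labels:
        pred = decoder(label, context, horizon, rng)
        parts.append(pred)
        context = pred[-context_len:]       # last few frames of this prediction
    return np.concatenate(parts, axis=0)    # Prediction 1 followed by Prediction 2

# Assumed shapes: 25 observed frames, 48 pose dimensions, 60 frames per segment.
observed = np.zeros((25, 48))
result = chain_predictions(observed, [0, 1], horizon=60, context_len=10,
                           decoder=toy_decoder)
print(result.shape)
```

In this sketch each segment is conditioned only on the previous segment's last `context_len` frames and the new label. Is that how the multi-label results were generated, or is the transition handled differently?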
Looking forward to your reply! Thanks