About prediction with multiple action labels

I noticed that your paper reports results for predicting future motion given multiple action labels (e.g., Figure 4, 2nd row). How is this result produced? I could not find the corresponding code. Is it done by generating the sequence in two parts, using the last few frames of the first part together with a different action label to generate the second part?

Something like this:

action label 1, observed sequence ===[VAE Decoder]===> Prediction 1
action label 2, Prediction 1 (maybe only the last few frames) ===[VAE Decoder]===> Prediction 2

with Prediction 1 + Prediction 2 as the final result?
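To make the question concrete, here is a minimal sketch of the two-stage chaining I have in mind. It is only my guess at the procedure, not code from this repository: `decode` is a hypothetical stand-in for the trained VAE decoder (it just produces dummy motion), and `horizon`, `n_context`, and the 48-dimensional pose are assumed values for illustration.

```python
import numpy as np


def decode(action_label, context, horizon, rng):
    """Hypothetical stand-in for the VAE decoder (NOT the repo's API).

    Samples a latent vector and returns `horizon` dummy frames that drift
    away from the last frame of `context`, scaled by the action label.
    """
    z = rng.standard_normal(context.shape[1])      # latent sample, one per pose dim
    steps = np.arange(1, horizon + 1)[:, None]     # shape (horizon, 1)
    return context[-1] + 0.01 * (action_label + 1) * steps * z


def predict_two_actions(observed, label1, label2, horizon, n_context, seed=0):
    """Chain two decoder calls, each conditioned on a different action label."""
    rng = np.random.default_rng(seed)
    # Stage 1: condition on the observed sequence and the first label.
    pred1 = decode(label1, observed, horizon, rng)
    # Stage 2: condition on the last n_context frames of Prediction 1
    # and the second label.
    pred2 = decode(label2, pred1[-n_context:], horizon, rng)
    # Final result: Prediction 1 followed by Prediction 2.
    return np.concatenate([pred1, pred2], axis=0)


observed = np.zeros((10, 48))   # 10 observed frames, 48-dim pose (assumed sizes)
result = predict_two_actions(observed, label1=0, label2=1, horizon=20, n_context=5)
print(result.shape)
```

If the actual procedure differs (for example, if the transition between actions is handled inside the model rather than by chaining), I would appreciate a pointer to the relevant part of the code.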

Looking forward to your reply! Thanks

