Thank you for your codebase! It is great. I assume the number of frames per clip has to be the same for training and inference, say 16 frames per clip? What if we want to generate a 32-frame video? Could we generate one 16-frame clip and then generate the next 16-frame clip? If so, how could we condition on the first clip when generating the second?