How long, and on how many videos, should I train to get good results?
I tried training with just two 10-second videos, and the samples it saves are just noise.
As I understand from the paper, the MiT dataset is used for training C-ViViT. Which dataset is used to train Phenaki for text-to-video generation?