MertEnesYurtseven closed this issue 2 years ago
Hi Mert! Thank you for your interest!
1-) The data.py file supports different video formats, e.g. frames stored as images or full clips stored as video files. The simplest way to include the text part, instead of adapting the entire CoinRun dataloader (which is specially designed for the MUGEN dataset), is to load the paired text for each video and then use this preprocess function.
2-) You can follow the instructions in the train section: basically, first train a VQGAN and then a transformer.
3-) Personally, I don't recommend training on your own device, since the training may damage it. In addition, most of the models are trained with 8 V100s to allow a larger batch size, which is often quite important.
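To make the idea concrete, here is a minimal sketch of that approach: pair each video with its text first, then run the video through the preprocess step. The class and the `preprocess` stand-in below are hypothetical illustrations, not the repo's actual dataloader; it also assumes captions live next to the videos as `<name>.txt`.

```python
import os

# Hypothetical stand-in for the repo's video preprocess step
# (e.g. decoding and resizing frames); swap in the actual
# preprocess function from data.py.
def preprocess(video_path):
    return {"path": video_path}

class PairedTextVideoDataset:
    """Sketch of a dataset pairing each .mp4 with a same-named .txt caption."""

    def __init__(self, root):
        # Collect all video files; each is assumed to have a
        # matching <name>.txt caption in the same directory.
        self.items = sorted(
            os.path.join(root, f)
            for f in os.listdir(root)
            if f.endswith(".mp4")
        )

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        video_path = self.items[idx]
        txt_path = os.path.splitext(video_path)[0] + ".txt"
        # Load the paired text for the video first, then apply
        # the preprocess function, as described above.
        with open(txt_path) as fh:
            text = fh.read().strip()
        return {"video": preprocess(video_path), "text": text}
```

The same interface (`__len__`/`__getitem__`) is what a PyTorch `Dataset` expects, so wrapping this in `torch.utils.data.Dataset` later is straightforward.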
Thank you for your reply
I really appreciate the work you've done in this repo. I have a custom dataset with mp4 videos and txt text descriptions by timestamp (a 300 GB zip). How can I use my custom data to train a text2video generator? The main theme of my data is nearly the same as the MUGEN dataset, which has subjects, actions, objects...
1-) How should I preprocess my data (e.g. a csv file pointing to each mp4 and its corresponding txt, a json file, or other options)?
2-) How exactly can I train the TATS and VQGAN models? Are there any official scripts? If not, can you help me with it?
3-) This data is huge and I only have a laptop with an NVIDIA 3080 Max-Q laptop GPU. Do you think it is possible to train a good model in a reasonable time, or should I look for a service like AWS? (A 3080 is about 4 times faster than a Tesla K80, but the real deal in cases like this is the number of GPUs.)
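For question 1, this is the kind of csv manifest I have in mind; the layout and helper name are just my own illustration (it assumes each mp4 has a same-named txt caption next to it):

```python
import csv
import os

def build_manifest(video_dir, out_csv):
    """Write a csv pairing each .mp4 with its same-named .txt caption.

    Illustrative layout only: video_path,caption per row, with a
    header line; captions are assumed to sit next to the videos
    as <name>.txt.
    """
    with open(out_csv, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["video_path", "caption"])
        for name in sorted(os.listdir(video_dir)):
            if not name.endswith(".mp4"):
                continue
            txt_name = os.path.splitext(name)[0] + ".txt"
            with open(os.path.join(video_dir, txt_name)) as cap:
                caption = cap.read().strip()
            writer.writerow([os.path.join(video_dir, name), caption])
```

A dataloader could then read this manifest instead of scanning directories, which also makes train/val splits easy to version.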
Thank you for the answers!