engindeniz opened this issue 2 years ago
Hi there,
Thanks for your interest in this project. For TVC, we simply concatenate the officially released 3 FPS frames into videos via ffmpeg. From there, we extract 32/48/64 frames per video to construct the frame TSVs for training and inference.
Our end-to-end pipeline is general-purpose and supports on-the-fly decoding for captioning tasks such as TVC. In our implementation, however, we take the frame TSVs as input for both training and testing when evaluating on TVC.
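The frame-sampling step described above can be sketched roughly as follows. This is not the authors' actual script: `sample_frame_indices`, `make_tsv_row`, and the one-row-per-video TSV layout (video id followed by base64-encoded JPEG frames) are illustrative assumptions, and the real schema in the repo may differ. The preceding ffmpeg concatenation step would be something along the lines of `ffmpeg -framerate 3 -i frames/%05d.jpg -c:v libx264 clip.mp4`.

```python
import base64


def sample_frame_indices(total_frames: int, num_frames: int) -> list:
    """Uniformly sample `num_frames` indices from [0, total_frames).

    If the clip has fewer frames than requested, keep them all.
    """
    if total_frames <= num_frames:
        return list(range(total_frames))
    step = total_frames / num_frames
    return [int(i * step) for i in range(num_frames)]


def make_tsv_row(video_id: str, jpeg_frames: list, num_frames: int = 32) -> str:
    """Pack one video into one TSV row: id, then base64-encoded frames.

    `jpeg_frames` is a list of raw JPEG byte strings decoded at 3 FPS.
    The column layout here is hypothetical.
    """
    idxs = sample_frame_indices(len(jpeg_frames), num_frames)
    cells = [base64.b64encode(jpeg_frames[i]).decode("ascii") for i in idxs]
    return "\t".join([video_id] + cells)
```

Writing one such row per video (for 32, 48, or 64 sampled frames) would yield the frame TSVs used for training and inference.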
Hi,
Thanks for the great work and publicly available code.
For the TVC dataset, only 3 FPS video frames are officially provided, due to copyright issues. According to your code, it seems that you use videos from the TVC dataset. I am wondering how you obtained the videos?
Thanks in advance.