nchucvml / STVT

Video Summarization With Spatiotemporal Vision Transformer
Apache License 2.0

canonical, augmented settings #5

Closed ara-47 closed 2 months ago

ara-47 commented 3 months ago

How did you split the dataset for the canonical and augmented settings, given that you use different dataset files to load the data than other works?

nchucvml commented 3 months ago

As described in [18], [19], [20], and [21], two additional datasets, OVP and YouTube [70], serve as the augmented datasets alongside the SumMe and TVSum datasets in the augmented experiments. For the canonical setting, we use only the SumMe and TVSum datasets.
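For illustration, a minimal sketch of which datasets feed each setting when TVSum is the target benchmark (not the authors' code; the 80/20 train/test convention follows the protocol described in [18]-[21], and the dictionary layout is an assumption):

```python
# Hypothetical summary of the dataset composition per evaluation setting,
# with TVSum as the target benchmark; the 80/20 split of the target dataset
# is the convention from [18]-[21], not something read from this repo.
SETTINGS = {
    # canonical: train on 80% of TVSum, test on the remaining 20%
    "canonical": {"train": ["TVSum (80%)"], "test": ["TVSum (20%)"]},
    # augmented: add SumMe, OVP, and YouTube [70] to the training pool
    "augmented": {"train": ["TVSum (80%)", "SumMe", "OVP", "YouTube"],
                  "test": ["TVSum (20%)"]},
}
```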

ara-47 commented 3 months ago

Understood, but my question was about something different; I am sorry if it wasn't clear enough. Which file (JSON/YAML) did you use to split the datasets into 5 splits, and if you used one, how did you pass it to the dataloaders? Moreover, for the augmented and transfer settings, how did you combine/augment YouTube and OVP with TVSum and SumMe, given that your dataloader files such as TVSum.py and SumMe.py take only one h5 file at a time? I would be grateful if you could clarify this. Thank you.

nchucvml commented 2 months ago

We provide the whole dataset. Please load the whole dataset and split it into .h5 files; then you can load the split .h5 files. For the augmented and transfer settings, you need to combine the data from the different datasets into .h5 files and use the dataloader to load them. In our code, we only provide loading of a single .h5 file for demonstration.
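A minimal sketch of the preprocessing described above, assuming the usual h5 layout with one group per video; the file names, split keys, and key-prefixing scheme are placeholders, not the repo's actual conventions:

```python
import h5py

def split_h5(src_path, train_keys, train_path, test_keys, test_path):
    """Copy selected video groups from one .h5 file into separate train/test .h5 files."""
    with h5py.File(src_path, "r") as src, \
         h5py.File(train_path, "w") as train_f, \
         h5py.File(test_path, "w") as test_f:
        for key in train_keys:
            src.copy(src[key], train_f, name=key)
        for key in test_keys:
            src.copy(src[key], test_f, name=key)

def combine_h5(src_paths, dst_path):
    """Merge several .h5 datasets (e.g. SumMe + OVP + YouTube) into one training file,
    prefixing each key with the source file index to avoid name collisions."""
    with h5py.File(dst_path, "w") as dst:
        for i, path in enumerate(src_paths):
            with h5py.File(path, "r") as src:
                for key in src.keys():
                    src.copy(src[key], dst, name=f"d{i}_{key}")

# Example (augmented setting for TVSum, placeholder file names): merge the TVSum
# train split with the other three datasets into one training file.
# combine_h5(["tvsum_train.h5", "summe.h5", "ovp.h5", "youtube.h5"],
#            "tvsum_augmented_train.h5")
```

Since the merged file keeps each video group intact, the existing single-file dataloaders (e.g. TVSum.py) should be able to read it without changes.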