m-bain / webvid

Large-scale text-video dataset. 10 million captioned short videos.

About some of the video durations being incorrectly labeled. #16

Open xiefan233 opened 1 year ago

xiefan233 commented 1 year ago

Seven videos in the WebVid-2.5M training set are labeled as longer than one hour, but when I looked up and watched those specific videos, their actual lengths did not match the labels. For example:

25677788, Sailboats on the Horizon, 90507090, 501 _090550 PT01H02M37S, https://ak.picdn.net/shutterstock/videos/25677788/preview/stock-footage-sailboats-on-the-horizon.mp4

The actual duration of the video is 11 s, whereas the label says 1 h 2 min 37 s (PT01H02M37S). Several other entries have the same problem.
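For anyone who wants to find the affected rows, here is a minimal sketch of how the duration label could be converted to seconds. It assumes every label uses the time-only ISO 8601 form seen in the example (PT..H..M..S); the function name `duration_to_seconds` is made up for illustration and is not part of the dataset tooling.

```python
import re

# Time-only ISO 8601 durations as seen in the example row, e.g. "PT01H02M37S".
# Assumption: no day/month/year components and no fractional seconds.
_DURATION_RE = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def duration_to_seconds(label: str) -> int:
    """Convert a label such as 'PT01H02M37S' to a number of seconds."""
    match = _DURATION_RE.match(label.strip())
    # Reject strings that do not match, and a bare "PT" with no components.
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognized duration label: {label!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


# The label from the example row is well over one hour.
labeled = duration_to_seconds("PT01H02M37S")
print(labeled, labeled > 3600)
```

Filtering the metadata for rows where this value exceeds 3600 should surface the entries labeled as longer than one hour.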

m-bain commented 1 year ago

I see, thanks for reporting. The duration was actually mined from the source site (not measured with ffmpeg or the like), so it could be prone to discrepancies.
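If the real duration matters for your use case, one option is to measure it from the downloaded file instead of trusting the label. The following is only a rough sketch, not something the dataset scripts do: it assumes `ffprobe` is installed and on the PATH, and the helper names (`measured_duration_seconds`, `is_mismatch`) and the 1-second tolerance are arbitrary choices for illustration.

```python
import subprocess


def measured_duration_seconds(path: str) -> float:
    """Ask ffprobe for the container duration of a local video file."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-show_entries", "format=duration",
            "-of", "default=noprint_wrappers=1:nokey=1",
            path,
        ],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if ffprobe fails
    )
    return float(result.stdout.strip())


def is_mismatch(labeled_s: float, measured_s: float, tolerance_s: float = 1.0) -> bool:
    """True when the labeled and measured durations differ by more than the tolerance."""
    return abs(labeled_s - measured_s) > tolerance_s


# The reported case: labeled 1 h 2 min 37 s (3757 s), actually about 11 s.
print(is_mismatch(3757, 11.0))
```

With the videos on disk, you would call `measured_duration_seconds` on each file and pass the result to `is_mismatch` together with the labeled value in seconds.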