mbuckler / youtube-bb

Public repo for helpful scripts when using the YouTube Bounding Boxes dataset
MIT License

how many images in youtubebbdevkit2017 after voc_convert.py? #18

Closed by yiminglin-ai 5 years ago

yiminglin-ai commented 5 years ago

Hi Mark, I have been running voc_convert.py for a few days and there are 587703 images now. How many should I expect in total? Best, Yiming

shallowtoil commented 5 years ago

Hi, I have also been struggling with voc_convert.py for a few days. Could you share the cropped dataset once it is done? Many thanks.

mbuckler commented 5 years ago

To answer your specific question about the total number of images in the dataset: there are approximately 10 million, since there are that many annotations: https://research.google.com/youtube-bb/

voc_convert.py is slow when it tries to convert the entire dataset at once. I would recommend editing the CSV files so that it works on a subset of the dataset instead. For more details see here: https://github.com/mbuckler/youtube-bb/issues/14
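
One way to trim a CSV file down to a subset is sketched below. This is only an illustration and not part of the repo: it assumes the annotation CSV has no header row and stores the YouTube video ID in the first column, and the function names and file paths are hypothetical. It keeps every row that belongs to the first N distinct videos, so each kept video retains all of its annotations.

```python
import csv


def subset_by_video(rows, max_videos):
    """Keep rows belonging to the first `max_videos` distinct video IDs.

    Assumption: the video ID is the first field of every row.
    """
    kept_ids = set()
    subset = []
    for row in rows:
        if not row:
            continue  # skip blank lines
        video_id = row[0]
        if video_id not in kept_ids:
            if len(kept_ids) >= max_videos:
                continue  # a new video beyond the limit: drop its rows
            kept_ids.add(video_id)
        subset.append(row)
    return subset


def write_subset(src_path, dst_path, max_videos):
    """Read the CSV at src_path and write the reduced copy to dst_path."""
    with open(src_path, newline="") as src:
        rows = subset_by_video(csv.reader(src), max_videos)
    with open(dst_path, "w", newline="") as dst:
        csv.writer(dst).writerows(rows)
    return len(rows)
```

For example, `write_subset("annotations.csv", "annotations_small.csv", 1000)` (both paths are placeholders) would write a copy covering 1000 videos. Keep a backup of the original file and point the conversion at the reduced copy.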

If you only need the decoded images (not the video data), you may prefer a modified version of this repo which downloads, decodes, and then deletes the videos: https://github.com/mehdi-shiba/youtube-bb-utility