mbuckler / youtube-bb

Public repo for helpful scripts when using the YouTube Bounding Boxes dataset
MIT License

the running speed of voc_convert.py #14

Closed jacobssy closed 6 years ago

jacobssy commented 6 years ago

Hi, your repo is good, and I have downloaded the YouTube data. Now I want to decode it into VOC training data, but the script runs very slowly, and it also has two problems:

  1. The script loops over every row in train.csv, which takes a long time.
  2. After processing all of train.csv we get the "present_annots" variable and start decoding frames, but decoding breaks with [error 26] Too many open files: '/dev/null'.

Is there a good solution to avoid this? Is adding more threads the only way to speed it up? I have already tried 64 threads... but it is still slow.
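
My guess is that the error comes from a file-descriptor leak: if the script opens '/dev/null' once per ffmpeg call and never closes it, the process eventually hits its fd limit. Here is a minimal sketch of that pattern and a possible fix (the function names below are hypothetical, not the actual voc_convert.py code):

```python
import subprocess

# Hypothetical illustration of the leak: a new fd to /dev/null is
# opened for every decoded frame and never closed, so the process
# eventually exhausts its file-descriptor limit.
def decode_frame_leaky(cmd):
    devnull = open('/dev/null', 'w')   # fd leaks on every call
    subprocess.call(cmd, stdout=devnull, stderr=devnull)

# Possible fix: let subprocess manage the descriptor (Python 3.3+),
# so nothing is left open between calls.
def decode_frame_fixed(cmd):
    subprocess.call(cmd,
                    stdout=subprocess.DEVNULL,
                    stderr=subprocess.DEVNULL)
```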

mbuckler commented 6 years ago

Hi @jacobssy,

Yes, there does seem to be an issue when using the voc_convert.py script with the full list of videos. When I used the script I edited yt_bb_detection_train.csv and yt_bb_detection_validation.csv down to 1/25th of their length, and at that scale the script worked well. I've taken a look and everything seems reasonable, so I'm not sure why the script is slow on the full list of videos, but feel free to suggest changes or make a pull request.
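
If it helps, this is roughly what I mean by trimming the lists (a rough sketch, assuming rows for a given clip are contiguous in the CSVs, as they appear to be):

```python
# Rough sketch: keep the first 1/25th of each annotation CSV.
# Assumes rows for a given clip are contiguous, so a simple prefix
# mostly keeps whole clips together.
for name in ['yt_bb_detection_train.csv', 'yt_bb_detection_validation.csv']:
    with open(name) as f:
        lines = f.readlines()
    with open(name, 'w') as f:
        f.writelines(lines[:len(lines) // 25])
```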

mbuckler commented 6 years ago

Closing due to inactivity.