iejMac / video2dataset

Easily create large video datasets from video URLs
MIT License
533 stars, 65 forks

Download worker refactor #288

Closed: MattUnderscoreZhang closed this pull request 8 months ago

MattUnderscoreZhang commented 8 months ago

Unifies SubsetWorker and DownloadWorker logic. The two classes can probably be combined at this point.

Merge after #287.

rom1504 commented 8 months ago

Can you rebase, please?

MattUnderscoreZhang commented 8 months ago

Ok, done

MattUnderscoreZhang commented 8 months ago

To describe at a high level what happened here: I took the overlapping logic from subset_worker and download_worker and moved it into a common file, worker.py. I did try to fully combine the two workers into a single class, but couldn't do so yet because they load their input data differently.
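As an illustration of the kind of split described above, here is a minimal Python sketch: a shared base class owns the common processing loop, and each worker overrides only how it loads its input. This is not the actual contents of worker.py; every name here (`BaseWorker`, `load_inputs`, `process_sample`, `fetch_fn`, and the `(key, bytes, metadata)` sample tuple) is hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical sample layout for this sketch: (key, video_bytes, metadata).
Sample = Tuple[str, bytes, Dict[str, str]]
# Hypothetical subsampler signature: transforms one video and its metadata.
Subsampler = Callable[[bytes, Dict[str, str]], Tuple[bytes, Dict[str, str]]]


def process_sample(
    video: bytes, meta: Dict[str, str], subsamplers: List[Subsampler]
) -> Tuple[bytes, Dict[str, str]]:
    """Shared video processing: apply each subsampler in order."""
    for subsample in subsamplers:
        video, meta = subsample(video, meta)
    return video, meta


class BaseWorker(ABC):
    """Holds the logic both workers share; only input loading differs."""

    def __init__(self, subsamplers: List[Subsampler]) -> None:
        self.subsamplers = subsamplers

    @abstractmethod
    def load_inputs(self, shard: Iterable) -> Iterator[Sample]:
        """Each concrete worker reads its input in its own way."""

    def process_shard(self, shard: Iterable) -> List[Sample]:
        # Common path: load inputs, then run the shared subsampler chain.
        results = []
        for key, video, meta in self.load_inputs(shard):
            video, meta = process_sample(video, meta, self.subsamplers)
            results.append((key, video, meta))
        return results


class DownloadWorker(BaseWorker):
    """Input is (key, url) pairs; fetch_fn stands in for a real downloader."""

    def __init__(self, subsamplers: List[Subsampler], fetch_fn: Callable[[str], bytes]) -> None:
        super().__init__(subsamplers)
        self.fetch_fn = fetch_fn

    def load_inputs(self, shard: Iterable) -> Iterator[Sample]:
        for key, url in shard:
            yield key, self.fetch_fn(url), {"url": url}


class SubsetWorker(BaseWorker):
    """Input is samples that already exist in a previously built shard."""

    def load_inputs(self, shard: Iterable) -> Iterator[Sample]:
        for key, video, meta in shard:
            yield key, video, dict(meta)
```

Keeping the two subclasses separate mirrors the situation described in the comment: the processing path is shared, while the input-loading step still differs between workers.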

The main reason I did this is that both workers use the video processing subsamplers, which I'm trying to rewrite so they no longer read and write temporary files. With the common video processing logic in a single file, I'll have fewer changes to track later.
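To illustrate the difference between chaining processing steps through temporary files and keeping intermediates in memory, here is a generic Python sketch. It is an assumption-laden illustration, not video2dataset's subsampler code: the "steps" are stand-in byte transforms rather than real video operations, and the function names are hypothetical. The in-memory version uses `io.BytesIO` buffers so that steps written against file-like objects need no disk round-trip.

```python
import io
import os
import tempfile
from typing import BinaryIO, Callable, List

# Hypothetical step type: reads from a source stream, writes to a destination stream.
Step = Callable[[BinaryIO, BinaryIO], None]


def chain_with_temp_files(data: bytes, steps: List[Step]) -> bytes:
    """Every intermediate result is written to and read back from disk."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "input.bin")
        with open(path, "wb") as f:
            f.write(data)
        for i, step in enumerate(steps):
            out_path = os.path.join(tmp, f"step{i}.bin")
            with open(path, "rb") as src, open(out_path, "wb") as dst:
                step(src, dst)
            path = out_path
        with open(path, "rb") as f:
            return f.read()


def chain_in_memory(data: bytes, steps: List[Step]) -> bytes:
    """Same steps, but intermediates stay in io.BytesIO buffers."""
    for step in steps:
        src, dst = io.BytesIO(data), io.BytesIO()
        step(src, dst)
        data = dst.getvalue()
    return data


# Stand-in steps for illustration only (not real video subsamplers).
def upper(src: BinaryIO, dst: BinaryIO) -> None:
    dst.write(src.read().upper())


def reverse(src: BinaryIO, dst: BinaryIO) -> None:
    dst.write(src.read()[::-1])
```

Both functions produce the same bytes for the same steps; the in-memory version simply avoids the filesystem between steps.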

I have tested both subset_worker and download_worker on a subset of WebVid, using my usual workflow: frame-rate resampling, resizing, cropping, padding, clip finding, and cutting. I ran this both as a single-step download + process and as a download followed by a separate processing pass. Both methods replicated previous results, which indicates that this refactor should have no functional effect.

rom1504 commented 8 months ago

looks pretty good to me, let's merge