iejMac / video2dataset

Easily create large video dataset from video urls
MIT License
529 stars, 63 forks

optimize video handling #60

Open rom1504 opened 1 year ago

rom1504 commented 1 year ago

Currently everything is done in memory, with the full video held at once. When processing many videos in parallel (which may be needed because the sources have slow connections), this takes a lot of memory.

solutions

  1. store on disk instead
  2. process in streaming / in small clips

Option 1 is easier but is still limited by disk size. Option 2 works in general but is more difficult to implement.
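To make the two options concrete, here is a minimal sketch using only the Python standard library. It is not video2dataset code: `spool_to_disk`, `iter_chunks`, and the `chunk_size` default are hypothetical names chosen for illustration, and the input is assumed to be any binary file-like object (for example an HTTP response body). In both cases peak memory is bounded by the chunk size rather than by the video size.

```python
import io
import os
import tempfile


def spool_to_disk(stream, dest_path, chunk_size=1 << 20):
    """Option 1 (hypothetical helper): copy a binary stream to disk chunk by chunk.

    Returns the number of bytes written. At most one chunk is held in memory.
    """
    written = 0
    with open(dest_path, "wb") as out:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:  # empty bytes means end of stream
                break
            out.write(chunk)
            written += len(chunk)
    return written


def iter_chunks(stream, chunk_size=1 << 20):
    """Option 2 (hypothetical helper): yield chunks so a consumer can process
    the data piece by piece without storing the whole video anywhere."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # Stand-in for a downloaded video: 2500 zero bytes.
    fake_video = io.BytesIO(b"\x00" * 2500)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "video.bin")
        print(spool_to_disk(fake_video, path, chunk_size=1024))

    fake_video.seek(0)
    print([len(c) for c in iter_chunks(fake_video, chunk_size=1024)])
```

Real video streaming is harder than this byte-level sketch suggests, since clips usually have to be cut on decodable boundaries, which is part of why option 2 is more work.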

iejMac commented 1 year ago

Let's implement streaming in v1. With the right configuration of thread_count, subjob_size, and sampler_per_shard, this seems to work fine.