wisdomikezogwo / quilt1m

[NeurIPS 2023 Oral] Quilt-1M: One Million Image-Text Pairs for Histopathology.
https://quilt1m.github.io/
MIT License

Downloaded Dataset Size #8

Closed: eemokey closed this issue 1 year ago

eemokey commented 1 year ago

Hi!

wisdomikezogwo commented 1 year ago

Hi,

1) After compression, the images (extracted at the highest frame resolution for each video) should total ~60GB.

2) Unfortunately, we haven't benchmarked how long it takes to run this on all videos. If the initial process is anything to go by, it might take a while, with the scene extraction (on CPU) being the bottleneck. My best estimate would be about 400 hours on a single CPU+GPU node; with chunking, this can be reduced further.
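To give a feel for the chunking idea, here is a minimal Python sketch. It is not taken from the Quilt-1M code: the function names, the `video_ids` list, and the node count are all hypothetical. It also assumes the work splits evenly and scales linearly across nodes, which has not been benchmarked.

```python
def split_into_chunks(items, num_chunks):
    """Distribute items round-robin into num_chunks lists of near-equal size."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    return [items[i::num_chunks] for i in range(num_chunks)]


def estimate_wall_clock_hours(total_hours, num_nodes):
    """Idealized wall-clock time, assuming perfect linear scaling across nodes."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return total_hours / num_nodes


# Hypothetical video IDs; in practice these would come from the dataset's video list.
video_ids = [f"video_{i}" for i in range(10)]

# One chunk per node, each processed independently.
chunks = split_into_chunks(video_ids, 4)

# The ~400 hour single-node figure is the rough estimate from this thread.
hours_on_8_nodes = estimate_wall_clock_hours(400, 8)
```

Under these assumptions, 8 nodes would bring the estimate down to about 50 hours. Real runs would likely be slower, since video lengths vary and the CPU-bound scene extraction may not parallelize evenly.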

I hope this answers your question. Do let us know if you have any more.