libffcv / ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)
https://ffcv.io
Apache License 2.0

Question about shuffling #178

Closed by henryqin1997 2 years ago

henryqin1997 commented 2 years ago

Hi, in a previous question asking why FFCV is fast, one of the points mentioned was that a single file is used instead of many files, which makes data faster to retrieve. I am curious about this: if a single file is used, how does FFCV provide random shuffling?

GuillaumeLeclerc commented 2 years ago

Hello,

It is important to note that one doesn't have to read an entire file at once. It is possible to read only the regions you care about, and FFCV relies heavily on that. To answer your question more precisely, it depends on the ordering:

Hope it helps!
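To illustrate the point about reading only the regions you care about, here is a minimal Python sketch of how a dataset stored in a single file can still be read in a random order. It is not FFCV's actual code or file format. The function names `write_single_file` and `read_shuffled` and the `(offset, length)` index are hypothetical. The writer records where each sample starts in the file. The reader then shuffles the sample indices for each epoch and seeks directly to each sample's byte range.

```python
import os
import random
import tempfile

# Hypothetical illustration only: not FFCV's real file format or API.

def write_single_file(path, samples):
    """Concatenate byte samples into one file; return an (offset, length) index."""
    index = []
    with open(path, "wb") as f:
        for sample in samples:
            index.append((f.tell(), len(sample)))  # where this sample starts, and its size
            f.write(sample)
    return index

def read_shuffled(path, index, seed):
    """Yield (sample_id, bytes) in a random order, reading only each sample's region."""
    order = list(range(len(index)))
    random.Random(seed).shuffle(order)  # a different seed per epoch gives a different order
    with open(path, "rb") as f:
        for i in order:
            offset, length = index[i]
            f.seek(offset)           # jump straight to the sample's byte range
            yield i, f.read(length)  # read just that range, not the whole file

if __name__ == "__main__":
    samples = [f"sample-{i}".encode() * (i + 1) for i in range(5)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "dataset.bin")
        index = write_single_file(path, samples)
        for epoch in range(2):
            ids = [i for i, _ in read_shuffled(path, index, seed=epoch)]
            print(f"epoch {epoch}: {ids}")
```

In this sketch, shuffling only permutes a small in-memory index. The file itself never has to be rewritten or fully loaded. A real loader would usually also batch and prefetch these reads, which is left out here.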