libffcv / ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)
https://ffcv.io
Apache License 2.0

Working with large dataset on Multi-GPU - Questions about limitations #130

Closed kgonia closed 2 years ago

kgonia commented 2 years ago

Since QUASI_RANDOM isn't currently supported with distributed=True, does that mean that if I have a large dataset I need a machine that can cache the whole dataset in memory?

Does it make sense to put ToDevice() in the loading pipeline if I have several GPUs?

GuillaumeLeclerc commented 2 years ago

Hello @kgonia!

For now, yes: if you want to train in a distributed fashion, it's best if your dataset fits in memory. I hope we can add support for that soon, but I'm currently focusing on other, more pressing features.

Yes, it makes perfect sense to use ToDevice if you have multiple GPUs, since you should have one pipeline per GPU anyway. (Don't forget to call ch.cuda.set_device.) You should take a look at our ImageNet example: it uses distributed training and follows all the best practices.
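A minimal sketch of what that per-GPU setup could look like, assuming a DDP-style launch where `rank` is this process's local GPU index and `train.beton` is your converted dataset (the helper name `make_loader` and the field names `image`/`label` are illustrative, not part of the FFCV API):

```python
def make_loader(beton_path, rank, batch_size=256, num_workers=8):
    # Imports live inside the function so this sketch stays importable
    # even on machines without FFCV or CUDA.
    import torch as ch
    from ffcv.loader import Loader, OrderOption
    from ffcv.fields.decoders import IntDecoder, SimpleRGBImageDecoder
    from ffcv.transforms import ToTensor, ToDevice, ToTorchImage, Squeeze

    # Pin this process to its GPU before building the loader,
    # as recommended above.
    ch.cuda.set_device(rank)
    device = ch.device(f'cuda:{rank}')

    return Loader(
        beton_path,
        batch_size=batch_size,
        num_workers=num_workers,
        order=OrderOption.RANDOM,  # QUASI_RANDOM unsupported with distributed=True
        distributed=True,          # shard samples across DDP processes
        pipelines={
            'image': [SimpleRGBImageDecoder(), ToTensor(),
                      ToDevice(device, non_blocking=True), ToTorchImage()],
            'label': [IntDecoder(), ToTensor(), Squeeze(),
                      ToDevice(device, non_blocking=True)],
        },
    )
```

Each DDP process would call `make_loader('train.beton', rank)` with its own rank, giving every GPU its own pipeline ending in ToDevice.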

Follow issue #92 to see when we finally add the feature. In the meantime I'm going to close this issue to avoid having duplicates, but feel free to reopen it if needed.