libffcv / ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)
https://ffcv.io
Apache License 2.0

change indices argument during training? #152

Closed (rraju1 closed this 2 years ago)

rraju1 commented 2 years ago

Hi,

Thanks for the amazing work. My question is fairly basic: can I change the indices the loader accesses during training? For example, suppose my dataset has 100 samples, indexed [0..99]. I want to train on the subset [0..49] in one epoch and on the subset [50..99] in the next, alternating between the two. In PyTorch, I can do this by changing the set of indices sampled by the SubsetRandomSampler class. Can I do something similar with FFCV, or do I have to recompile my dataloader every epoch?

GuillaumeLeclerc commented 2 years ago

@rraju1 there is an indices parameter in the constructor of the Loader. Changing it after construction is not technically part of the API, but you could just set loader.indices = SET_1 (or SET_2) before you start each epoch. You shouldn't incur any performance penalty.

Hope it helps!

(feel free to reopen if it doesn't work for you)

rraju1 commented 2 years ago

@GuillaumeLeclerc I tried your suggestion, but on its own it didn't change the number of batches the network was processing (going from the full set to a subset). When I set both train_loader.indices = SET1 and train_loader.traversal_order.indices = SET1, it seems to work (the number of batches in an epoch changes). Thanks!
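
For readers who want to try the workaround above, here is a minimal sketch of it. It relies only on what this thread reports: that the loader keeps the subset in `indices` and that its `traversal_order` object keeps its own `indices`. Neither attribute is documented API, so this may break between FFCV versions. The helper names `set_loader_indices` and `subset_for_epoch` are hypothetical, and the `SimpleNamespace` object is only a stand-in so the snippet runs without FFCV installed.

```python
from types import SimpleNamespace


def set_loader_indices(loader, indices):
    """Point a loader at a new subset of sample indices.

    Assumption (from this thread, not documented API): the subset lives in
    both `loader.indices` and `loader.traversal_order.indices`, and both
    must be updated for the number of batches per epoch to change.
    """
    loader.indices = indices
    loader.traversal_order.indices = indices
    return loader


def subset_for_epoch(epoch, subsets):
    """Cycle through the given subsets, one per epoch."""
    return subsets[epoch % len(subsets)]


# Stand-in for a real loader; with FFCV you would pass your own Loader
# instance instead (and probably NumPy integer arrays rather than lists).
loader = SimpleNamespace(
    indices=list(range(100)),
    traversal_order=SimpleNamespace(indices=list(range(100))),
)

SET_1 = list(range(0, 50))    # samples 0..49
SET_2 = list(range(50, 100))  # samples 50..99

for epoch in range(4):
    # Swap the subset before iterating over the loader for this epoch.
    set_loader_indices(loader, subset_for_epoch(epoch, [SET_1, SET_2]))
    # for batch in loader: ...  (training step would go here)
```

Updating both attributes in one helper avoids the situation described above, where only `indices` was changed and the batch count stayed the same.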

GuillaumeLeclerc commented 2 years ago

Happy that it worked for you. Enjoy FFCV!

hlzhang109 commented 2 years ago

I found it didn't work when I changed the indices of a dataloader using:

    mask = np.arange(1000)
    dataloader.indices = mask

It still gives the original dataloader with 50,000 training data points, as created in make_dataloaders(): https://github.com/libffcv/ffcv/blob/bf07b5c6ac7f4cce788348d1718c15e78da8ae9d/examples/cifar/train_cifar.py#L69

lucasresck commented 10 months ago

when I changed train_loader.indices = SET1 and train_loader.traversal_order.indices = SET1 together, it seems to work (the number of batches in an epoch change)

Thank you, @rraju1! It seems to work here too.