Closed (nelaturuharsha closed this 1 year ago)
Hi @SreeHarshaNelaturu ! What training code are you using?
Hi @andrewilyas, thanks for the prompt response. This is a custom harness we wrote ourselves for training and pruning networks. Could you let me know which specific parts I should send across?
Btw, I found one interesting result that seems to have almost fixed the problem: adding "shuffle_indices=True" when regenerating the beton.
The loss/accuracy curves now look like this
Blue: FFCV (without shuffle_indices=True when creating the beton); Orange: PyTorch DataLoader; Maroon: FFCV (with shuffle_indices=True when creating the beton)
I hope this helps people out. I think this is related to the use of OrderOption.QUASI_RANDOM, as indicated in issue #304.
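For anyone wondering why shuffling at write time could matter, here is a toy sketch in plain Python. It is not FFCV's actual implementation: `quasi_random_order`, `page_size`, and `pages_in_memory` are hypothetical names for a simplified model in which samples are only shuffled inside a small window of on-disk pages. Under that assumption, a class-sorted file gives batches with very few distinct classes, while a file shuffled at write time gives well-mixed batches.

```python
import random

def make_class_sorted_labels(num_classes, per_class):
    # Labels in on-disk order for a dataset written class by class.
    return [c for c in range(num_classes) for _ in range(per_class)]

def quasi_random_order(n, page_size, pages_in_memory, rng):
    # Toy model (an assumption, not FFCV code): shuffle the pages,
    # then shuffle samples only within a small window of pages.
    pages = [list(range(s, min(s + page_size, n))) for s in range(0, n, page_size)]
    rng.shuffle(pages)
    order = []
    for i in range(0, len(pages), pages_in_memory):
        window = [idx for page in pages[i:i + pages_in_memory] for idx in page]
        rng.shuffle(window)
        order.extend(window)
    return order

def mean_classes_per_batch(labels_on_disk, order, batch_size):
    # Average number of distinct classes seen in each full batch.
    counts = []
    for i in range(0, len(order) - batch_size + 1, batch_size):
        batch = order[i:i + batch_size]
        counts.append(len({labels_on_disk[j] for j in batch}))
    return sum(counts) / len(counts)

rng = random.Random(0)
labels = make_class_sorted_labels(num_classes=100, per_class=200)
order = quasi_random_order(n=len(labels), page_size=400, pages_in_memory=2, rng=rng)

# Same read order, two different on-disk layouts.
sorted_mix = mean_classes_per_batch(labels, order, batch_size=64)
shuffled_labels = labels[:]
rng.shuffle(shuffled_labels)  # emulates shuffling the indices when writing the file
shuffled_mix = mean_classes_per_batch(shuffled_labels, order, batch_size=64)

print(f"class-sorted file:  {sorted_mix:.1f} classes per batch")
print(f"shuffled-on-write:  {shuffled_mix:.1f} classes per batch")
```

If the real loader behaves anything like this model, poorly mixed batches would be a plausible explanation for the degraded curves, but I have not verified that against the FFCV source.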
I also tested this on the CelebA dataset (I can provide code to reproduce) and saw little or no speedup from using FFCV. On ImageNet the throughput gain is also very small (38 mins vs. 41 mins) [Device: NVIDIA A6000].
Is this speedup only to be expected on mixed-precision training?
(Please let me know if it's better to open a separate issue for the speedup-related part.)
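In case it helps to separate data loading time from model time, below is a minimal sketch of a loader-only timing helper. `measure_throughput` and `max_batches` are hypothetical names, and the helper assumes each batch is a tuple or list whose first element supports `len()` (or that the batch itself does). It only times iteration on the host, so asynchronous GPU transfers would not be fully captured.

```python
import time

def measure_throughput(loader, max_batches=100):
    # Iterate over `loader` without running a model and
    # return (samples, seconds, samples_per_second).
    n_samples = 0
    n_batches = 0
    start = time.perf_counter()
    for batch in loader:
        # Assumption: (inputs, labels)-style batches; otherwise use the batch itself.
        first = batch[0] if isinstance(batch, (tuple, list)) else batch
        n_samples += len(first)
        n_batches += 1
        if n_batches >= max_batches:
            break
    elapsed = time.perf_counter() - start
    rate = n_samples / elapsed if elapsed > 0 else float("inf")
    return n_samples, elapsed, rate

# Stand-in loader: 10 batches of 32 (inputs, labels) pairs.
dummy_loader = [([0] * 32, [0] * 32) for _ in range(10)]
n, seconds, rate = measure_throughput(dummy_loader)
print(f"{n} samples in {seconds:.6f}s")
```

Running the same helper on both loaders would show whether the data pipeline or the GPU compute is the bottleneck in each setup.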
Cheers!
Hello,
I am training a ResNet50 on ImageNet for a project and noticed the following issues:
For some context
As you can see above, the performance is far worse when FFCV is used.
Would appreciate any insight into why this is happening and what could be done to improve.
Thanks!