JJumSSu opened 1 year ago

Hi, thank you for the amazing repo!

I'm currently trying to train a CLIP model on multiple datasets in webdataset format, and I have some questions about shuffling.

`--dataset-resampled` shuffles the shards with replacement, so:

1. Does that mean some instances will be seen more than once while others are never seen at all?
2. If so, what is the advantage of using this parameter?

Thank you :)

---

Hi @JJumSSu. Re. 1, all shards are shuffled. Re. 2, the advantage here is that it allows us to save checkpoints more frequently (at fractions of an epoch) by setting `--train-num-samples` to a lower value. This is important for larger datasets.
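For intuition, here is a minimal sketch (not the actual open_clip or webdataset implementation) of what sampling shards *with replacement* means: each draw is independent, so within one nominal epoch some shards tend to be drawn multiple times while others are skipped. The shard names below are made up for illustration.

```python
import random

def sample_shards_with_replacement(shards, n, seed=0):
    """Draw n shards uniformly at random, with replacement.

    Because each draw is independent, a shard can appear several
    times in one pass while another never appears at all.
    """
    rng = random.Random(seed)
    return [rng.choice(shards) for _ in range(n)]

# Hypothetical shard list, mimicking webdataset-style naming.
shards = [f"shard-{i:05d}.tar" for i in range(10)]
picked = sample_shards_with_replacement(shards, 10)

print(picked)
print(f"{len(set(picked))} distinct shards out of {len(picked)} draws")
```

With 10 draws from 10 shards, the expected number of distinct shards is about 6.5, which is why individual samples can be over- or under-represented in any single pass; the trade-off is that epoch length becomes a free parameter you can shorten for more frequent checkpointing.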