mosaicml / composer

Supercharge Your Model Training
http://docs.mosaicml.com
Apache License 2.0
5.12k stars, 413 forks

Synthetic data has slower throughput than imagenet #148

growlix closed this issue 2 years ago

growlix commented 2 years ago

Environment

To reproduce

Steps to reproduce the behavior:

Compare running with composer/yamls/models/resnet50_synthetic.yaml to composer/yamls/models/resnet50.yaml. See W&B runs here.

Expected behavior

Training on synthetic data should be at least as fast as training on real data, if not faster. Increasing train_dataset.synthetic.total_dataset_size from 12288 to something larger (e.g. 100000) improves throughput slightly but does not close the gap.
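For anyone who wants to check the gap outside of W&B, here is a minimal sketch of how samples per second could be measured over one pass of a data source. It is not composer code: `measure_throughput` and `synthetic_batches` are hypothetical helpers written for illustration, and the batch size of 256 is an assumption. Only the dataset sizes 12288 and 100000 come from this issue.

```python
import time
from typing import Callable, Iterable, Iterator, List, Sized


def synthetic_batches(total_dataset_size: int, batch_size: int) -> Iterator[List[int]]:
    """Hypothetical stand-in for a synthetic dataset: yields batches of dummy samples.

    The last batch is smaller when total_dataset_size is not a multiple of batch_size.
    """
    remaining = total_dataset_size
    while remaining > 0:
        current = min(batch_size, remaining)
        yield [0] * current
        remaining -= current


def measure_throughput(
    batches: Iterable[Sized],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return samples per second for one full pass over `batches`.

    `clock` is injectable so the function can be checked with a fake timer.
    """
    start = clock()
    total_samples = 0
    for batch in batches:
        # A real benchmark would run the training step on the batch here.
        total_samples += len(batch)
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive to compute throughput")
    return total_samples / elapsed


if __name__ == "__main__":
    # Sizes taken from the issue; batch size 256 is an assumption.
    for size in (12288, 100000):
        rate = measure_throughput(synthetic_batches(size, 256))
        print(f"total_dataset_size={size}: {rate:.0f} samples/sec")
```

Running the same measurement over the real-data loader and the synthetic one, with identical batch size and hardware, would give two directly comparable numbers. This sketch only times iteration, so it says nothing about what causes the slowdown.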

ravi-mosaicml commented 2 years ago

I re-ran an experiment using a newer docker image and the latest dev branch of composer. Specifically:

Both synthetic data and real data reached around 4900 images per second.

I suppose this regression was caused by:

Closing this issue, since the slowdown will go away with the next release of composer.