mosaicml / composer


Performance Issues #809

Closed tcapelle closed 2 years ago

tcapelle commented 2 years ago

Hello, great work!

I am experimenting with Composer to benchmark on the Imagenette dataset, and I am having problems getting good performance.

I put a colab with my experimentations here: https://github.com/tcapelle/mosaic/blob/main/Benchmark_composer.ipynb

and the logs: https://wandb.ai/capecape/composer?workspace=user-capecape

I have some questions:

Sincerely, Thomas

hanlint commented 2 years ago

Thanks for working on this notebook and filing the issue; we are taking a look at this!

The Trainer accepts any PyTorch scheduler, so you could use OneCycleLR from PyTorch (https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.OneCycleLR.html). But we will also try to reproduce your results and match the fast.ai hyperparameters.
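For illustration, here is a rough sketch of what passing a raw PyTorch OneCycleLR to Composer's Trainer could look like (not from this thread; `model`, `train_dataloader`, and the exact Trainer argument names are assumptions and may differ by Composer version):

```python
import torch
from torch.optim.lr_scheduler import OneCycleLR
from composer import Trainer

# Assumed to exist: `model` (a ComposerModel) and `train_dataloader`.
epochs = 10
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scheduler = OneCycleLR(
    optimizer,
    max_lr=1e-3,
    total_steps=len(train_dataloader) * epochs,  # OneCycleLR needs the total step count
)

trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    optimizers=optimizer,
    schedulers=scheduler,
    step_schedulers_every_batch=True,  # OneCycleLR is stepped once per batch
    max_duration=f'{epochs}ep',
)
trainer.fit()
```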

tcapelle commented 2 years ago

To add more context, here are the loss curves compared to fastai:

(plot: training loss curves, fastai vs. Composer runs)

Blue is fastai; the others are different Composer runs.

coryMosaicML commented 2 years ago

It looks like the cause of the low accuracy here is a combination of a bad hyperparameter default and some missing documentation. When using DecoupledAdamW instead of regular Adam, the weight decay should be rescaled by the learning rate (if one uses weight_decay=0.01 with Adam, one should use weight_decay=0.01 * learning_rate with DecoupledAdamW). Unfortunately this isn't documented at the moment.

In your notebook's settings, it looks like you're using learning_rate=1e-3 and weight_decay=0.01, which implies that with DecoupledAdamW you should use weight_decay=1e-5 to get similar behavior. DecoupledAdamW's default of weight_decay=0.01 winds up being several orders of magnitude too high.
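Spelling out the rescaling with the notebook's numbers (a sketch of the rule described above, not Composer internals):

```python
# Settings from the notebook, as quoted above.
lr = 1e-3
adam_weight_decay = 0.01

# Equivalent value for DecoupledAdamW per the rescaling rule above:
decoupled_weight_decay = adam_weight_decay * lr  # 0.01 * 1e-3 = 1e-5
```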

Try using DecoupledAdamW with the rescaled weight decay as follows, while keeping the same cosine annealing and label smoothing settings:

dadam = DecoupledAdamW(model.parameters(), lr=lr, weight_decay=1e-5)
cosine_anneal = CosineAnnealingWithWarmupScheduler('1ep', '1dur')
algorithms = [LabelSmoothing()]

With this change, I see an accuracy of ~96% after 10 epochs. Note that the training loss will likely be higher than your fast.ai result due to label smoothing's regularizing effect.
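Putting the pieces together, here is a minimal sketch of how these could be wired into the Trainer (`model`, the dataloaders, and the exact argument names and import paths are assumptions on my part):

```python
from composer import Trainer
from composer.algorithms import LabelSmoothing
from composer.optim import CosineAnnealingWithWarmupScheduler, DecoupledAdamW

# Assumed to exist: `model` (a ComposerModel), `train_dataloader`, `eval_dataloader`.
lr = 1e-3
optimizer = DecoupledAdamW(model.parameters(), lr=lr, weight_decay=1e-5)  # rescaled weight decay
scheduler = CosineAnnealingWithWarmupScheduler(t_warmup='1ep', t_max='1dur')

trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    eval_dataloader=eval_dataloader,
    optimizers=optimizer,
    schedulers=scheduler,
    algorithms=[LabelSmoothing()],  # label smoothing kept, per the comment above
    max_duration='10ep',
)
trainer.fit()
```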

tcapelle commented 2 years ago

Thanks for the tips; it does indeed get better!

Indeed, fastai does this under the hood (its default optimizer is decoupled Adam): it multiplies the wd value you set by the lr. Good defaults are essential for users.

I am pushing a Composer example to our wandb/examples repo; hope your project launches to the stars (the GitHub ones).

You can take a look here: https://github.com/wandb/examples/pull/216

Update: it is indeed getting impressive results! (This turned out to be a data leak.) But I am curious why it is showing more steps (it looks like it does one extra epoch).

Thomas

tcapelle commented 2 years ago

Hello again,

I had data leakage in the Composer benchmark. The correct results are here:

(plot: corrected benchmark results after fixing the data leak)

These results look reasonable.

coryMosaicML commented 2 years ago

Great! And your point about good defaults is well taken. We will work to improve that!

Was the data leakage caused by Composer, or by something unrelated?

tcapelle commented 2 years ago

No, it was my Imagenette dataset that was leaking 😱. We could improve the naming of the wandb logging variables (we get a bunch of panes with weird names); I will wait for your next release to take a look. Also, you are not logging anything to the config; you could pull a bunch of data from the Trainer and the ComposerModel. Look at everything fastai gets you for free: batch_size, epochs, loss_func, metrics_names, callbacks, loggers, flags, etc.
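As a stopgap, one could push these into the run config manually with plain wandb calls; a minimal sketch, assuming a wandb run is already active (e.g. started by Composer's WandBLogger) and using hypothetical values:

```python
import wandb

# Hypothetical hyperparameters from your own training setup; Composer does not
# populate these automatically, which is the point of the feedback above.
wandb.config.update({
    "batch_size": 64,
    "epochs": 10,
    "optimizer": "DecoupledAdamW",
    "lr": 1e-3,
    "weight_decay": 1e-5,
    "label_smoothing": True,
})
```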

hanlint commented 2 years ago

Thanks @tcapelle for the feedback, and agreed we can improve our wandb logging experience. We've created an issue to track this (https://github.com/mosaicml/composer/issues/826).