Closed tcapelle closed 2 years ago
Thanks for working on this notebook, and filing the issue, we are taking a look at this!
The Trainer accepts any pytorch scheduler, so you could use OneCycleLR from pytorch (https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.OneCycleLR.html). But we will try to reproduce your results and match the fast.ai hyperparameters.
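For instance, here is a minimal sketch of wiring torch's OneCycleLR into the Composer Trainer. This is an untested configuration sketch: it assumes the Trainer's `optimizers`/`schedulers`/`step_schedulers_every_batch` arguments, and `model`, `train_dataloader`, and `eval_dataloader` are placeholders for your own objects.

```python
# Sketch only: passing a raw PyTorch scheduler to Composer's Trainer.
# `model`, `train_dataloader`, `eval_dataloader` are placeholders for
# your ComposerModel and dataloaders; adjust steps/epochs to your run.
from torch.optim import Adam
from torch.optim.lr_scheduler import OneCycleLR
from composer import Trainer

num_epochs = 10
steps_per_epoch = len(train_dataloader)

optimizer = Adam(model.parameters(), lr=1e-3)
scheduler = OneCycleLR(
    optimizer,
    max_lr=1e-3,
    total_steps=num_epochs * steps_per_epoch,
)

trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    eval_dataloader=eval_dataloader,
    optimizers=optimizer,
    schedulers=scheduler,
    step_schedulers_every_batch=True,  # OneCycleLR steps once per batch
    max_duration=f"{num_epochs}ep",
)
trainer.fit()
```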
To add more context, here are the loss curves compared to fastai:
Blue is fastai, the others are different tries with composer.
It looks like the cause of the low accuracy here is a combination of a bad hyperparameter default and some missing documentation. When using DecoupledAdamW vs. regular Adam, the value of weight decay should be rescaled by the learning rate (if one uses `weight_decay=0.01` with Adam, one should use `weight_decay=0.01 * learning_rate` with DecoupledAdamW). Unfortunately this isn't documented at the moment.
In your notebook's settings, it looks like you're using `learning_rate=1e-3` and `weight_decay=0.01`, which implies that for DecoupledAdamW you should use `weight_decay=1e-5` to get similar behavior. DecoupledAdamW's default of `weight_decay=0.01` winds up being several orders of magnitude too high.
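As a quick sanity check of the rescaling rule, in plain Python (values taken from the notebook settings mentioned above):

```python
# Rescaling rule: DecoupledAdamW decouples weight decay from the
# learning rate, so to match Adam's behavior the Adam-style decay
# must be multiplied by the learning rate.
lr = 1e-3
adam_weight_decay = 0.01

decoupled_weight_decay = adam_weight_decay * lr
print(f"{decoupled_weight_decay:.0e}")  # prints "1e-05"
```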
Try using DecoupledAdamW with the rescaled weight decay as follows, while keeping the same cosine annealing and label smoothing settings:

```python
dadam = DecoupledAdamW(model.parameters(), lr=lr, weight_decay=1e-5)
cosine_anneal = CosineAnnealingWithWarmupScheduler('1ep', '1dur')
algorithms = [LabelSmoothing()]
```
With this change, I see an accuracy of ~96% after 10 epochs. Note that the training loss will likely be higher than your fast.ai result due to label smoothing's regularizing effect.
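To illustrate why label smoothing puts a floor under the training loss, here is a small pure-Python calculation. The smoothing amount `eps` and class count `K` below are illustrative values, not the notebook's actual settings:

```python
import math

# With label smoothing, the target distribution is no longer one-hot:
# every class gets eps/K probability and the true class gets the rest.
# Cross-entropy is minimized when the predictions equal that soft
# target, and the minimum equals its entropy -- strictly positive, so
# the training loss cannot reach zero even for perfect predictions.
eps, K = 0.1, 10  # illustrative values, not the notebook's settings
true_class = 1 - eps + eps / K
other_class = eps / K

floor = -(true_class * math.log(true_class)
          + (K - 1) * other_class * math.log(other_class))
print(f"loss floor with smoothing: {floor:.4f}")  # prints 0.5003
```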
Thanks for the tips, it does indeed get better!
Indeed, `fastai` does this under the hood (its default optimizer is decoupled Adam): it multiplies the value you set as `wd` by `lr`. Good defaults are essential for users.
I am pushing an example for composer to our `wandb/examples` repo; I hope your project launches to the stars (the GitHub ones).
You can take a look here: https://github.com/wandb/examples/pull/216
Update: it is indeed getting impressive results! (Data leak)
But I am curious why it is showing more steps; it looks like it does one extra epoch.
Thomas
Hello again,
I had data leakage on the composer benchmark. The correct results are here:
`fastai` is also using LabelSmoothingCE.
These results look reasonable.
Great! And your point about good defaults is well taken. We will work to improve that!
Was the data leakage caused by composer, or something unrelated?
No, it was my Imagenette dataset that was leaking 😱.
We could improve the naming of the wandb logging variables (we get a bunch of panes with weird names). I will wait for your next release to take a look.
Also, you are not logging anything to the config; you could pull a bunch of data from the `Trainer` and the `ComposerModel`. Look at everything `fastai` gets you for free: batch_size, epochs, loss_func, metrics_names, callbacks, loggers, flags, etc.
Thanks @tcapelle for the feedback, and agreed we can improve our wandb logging experience. We've created an issue to track this (https://github.com/mosaicml/composer/issues/826).
Hello, great work!
I am experimenting with composer to benchmark on the Imagenette dataset, and I am having problems getting good performance.
I put a colab with my experiments here: https://github.com/tcapelle/mosaic/blob/main/Benchmark_composer.ipynb
and the logs here: https://wandb.ai/capecape/composer?workspace=user-capecape
I have some questions:

- The Trainer seems to `enumerate` on the eval_dataloader.
- How can I do one-cycle training (`fit_one_cycle` on fastai does this) with the following params:
- Should I log with `logger.log_data`, or should I do something else?

Sincerely, Thomas