Thank you for sharing!
How did you modify the dataloader? So far, I have never encountered NaNs with different hyperparameters on UAD datasets. The "instability" I referred to was simply loss spikes, which converge in the end. Also, changing eps to a larger value is reasonable.
The StableAdamW used here is my own customization of AdamW, following the paper. There are also other unofficial implementations of StableAdamW, e.g. https://optimi.benjaminwarner.dev/optimizers/stableadamw/.
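The core idea from the paper is to clip the per-tensor effective learning rate by the RMS of g²/v̂ on top of a standard AdamW update. A minimal sketch of that idea (illustrative only, not the exact implementation in this repo) would look like:

```python
import torch
from torch.optim import Optimizer

class StableAdamWSketch(Optimizer):
    """Illustrative StableAdamW: AdamW plus per-tensor update clipping
    (lr divided by max(1, RMS(g^2 / v_hat))), following Wortsman et al., 2023."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-6, weight_decay=1e-2):
        super().__init__(params, dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay))

    @torch.no_grad()
    def step(self, closure=None):
        loss = closure() if closure is not None else None
        for group in self.param_groups:
            beta1, beta2 = group["betas"]
            eps, wd = group["eps"], group["weight_decay"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                g, state = p.grad, self.state[p]
                if not state:
                    state["step"] = 0
                    state["exp_avg"] = torch.zeros_like(p)
                    state["exp_avg_sq"] = torch.zeros_like(p)
                state["step"] += 1
                t, m, v = state["step"], state["exp_avg"], state["exp_avg_sq"]

                # Standard Adam moment updates with bias correction.
                m.mul_(beta1).add_(g, alpha=1 - beta1)
                v.mul_(beta2).addcmul_(g, g, value=1 - beta2)
                m_hat = m / (1 - beta1 ** t)
                v_hat = v / (1 - beta2 ** t)

                # StableAdamW: clip the effective lr by the per-tensor RMS of g^2 / v_hat.
                rms = (g.pow(2) / v_hat.clamp_min(eps * eps)).mean().sqrt().item()
                lr = group["lr"] / max(1.0, rms)

                # Decoupled weight decay, then the usual AdamW parameter update.
                p.mul_(1 - lr * wd)
                p.addcdiv_(m_hat, v_hat.sqrt().add_(eps), value=-lr)
        return loss
```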
The overall "training instability" is not caused by the optimizer alone, but by many interacting factors. For example, I found that a loose loss (hard mining by detaching gradients) can sometimes cause instability. Therefore, we did not fully detach the gradients but reduced them instead.
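To illustrate what "reducing instead of fully detaching" means, a hypothetical helper could pass the value through unchanged while only scaling its gradient (a sketch of the idea, not the actual loss code in this repo):

```python
import torch

def soft_detach(x: torch.Tensor, grad_scale: float = 0.1) -> torch.Tensor:
    """Forward value is unchanged; the backward gradient is scaled by grad_scale.
    grad_scale=0.0 would reproduce a full x.detach()."""
    return grad_scale * x + (1.0 - grad_scale) * x.detach()
```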
Thank you for your reply! The dataset I used is the noisy version of the MVTec AD data used in SoftPatch. After I changed the optimizer to AdamW and increased epsilon to 1e-5, I am no longer getting inf/nan issues in the loss. Thanks!!!
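For anyone hitting the same issue, the change amounts to roughly the following (only eps=1e-5 comes from this thread; `model`, lr, and weight_decay are illustrative):

```python
import torch

# A larger eps keeps the Adam denominator away from zero, which avoided the inf/nan loss here.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, eps=1e-5, weight_decay=1e-2)
```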
First of all, thank you for your amazing work and for sharing the code!
I've slightly modified the dataloader and am training with the default settings.
Unfortunately, StableAdamW is not really stable with the default configuration. I have changed the default `eps=1e-8` to `1e-6`. Now I am not getting NaNs, but I am still a bit concerned. If it's okay, may I ask for the source code of your StableAdamW implementation? As far as I know, the official code of StableAdamW is not available.