Thank you for sharing!
How did you modify the dataloader? So far, I have never encountered NaNs with different hyperparameters on UAD datasets. The "instability" I referred to was simply loss spikes, which converge in the end. Also, changing eps to a larger value is reasonable.
The StableAdamW used here is my own customization of AdamW, following the paper. There are also other unofficial implementations of StableAdamW, e.g. https://optimi.benjaminwarner.dev/optimizers/stableadamw/.
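The core idea from the paper is to clip the per-tensor effective learning rate by the RMS of g²/v̂ on top of a standard AdamW update. A minimal sketch of that idea (illustrative only, not the exact implementation in this repo) would look like:

```python
import torch
from torch.optim import Optimizer

class StableAdamWSketch(Optimizer):
    """Illustrative StableAdamW: AdamW plus per-tensor update clipping
    (lr divided by max(1, RMS(g^2 / v_hat))), following Wortsman et al., 2023."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-6, weight_decay=1e-2):
        super().__init__(params, dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay))

    @torch.no_grad()
    def step(self, closure=None):
        loss = closure() if closure is not None else None
        for group in self.param_groups:
            beta1, beta2 = group["betas"]
            eps, wd = group["eps"], group["weight_decay"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                g, state = p.grad, self.state[p]
                if not state:
                    state["step"] = 0
                    state["exp_avg"] = torch.zeros_like(p)
                    state["exp_avg_sq"] = torch.zeros_like(p)
                state["step"] += 1
                t, m, v = state["step"], state["exp_avg"], state["exp_avg_sq"]

                # Standard Adam moment updates with bias correction.
                m.mul_(beta1).add_(g, alpha=1 - beta1)
                v.mul_(beta2).addcmul_(g, g, value=1 - beta2)
                m_hat = m / (1 - beta1 ** t)
                v_hat = v / (1 - beta2 ** t)

                # StableAdamW: clip the effective lr by the per-tensor RMS of g^2 / v_hat.
                rms = (g.pow(2) / v_hat.clamp_min(eps * eps)).mean().sqrt().item()
                lr = group["lr"] / max(1.0, rms)

                # Decoupled weight decay, then the usual AdamW parameter update.
                p.mul_(1 - lr * wd)
                p.addcdiv_(m_hat, v_hat.sqrt().add_(eps), value=-lr)
        return loss
```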
The overall "training instability" is not caused by the optimizer alone, but by many interacting factors. For example, I found that a loose loss (hard mining by detaching gradients) can sometimes cause instability. Therefore, we did not fully detach the gradients but reduced them instead.
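To illustrate what "reducing instead of fully detaching" means, a hypothetical helper could pass the value through unchanged while only scaling its gradient (a sketch of the idea, not the actual loss code in this repo):

```python
import torch

def soft_detach(x: torch.Tensor, grad_scale: float = 0.1) -> torch.Tensor:
    """Forward value is unchanged; the backward gradient is scaled by grad_scale.
    grad_scale=0.0 would reproduce a full x.detach()."""
    return grad_scale * x + (1.0 - grad_scale) * x.detach()
```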
Thank you for your reply! The dataset I used is the noisy version of the MVTec AD data used in SoftPatch. After I changed the optimizer to AdamW and increased epsilon to 1e-5, I am no longer getting inf/nan issues in the loss. Thanks!!!
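For anyone hitting the same issue, the change amounts to roughly the following (only eps=1e-5 comes from this thread; `model`, lr, and weight_decay are illustrative):

```python
import torch

# A larger eps keeps the Adam denominator away from zero, which avoided the inf/nan loss here.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, eps=1e-5, weight_decay=1e-2)
```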
First of all, thank you for your amazing work and for sharing the code!
I've slightly modified the dataloader and am training with the default settings.
Unfortunately, StableAdamW is not really stable with the default configuration. I have changed the default `eps=1e-8` to `1e-6`. Now I am not getting NaNs, but I am still a bit concerned. If it's okay, may I ask for the source code of your StableAdamW implementation? As far as I know, the official code of StableAdamW is not available.