kalelpark / RAL

Robust Asymmetric Loss for Multi-Label Long-Tailed Learning
https://arxiv.org/abs/2308.05542

Can we use this loss function in Accelerate Library: Training loss grows exponentially and stays forever #4

Closed gssriram closed 3 weeks ago

gssriram commented 3 weeks ago

Hi,

Thanks for this innovative work and for making it available open source. While using this loss function, I see a sudden exponential spike in my loss value, and it shows no sign of coming down. As a result, my macro AUROC score is very low (less than 0.5). I tried training with the same setup using BCE loss and got a macro AUROC of 0.6. Please advise; any suggestions would help me in this regard.

[image: training loss curve showing the exponential spike]

Brief code:

```python
import torch
import torch.nn as nn
from accelerate import Accelerator

# ResNet50, Ralloss, AverageMeter, and trainloader are defined elsewhere in my setup.
model = ResNet50()
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
optimiser = torch.optim.SGD(model.parameters())
criterion = Ralloss()
accelerator = Accelerator()
model, optimiser, trainloader = accelerator.prepare(model, optimiser, trainloader)

for epoch in range(100):
    training_loss = AverageMeter('loss', ':.4f')
    for i, batch in enumerate(trainloader):
        img, label = batch[0], batch[1]
        y_pred = model(img)
        loss = criterion(y_pred, label.float())
        training_loss.update(loss.item(), batch[0].size(0))
        optimiser.zero_grad()
        accelerator.backward(loss)
        optimiser.step()
```

kalelpark commented 3 weeks ago

Thank you for being so interested in our work! In my experience, the RAL loss can adapt to your task. Our RAL loss has many hyper-parameters, which makes it quite sensitive to their settings. I highly recommend adjusting the hyper-parameters to help resolve the issue. I hope this helps.

If you have any questions, feel free to ask me. :)
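For readers tuning these knobs, below is a minimal, self-contained sketch of an ASL-style asymmetric loss with the kind of hyper-parameters being discussed (gamma_neg, gamma_pos, a clipping margin, eps). It only illustrates the general mechanism; the actual `Ralloss` implementation and its parameter names in this repository may differ, so check the source before tuning.

```python
# Minimal ASL-style asymmetric loss sketch (NOT the repo's Ralloss; names and defaults are assumptions).
import torch
import torch.nn as nn

class AsymmetricLossSketch(nn.Module):
    def __init__(self, gamma_neg=4.0, gamma_pos=0.0, clip=0.05, eps=1e-8):
        super().__init__()
        self.gamma_neg = gamma_neg  # focusing strength on negative labels
        self.gamma_pos = gamma_pos  # focusing strength on positive labels
        self.clip = clip            # probability margin that zeroes out easy negatives
        self.eps = eps

    def forward(self, logits, targets):
        # logits: raw scores (N, C); targets: multi-hot labels (N, C)
        p = torch.sigmoid(logits)
        p_neg = (p - self.clip).clamp(min=0) if self.clip > 0 else p  # probability shifting for negatives

        loss_pos = targets * torch.log(p.clamp(min=self.eps)) * (1 - p) ** self.gamma_pos
        loss_neg = (1 - targets) * torch.log((1 - p_neg).clamp(min=self.eps)) * p_neg ** self.gamma_neg
        return -(loss_pos + loss_neg).sum()
```

As a rough sanity check, setting gamma_neg = gamma_pos = 0 and clip = 0 reduces this form to plain (summed) BCE, which can help isolate whether the divergence comes from the asymmetry settings or from the rest of the training setup.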

gssriram commented 2 weeks ago

Hi @kalelpark,

Thanks for your insights. After spending time with my code, I found that the learning rate scheduler was causing abnormal behaviour in RAL.

Another fundamental question: if my dataset has more positives than negatives, can we interchange the gamma_neg and gamma_pos values? Are there any further changes we need to make to compute the losses correctly? [One simple but manual workaround, I feel, is interchanging the labels in the original dataset 😃 — see the sketch below.] But I would like to know whether there is any systematic way in RAL that handles this particular case.
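To make the manual workaround above concrete (this is not a feature of the RAL repo, just an illustration): for a sigmoid-based asymmetric loss that takes raw logits, flipping the multi-hot labels and negating the logits makes the heavier "negative" treatment (gamma_neg and any clipping) act on the original positives, which are the majority class here. A hedged sketch, assuming `criterion` applies the sigmoid internally:

```python
# Hypothetical label-flipping workaround (illustration only, not part of the RAL repository).
# Assumes the criterion takes raw logits and applies sigmoid internally, ASL-style.
flipped_label = 1.0 - label.float()        # swap positives and negatives
loss = criterion(-y_pred, flipped_label)   # negate logits so the predicted probabilities flip too
```

Simply swapping gamma_neg and gamma_pos in the constructor is a related option, but it would leave any negative-only mechanism such as probability clipping attached to the negatives, so the two approaches are not necessarily equivalent.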

Thanks!

EDIT:

Also, please explain the significance of batch size and learning rate for RAL. When I tested on a subset of my training data (800 images), I observed the following behaviour:
a) batch_size = {64, 128}, LR = 1e-3: normal behaviour of RAL.
b) batch_size > 128 (e.g. {256, 512}), large LR = {0.1, 10, 20, 30}: abnormal behaviour of RAL (as in the picture shown in my initial post).