Closed hyungwonchoi closed 5 years ago
Thanks for your interest in our paper. I'll briefly answer based on my understanding.
Right.
Nope, you don't have to. s is pretty robust here. You could try 10; it works pretty much the same. It's pretty common to introduce a scalar when the inputs to cross entropy are normalized.
I think max_m is a hyperparameter that requires tuning. Basically, we want max_m to be as large as possible while it doesn't incur under-fitting. I find 0.5 works universally well for small datasets. As for iNaturalist, 0.3 suffices and it seems that 0.5 is too large.
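For reference, a condensed sketch of where max_m and s enter the loss, paraphrasing the logic of the repo's losses.py (simplified to run on CPU; variable names follow the repo, but this is not the exact code):

```python
import numpy as np
import torch
import torch.nn.functional as F

class LDAMLoss(torch.nn.Module):
    def __init__(self, cls_num_list, max_m=0.5, s=30, weight=None):
        super().__init__()
        # Per-class margins proportional to n_j^(-1/4), rescaled so the
        # largest margin equals max_m (this rescaling plays the role of C).
        m_list = 1.0 / np.sqrt(np.sqrt(cls_num_list))
        m_list = m_list * (max_m / np.max(m_list))
        self.m_list = torch.FloatTensor(m_list)
        self.s = s          # scale applied to the logits before cross entropy
        self.weight = weight

    def forward(self, logits, target):
        # Subtract the class-dependent margin from the true-class logit only.
        index = torch.zeros_like(logits, dtype=torch.bool)
        index.scatter_(1, target.view(-1, 1), True)
        batch_m = self.m_list[target].view(-1, 1)
        logits_m = torch.where(index, logits - batch_m, logits)
        return F.cross_entropy(self.s * logits_m, target, weight=self.weight)

# Toy usage with hypothetical long-tailed class counts:
criterion = LDAMLoss(cls_num_list=[500, 200, 50, 10], max_m=0.5, s=30)
loss = criterion(torch.randn(8, 4), torch.randint(0, 4, (8,)))
```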
Hi, it seems to me that the parameter s is the temperature for the softmax. Did you try with s=1 by any chance?
Thanks.
s = 1 will incur under-fitting. The reason is that even when the logits look like [1, -1, -1, ...], after softmax the true class's probability cannot get close to 0.99.
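A quick numerical illustration of this point (a toy 10-class example, not taken from the repo):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Logits where the true class (index 0) is well separated from the other 9 classes.
logits = np.array([1.0] + [-1.0] * 9)

for s in (1, 10, 30):
    p_true = softmax(s * logits)[0]
    print(f"s = {s:2d}: true-class probability = {p_true:.6f}")
# s = 1  -> ~0.45, far from 0.99, so the loss stays large (under-fitting)
# s = 10 -> ~1.00
# s = 30 -> ~1.00
```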
It was a very interesting paper to read :)
I have some questions regarding the hyper-parameters for LDAM loss.
1. What is the value of C, the hyper-parameter to be tuned (according to the paper)? Is it (max_m / np.max(m_list)) introduced below? https://github.com/kaidic/LDAM-DRW/blob/master/losses.py#L28
2. Is s=30 in the LDAM loss also a hyper-parameter to be tuned? I could not find any explanation in the paper. Did I miss something?
3. What was the tendency of these hyper-parameters during training? How are these hyper-parameter selections related to the imbalance level (or different datasets)? Do the found parameters work for the other datasets in the paper (Tiny ImageNet, iNaturalist)?
Thanks.