Closed hyungwonchoi closed 5 years ago
Thanks for your interest in our paper. I'll briefly answer based on my understanding.
Right.
Nope, you don't have to. s is pretty robust here. You could try 10; it works pretty much the same. It's pretty common to introduce a scalar when the inputs to cross entropy are normalized.
I think max_m is a hyperparameter that requires tuning. Basically, we want max_m to be as large as possible while it doesn't incur under-fitting. I find 0.5 works universally well for small datasets. As for iNaturalist, 0.3 suffices and it seems that 0.5 is too large.
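For reference, a condensed sketch of where max_m and s enter the loss, paraphrasing the logic of the repo's losses.py (simplified to run on CPU; variable names follow the repo, but this is not the exact code):

```python
import numpy as np
import torch
import torch.nn.functional as F

class LDAMLoss(torch.nn.Module):
    def __init__(self, cls_num_list, max_m=0.5, s=30, weight=None):
        super().__init__()
        # Per-class margins proportional to n_j^(-1/4), rescaled so the
        # largest margin equals max_m (this rescaling plays the role of C).
        m_list = 1.0 / np.sqrt(np.sqrt(cls_num_list))
        m_list = m_list * (max_m / np.max(m_list))
        self.m_list = torch.FloatTensor(m_list)
        self.s = s          # scale applied to the logits before cross entropy
        self.weight = weight

    def forward(self, logits, target):
        # Subtract the class-dependent margin from the true-class logit only.
        index = torch.zeros_like(logits, dtype=torch.bool)
        index.scatter_(1, target.view(-1, 1), True)
        batch_m = self.m_list[target].view(-1, 1)
        logits_m = torch.where(index, logits - batch_m, logits)
        return F.cross_entropy(self.s * logits_m, target, weight=self.weight)

# Toy usage with hypothetical long-tailed class counts:
criterion = LDAMLoss(cls_num_list=[500, 200, 50, 10], max_m=0.5, s=30)
loss = criterion(torch.randn(8, 4), torch.randint(0, 4, (8,)))
```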
Hi, it seems to me that the parameter s is the temperature for the softmax. Did you try with s=1 by any chance?
Thanks.
s = 1 will incur under-fitting. The reason is that even when the logits look like [1, -1, -1, ...], after softmax the true class's probability cannot get close to 0.99.
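A quick numerical illustration of this point (a toy 10-class example, not taken from the repo):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Logits where the true class (index 0) is well separated from the other 9 classes.
logits = np.array([1.0] + [-1.0] * 9)

for s in (1, 10, 30):
    p_true = softmax(s * logits)[0]
    print(f"s = {s:2d}: true-class probability = {p_true:.6f}")
# s = 1  -> ~0.45, far from 0.99, so the loss stays large (under-fitting)
# s = 10 -> ~1.00
# s = 30 -> ~1.00
```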
It was a very interesting paper to read :)
I have some questions regarding the hyper-parameters for LDAM loss.
1. What is the value of C, the hyper-parameter to be tuned (according to the paper)? Is it (max_m / np.max(m_list)) introduced below? https://github.com/kaidic/LDAM-DRW/blob/master/losses.py#L28
2. Is s=30 in the LDAM loss also a hyper-parameter to be tuned? I could not find any explanation in the paper. Did I miss something?
3. What was the tendency of these hyper-parameters during training? How are these hyper-parameter selections related to the imbalance level (or different datasets)? Do the found parameters work for the other datasets in the paper (Tiny ImageNet, iNaturalist)?
Thanks.