umbertogriffo / focal-loss-keras

Binary and Categorical Focal loss implementation in Keras.

About Alpha parameter in focal loss #1

lumliolum closed this issue 5 years ago

lumliolum commented 5 years ago

From the paper, the alphas are weights for each example. So why is alpha=0.25 used? Does this mean giving equal weight to all the examples?

I may be wrong, but this is what I understood from the paper.

umbertogriffo commented 5 years ago

No, it doesn't give equal weight to all the examples.

The focusing parameter γ (gamma) smoothly adjusts the rate at which easy examples are down-weighted. When γ = 0, focal loss is equivalent to categorical cross-entropy, and as γ increases, the effect of the modulating factor increases as well (γ = 2 worked best in the paper's experiments).

α (alpha) balances the focal loss; the α-balanced form yields slightly improved accuracy over the non-α-balanced form.
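For concreteness, here is a minimal NumPy sketch of the α-balanced binary focal loss (an illustration of the formula, not the exact code in this repository):

```python
import numpy as np

def binary_focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    # Clip predictions to avoid log(0).
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # p_t: the predicted probability of the true class.
    p_t = np.where(y_true == 1, y_pred, 1.0 - y_pred)
    # alpha_t: alpha for positives, 1 - alpha for negatives.
    alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)
    # (1 - p_t)^gamma down-weights easy, well-classified examples;
    # with gamma = 0 and alpha_t = 1 this is plain cross-entropy.
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# The easy example (p_t = 0.9) is down-weighted far more than the hard one (p_t = 0.1).
print(binary_focal_loss(np.array([1, 1]), np.array([0.9, 0.1])))
```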

I suggest you read the paper more carefully ;-)

lumliolum commented 5 years ago

In the paper, the balanced form is FL(p_t) = -α_t (1 - p_t)^γ log(p_t).

I am saying that in the equation it is alpha_t, not alpha, meaning alpha_t is different for each example and not a constant. In the section above (balanced cross-entropy), alpha_t was also different for each example. I think they are saying that when we use the weighted focal loss we get slightly better accuracy.

Note: another thing I want to mention is that alpha = 1 and alpha = 0.25 shouldn't make any difference, because you are just scaling the loss function and the optimal weights of the model will be the same in both cases. So how can it give better accuracy?
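This scaling point can be sanity-checked on a toy problem (a hypothetical quadratic loss, not focal loss): multiplying the loss by a constant rescales the gradient, which under plain SGD acts like a change of learning rate, while the minimizer stays the same.

```python
# Toy check: minimize c * (w - 3)^2 with SGD for two scale factors c.
# The minimizer is w = 3 either way; the constant only rescales gradients.
for c in (1.0, 0.25):
    w = 0.0
    for _ in range(500):
        grad = c * 2.0 * (w - 3.0)  # gradient of c * (w - 3)^2
        w -= 0.1 * grad             # fixed learning rate
    print(c, round(w, 4))           # both converge to 3.0
```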

umbertogriffo commented 5 years ago

For example, in the binary case alpha is the weighting factor: alpha for class 1 and 1 - alpha for class 0, so alpha balances the importance of positive/negative examples. So you only have to choose a single alpha value.
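A tiny sketch of what that per-example weight looks like (illustrative values, not the repo's code):

```python
import numpy as np

# How a single alpha becomes a per-example weight in the binary case:
# alpha for positives (class 1), 1 - alpha for negatives (class 0).
alpha = 0.25
y_true = np.array([1, 0, 1, 0])
alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)
print(alpha_t)  # [0.25 0.75 0.25 0.75]
```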

lumliolum commented 5 years ago

I saw the code and thought that you were multiplying the whole equation by alpha, but you are actually multiplying by alpha and 1 - alpha.

My bad!

Thanks for the reply.

michaeloc commented 4 years ago

Can I define multiple alphas for a multi-class problem?

lix4 commented 3 years ago

In the focal loss paper, it says: "In practice α may be set by inverse class frequency or treated as a hyperparameter to set by cross validation." So for each class, I guess you compute its occurrence in the training set and take the inverse.
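A possible sketch of that (normalizing the inverse frequencies so they sum to 1 is a common convention, not something the paper mandates):

```python
import numpy as np

# Toy labels: class 0 is four times as frequent as class 2.
y_train = np.array([0, 0, 0, 0, 1, 1, 2])
classes, counts = np.unique(y_train, return_counts=True)
inv_freq = 1.0 / counts             # rarer classes get larger weights
alphas = inv_freq / inv_freq.sum()  # normalize so the alphas sum to 1
print(dict(zip(classes, alphas)))
```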

JishanAhmed2019 commented 2 years ago

How do you set α by inverse class frequency? Is it something like `class_weights = dict(zip(np.unique(y_train), class_weight.compute_class_weight(class_weight='balanced', classes=np.unique(y_train), y=y_train)))`?
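Roughly, yes: scikit-learn's 'balanced' mode computes n_samples / (n_classes * count_c) per class, which is proportional to the inverse class frequency (note that recent scikit-learn versions require keyword arguments). Normalizing the result to sum to 1 is one way, an assumption rather than something from the paper, to turn it into per-class alphas:

```python
import numpy as np
from sklearn.utils import class_weight

y_train = np.array([0, 0, 0, 0, 1, 1, 2])
classes = np.unique(y_train)
# 'balanced' yields n_samples / (n_classes * count_c) per class,
# i.e. weights proportional to the inverse class frequency.
weights = class_weight.compute_class_weight(
    class_weight='balanced', classes=classes, y=y_train
)
alphas = weights / weights.sum()  # optional: normalize to use as alphas
print(dict(zip(classes, alphas)))
```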