toandaominh1997 / EfficientDet.Pytorch

Implementation of EfficientDet: Scalable and Efficient Object Detection in PyTorch
MIT License

It doesn't work for detecting 1600 objects because of the function "torch.clamp(classification, min=1e-4, max=1.0 - 1e-4)" in focal_loss #131

Open y78h11b09 opened 4 years ago

y78h11b09 commented 4 years ago

Recently, I found that EfficientDet-d0 didn't work for 1600 objects because cls_loss still didn't decrease during training.

So I modified "torch.clamp(classification, min=1e-4, max=1.0 - 1e-4)" into "torch.clamp(classification, min=1e-8, max=1.0 - 1e-8)" in focal_loss, and then EfficientDet-d0 could be trained on 1600 objects. Can anyone tell me what the advantage of torch.clamp() in focal_loss is? I think it should be removed completely!
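For context, the clamp exists so that the log() calls applied to the classification probabilities stay finite. A minimal sketch of that idea (the surrounding code in this repo's focal loss may differ, and the helper name here is made up for illustration):

```python
import torch

def clamped_log_probs(classification: torch.Tensor, eps: float = 1e-4):
    # Keep the sigmoid outputs away from exactly 0 and 1 so that
    # log(p) and log(1 - p) in the focal loss do not become -inf.
    # The issue author relaxed eps from 1e-4 to 1e-8.
    classification = torch.clamp(classification, min=eps, max=1.0 - eps)
    return torch.log(classification), torch.log(1.0 - classification)
```

Note that with min=1e-4 no predicted probability can fall below 1e-4, which also puts a floor and ceiling on the per-anchor loss terms.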

rmcavoy commented 4 years ago

The clamp function probably improves stability in some cases, but it is very much unnecessary: you can switch to the "with logits" version of the focal loss, as used in the TensorFlow version of the official code (quoted below from the official code's comments).

# Below are comments/derivations for computing modulator.
# For brevity, let x = logits, z = targets, r = gamma, and
# p_t = sigmoid(x) for positive samples and 1 - sigmoid(x) for negative examples.
#
# The modulator, defined as (1 - p_t)^r, is a critical part in focal loss
# computation. For r > 0, it puts more weights on hard examples, and less
# weights on easier ones. However, if it is directly computed as (1 - p_t)^r,
# its back-propagation is not stable when r < 1. The implementation here
# resolves the issue.
#
# For positive samples (labels being 1),
#   (1 - p_t)^r
#   = (1 - sigmoid(x))^r
#   = (1 - (1 / (1 + exp(-x))))^r
#   = (exp(-x) / (1 + exp(-x)))^r
#   = exp(log((exp(-x) / (1 + exp(-x)))^r))
#   = exp(r * log(exp(-x)) - r * log(1 + exp(-x)))
#   = exp(-r * x - r * log(1 + exp(-x)))
#
# For negative samples (labels being 0),
#   (1 - p_t)^r
#   = (sigmoid(x))^r
#   = (1 / (1 + exp(-x)))^r
#   = exp(log((1 / (1 + exp(-x)))^r))
#   = exp(-r * log(1 + exp(-x)))
#
# Therefore one unified form for positive (z = 1) and negative (z = 0) samples is:
#   (1 - p_t)^r = exp(-r * z * x - r * log(1 + exp(-x))).
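In PyTorch, a focal loss computed directly from logits along the lines of that derivation could look roughly like this (a minimal sketch, not this repository's code; the function name, alpha weighting, and sum reduction are assumptions):

```python
import torch
import torch.nn.functional as F

def focal_loss_with_logits(logits, targets, alpha=0.25, gamma=2.0):
    # (1 - p_t)^gamma via the unified form derived above:
    #   exp(-gamma * z * x - gamma * log(1 + exp(-x)))
    # where log(1 + exp(-x)) is evaluated stably as softplus(-x).
    modulator = torch.exp(-gamma * targets * logits - gamma * F.softplus(-logits))

    # Per-element cross entropy computed on raw logits, so no clamp is needed.
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")

    loss = modulator * ce
    # Usual alpha weighting: alpha for positives, 1 - alpha for negatives.
    weighted = torch.where(targets == 1.0, alpha * loss, (1.0 - alpha) * loss)
    return weighted.sum()
```

Because everything is computed from the raw logits, there is no sigmoid output that needs clamping.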

y78h11b09 commented 4 years ago

Good job, thank you very much, I will try it. Have a nice day!
