`reg` is there only for stability reasons. The "useless" comment notes that the gradient does not pass through `reg`, due to the `.detach()` in the line below.
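For illustration, the pattern is roughly this (a minimal sketch, not the exact code from the repository):

```python
import torch

# Minimal sketch of the pattern (not the exact repository code): reg
# shifts the value for numerical stability, but .detach() removes it
# from the autograd graph, so no gradient flows through it.
log_px = torch.randn(4, requires_grad=True)  # stand-in for ln p^(x)
reg = log_px.logsumexp(dim=0)                # some stabilizing statistic

loss_ood = -(log_px + reg.detach()).mean()   # reg is "useless" for grads
loss_ood.backward()
print(log_px.grad)  # equals the gradient of -log_px.mean() alone
```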
What about the scaling factor, and the cost of $P(d_{in}|x)$ not being separated out?
Indeed, eq. (15) is missing a $\beta$ next to $\ln P(d_{in}|x)$, as well as the scaling factor in $\ln \hat{p}(x)$. I'll fix that in the next revision of the manuscript; thanks for pointing it out. However, note that these are just implementation details, so you can still get the intuition from eq. (15).
Ok, as I understand it, the implemented formula is the following:
$$L(\theta, \gamma) = \mathbb{E}_{x, y \in D_{in}}\left[ \ln P(y|x) + \beta \ln P(d_{in}|x) \right] + \beta\, \mathbb{E}_{x \in D_{out}}\left[ \ln P(d_{out}|x) + \delta \ln \hat{p}(x) \right]$$

where $\delta$ is a scaling factor.
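In code, my understanding of this objective would look roughly as follows, written as a minimization (each $\ln$ term negated). All names, shapes, and the `beta`/`delta` defaults are my own placeholders, not the repository's implementation:

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of the formula above as a minimization objective;
# not the actual code from semseg_negatives_joint_th.py.
def joint_loss(seg_logits, din_logit, log_px, labels, is_inlier,
               beta=0.05, delta=1.0):
    # seg_logits: (N, C) class scores; din_logit: (N,) logit of P(d_in|x)
    # log_px: (N,) unnormalized log-density ln p^(x)
    # is_inlier: (N,) boolean mask, True for D_in pixels

    # -ln P(y|x) on inlier pixels (loss_seg)
    loss_seg = F.cross_entropy(seg_logits[is_inlier], labels[is_inlier])

    # per-pixel -ln P(d_in|x) / -ln P(d_out|x) from one binary head (loss_th)
    bce = F.binary_cross_entropy_with_logits(
        din_logit, is_inlier.float(), reduction="none")

    # -ln p^(x) on outlier pixels (loss_ood)
    loss_ood = -log_px[~is_inlier].mean()

    # beta scales the P(d_in|x) cost and the whole outlier expectation
    return (loss_seg
            + beta * bce[is_inlier].mean()
            + beta * (bce[~is_inlier].mean() + delta * loss_ood))
```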
Thank you!
Hi, I've been trying to use this model (DLV3+) with a different dataset containing only images without pedestrians, and then to detect the pedestrians with the anomaly detector. Due to very poor results, I reviewed the training loop to understand where it might be failing.
In the file `semseg_negatives_joint_th.py`, the cost function computed after producing the model outputs combines several terms.
I'm trying to link the different parts of the loss to the different parts of the formula above (eq. (15) in the paper). For inlier pixels, `loss_seg` corresponds to $P(y|x)$. `loss_th` includes the cost for inliers, $P(d_{in}|x)$, and the cost for outliers, $P(d_{out}|x)$, scaled by `10*beta`. In the formula above the cost corresponding to $P(d_{in}|x)$ is not scaled by $\beta$, while in the implementation it is. Why is that?

For `loss_ood`, which corresponds to $\hat{p}(x)$, I suppose the value `reg` is simply being added for numerical stability reasons? I'm also confused as to why there is a "useless" comment on that line.
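To make my reading concrete, here is a rough sketch of how I picture the scaling of these terms (hypothetical names and values; this is not the actual code from `semseg_negatives_joint_th.py`):

```python
import torch

# Sketch of my reading of the scaling (hypothetical; not the repo's code).
# Per-pixel negative log terms are assumed to be precomputed.
beta = 0.05                   # placeholder value
nll_din = torch.rand(6)       # -ln P(d_in|x) on inlier pixels
nll_dout = torch.rand(2)      # -ln P(d_out|x) on outlier pixels
neg_log_px = torch.rand(2)    # -ln p^(x) on outlier pixels
reg = torch.tensor(1.0)       # stand-in for the stabilizer term

# Both membership costs sit in one term scaled by 10*beta, which is how
# the P(d_in|x) cost ends up scaled even though eq. (15) does not scale it.
loss_th = 10 * beta * (nll_din.mean() + nll_dout.mean())
loss_ood = (neg_log_px + reg.detach()).mean()  # reg adds no gradient

loss = loss_th + beta * loss_ood  # plus loss_seg for the inlier classes
```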