kkirchheim / pytorch-ood

👽 Out-of-Distribution Detection with PyTorch
https://pytorch-ood.readthedocs.io/
Apache License 2.0

Covariance estimation for Mahalanobis detector #76

Closed dkaizhang closed 1 month ago

dkaizhang commented 1 month ago

I believe the fit_features function for the Mahalanobis detector currently does not estimate the covariance correctly. It looks like the class-specific covariances are simply added up without normalisation, resulting in a final covariance that is too large.

Happy to propose an alternative calculation if you agree with my observation!

Code excerpt of fit_features taken from here (a sketch of a normalised alternative follows the excerpt):

    self.cov = torch.zeros(size=(z.shape[-1], z.shape[-1]), device=device)

    for clazz in range(n_classes):
        idxs = y.eq(clazz)
        assert idxs.sum() != 0
        zs = z[idxs]
        self.mu[clazz] = zs.mean(dim=0)
        self.cov += (zs - self.mu[clazz]).T.mm(zs - self.mu[clazz])

    self.cov += torch.eye(self.cov.shape[0], device=self.cov.device) * 1e-6
    self.precision = torch.linalg.inv(self.cov)
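
For reference, here is a minimal sketch of what a normalised (pooled) estimate could look like. The function name `fit_pooled_covariance` and the choice of dividing by the total sample count are mine, not part of the library:

    import torch

    def fit_pooled_covariance(z: torch.Tensor, y: torch.Tensor, n_classes: int):
        """Sketch: class means and a pooled, normalised covariance estimate."""
        # z: (N, D) feature matrix, y: (N,) integer class labels
        mu = torch.stack([z[y.eq(c)].mean(dim=0) for c in range(n_classes)])
        cov = torch.zeros(z.shape[-1], z.shape[-1], device=z.device)
        for c in range(n_classes):
            centered = z[y.eq(c)] - mu[c]
            cov += centered.T.mm(centered)  # class scatter matrix
        cov /= z.shape[0]  # normalise by the total number of samples
        cov += torch.eye(cov.shape[0], device=z.device) * 1e-6  # numerical stability
        precision = torch.linalg.inv(cov)
        return mu, cov, precision
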
kkirchheim commented 1 month ago

Hello, you are right, the implementation is missing the normalization. However, scaling the covariance matrix by a constant factor only scales the Mahalanobis distance by a constant factor, so this should not influence performance.

The following derivation from ChatGPT seems reasonable to me:

Let's denote:

  • $$\boldsymbol{\Sigma}$$: Original covariance matrix.
  • $$c$$: Positive constant scaling factor.
  • $$\boldsymbol{\Sigma}' = c \boldsymbol{\Sigma} $$: Scaled covariance matrix.
  • $$\boldsymbol{\Sigma}'^{-1}$$: Inverse of the scaled covariance matrix.

The inverse of the scaled covariance matrix is:

$$ \boldsymbol{\Sigma}'^{-1} = (c \boldsymbol{\Sigma})^{-1} = \frac{1}{c} \boldsymbol{\Sigma}^{-1} $$

The Mahalanobis distance with the scaled covariance matrix becomes:

$$
\begin{align}
D_M'(\mathbf{x}, \boldsymbol{\mu}) &= \sqrt{ (\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}'^{-1} (\mathbf{x} - \boldsymbol{\mu}) } \\
&= \sqrt{ (\mathbf{x} - \boldsymbol{\mu})^\top \left( \frac{1}{c} \boldsymbol{\Sigma}^{-1} \right) (\mathbf{x} - \boldsymbol{\mu}) } \\
&= \sqrt{ \frac{1}{c} \left( (\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right) } \\
&= \frac{1}{\sqrt{c}} D_M(\mathbf{x}, \boldsymbol{\mu})
\end{align}
$$

So, scaling the covariance matrix by $$c$$ scales the Mahalanobis distance by $$\frac{1}{\sqrt{c}}$$.

So, for correctness, we could apply the normalization; however, it should not affect the results. Feel free to create a PR.
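
To illustrate the point, a quick numerical sanity check (a standalone sketch with made-up data, not library code) that scaling the covariance by $$c$$ only rescales the distances by $$\frac{1}{\sqrt{c}}$$ and leaves the ranking unchanged:

    import torch

    torch.manual_seed(0)
    x = torch.randn(100, 8)   # made-up feature vectors
    mu = torch.zeros(8)
    cov = torch.eye(8) * 2.0
    c = 5.0                   # arbitrary positive scaling factor

    def mahalanobis(x, mu, precision):
        d = x - mu
        return torch.sqrt((d @ precision * d).sum(dim=-1))

    d1 = mahalanobis(x, mu, torch.linalg.inv(cov))
    d2 = mahalanobis(x, mu, torch.linalg.inv(c * cov))

    print(torch.allclose(d2, d1 / c ** 0.5))        # True: distances scale by 1/sqrt(c)
    print(torch.equal(d1.argsort(), d2.argsort()))  # True: ranking is unchanged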

dkaizhang commented 1 month ago

Yes, that's true. I will open a PR for the normalisation, but for posterity I agree that this shouldn't affect the ranking of the scores and hence the results.