scikit-learn / scikit-learn

scikit-learn: machine learning in Python
https://scikit-learn.org
BSD 3-Clause "New" or "Revised" License

LDA: ellipses for confidence intervals: error in the doc? #7439

Closed JPFrancoia closed 7 years ago

JPFrancoia commented 8 years ago

Hi,

I'm not reporting a bug in the code, but rather something that might be an error in the doc. I also opened a question on SO.

I'll copy/paste it here and give additional details:

TL;DR

To plot confidence intervals after an LDA analysis: should I use the covariance matrix shared by all classes (lda.covariance_), or should I calculate and use the covariance matrix of each class?

Long explanation

Some time ago, I asked a question about how to draw ellipses around points: http://stackoverflow.com/questions/30682179/draw-ellipses-around-points

These ellipses will represent confidence intervals for Linear Discriminant Analysis (LDA) data points.

I will reuse my old picture, which I got from a scientific publication:

[figure from the publication: LDA scatter plot with a confidence ellipse drawn around each class of points]

The red points (for example) could be defined as follows, after the LDA calculations:

[[-23.88315146  -3.26328266]  # first point
 [-25.94906669  -1.47440904]  # second point
 [-26.52423229  -4.84947907]]  # third point

You can see on the picture that the red points are surrounded by an ellipse, which represents the confidence interval (at a certain level) for the mean of the red points.

This is what I would like to obtain. Now scikit-learn's doc has an example about that (here):

import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt  # ensures mpl.patches is available
from scipy import linalg

def plot_ellipse(splot, mean, cov, color):
    v, w = linalg.eigh(cov)
    u = w[0] / linalg.norm(w[0])
    angle = np.arctan(u[1] / u[0])
    angle = 180 * angle / np.pi  # convert to degrees
    # filled Gaussian at 2 standard deviation
    ell = mpl.patches.Ellipse(mean, 2 * v[0] ** 0.5, 2 * v[1] ** 0.5,
                              180 + angle, color=color)
    ell.set_clip_box(splot.bbox)
    ell.set_alpha(0.5)
    splot.add_artist(ell)

And this function is called like this:

plot_ellipse(splot, lda.means_[0], lda.covariance_, 'red')

In the doc's example, plot_ellipse is called to draw the confidence interval of every class, always with the same covariance matrix: lda.covariance_.

From scikit-learn's doc:

lda.covariance_
Covariance matrix (shared by all classes)

lda.covariance_ is then used to determine the angle of the ellipses. As lda.covariance_ never changes, all the ellipses have the same angle.

Is it mathematically correct to do that? I am tempted to say no.

On another post (http://stackoverflow.com/questions/12301071/multidimensional-confidence-intervals), which is not related to LDA, Joe Kington simply uses (based on a published article) a "2-sigma ellipse of the scatter of points". He calculates the covariance for each class:

cov = np.cov(points, rowvar=False)

where points would be the 3 points described above, for example. He then uses a similar method to calculate the angle of the ellipses. But as he calculates the covariance matrix for each class, the angles of the ellipses are not the same across classes.
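For concreteness, the per-class recipe can be sketched in a few lines of NumPy (using the three red points quoted earlier as a stand-in class; this is my own sketch, not Joe Kington's error_ellipse.py code verbatim):

```python
import numpy as np

# The three red points quoted earlier, treated as one class.
points = np.array([[-23.88315146, -3.26328266],
                   [-25.94906669, -1.47440904],
                   [-26.52423229, -4.84947907]])

# Covariance of THIS class only (rowvar=False: each row is one point).
cov = np.cov(points, rowvar=False)

# Principal axes of the class: the eigenvector with the largest
# eigenvalue sets the orientation of this class's ellipse, so the
# angle differs from class to class.
vals, vecs = np.linalg.eigh(cov)
order = vals.argsort()[::-1]
vals, vecs = vals[order], vecs[:, order]
angle = np.degrees(np.arctan2(vecs[1, 0], vecs[0, 0]))
```

Running this for each class separately gives one covariance, and hence one orientation, per class.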

amueller commented 8 years ago

Where does it say in the example that this is a confidence interval? LDA fits Gaussians per class that share a single covariance matrix.

What's the definition of confidence region for a classifier? That's not a concept I'm familiar with. A 95% confidence interval or region is a region that contains an unobserved parameter 95% of the time. What would be the unobserved parameter here? The true class mean?

JPFrancoia commented 8 years ago

Where does it say in the example that this is a confidence interval?

This is the title of the page: http://scikit-learn.org/stable/auto_examples/classification/plot_lda_qda.html: Linear and Quadratic Discriminant Analysis with confidence ellipsoid.

In the example, it is not mentioned what these "confidence ellipsoids" are. Maybe it's something I don't understand.

A 95% confidence interval or region is a region that contains an unobserved parameter 95% of the time. What would be the unobserved parameter here? The true class mean?

I would say so. But in that case, why use a single covariance matrix to plot all the ellipsoids? Wouldn't it be better to calculate a covariance matrix for each cloud of points (each class)?

That's what they do here: https://github.com/joferkington/oost_paper_code/blob/master/error_ellipse.py

They calculate the covariance matrix for each class and use it to determine the angle of the ellipsoid. Then, they plot the mean of the class and its ellipsoid, which has a width of 2 * nbr_std * std_x, and a height of 2 * nbr_std * std_y. For a 95% confidence interval, nbr_std would be 1.96.
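A minimal sketch of that width/height computation, assuming the nbr_std = 1.96 convention described above (the function name is mine, not from error_ellipse.py):

```python
import numpy as np

def ellipse_size(points, nbr_std=1.96):
    """Width and height of the nbr_std-sigma ellipse for one class of
    points, following the per-class recipe described above."""
    cov = np.cov(points, rowvar=False)
    # Eigenvalues are variances along the principal axes, so their
    # square roots are the stds along those axes.
    std_x, std_y = np.sqrt(np.linalg.eigvalsh(cov))
    width = 2 * nbr_std * std_x
    height = 2 * nbr_std * std_y
    return width, height

# Sanity check: an isotropic unit-variance cloud should give
# width and height both close to 2 * 1.96.
rng = np.random.default_rng(0)
w, h = ellipse_size(rng.standard_normal((50_000, 2)))
print(w, h)
```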

I think it's a better approach, but I don't understand the example of the doc. And I think there are some errors in the example. For example:

ell = mpl.patches.Ellipse(mean, 2 * v[0] ** 0.5, 2 * v[1] ** 0.5, 180 + angle, color=color)

Here, plotting the ellipse with an angle of 180 + angle is the same thing as plotting the ellipse with just angle, because rotating an ellipse by 180° will result in the same ellipse.
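That equivalence is easy to check numerically without matplotlib: rotating by an extra 180° maps every boundary point (x, y) to (-x, -y), which traces exactly the same ellipse (a quick NumPy check, not the doc's code):

```python
import numpy as np

def ellipse_boundary(width, height, angle_deg, n=360):
    """Points on the boundary of an ellipse centred at the origin."""
    t = np.linspace(0, 2 * np.pi, n, endpoint=False)
    base = np.stack([0.5 * width * np.cos(t), 0.5 * height * np.sin(t)])
    a = np.radians(angle_deg)
    rot = np.array([[np.cos(a), -np.sin(a)],
                    [np.sin(a),  np.cos(a)]])
    return rot @ base

p1 = ellipse_boundary(4.0, 2.0, 30.0)
p2 = ellipse_boundary(4.0, 2.0, 30.0 + 180.0)

# The two boundaries are identical point sets: p2 is just p1 traversed
# with a half-turn offset in the parameter t.
assert np.allclose(np.roll(p2, 180, axis=1), p1)
```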

amueller commented 8 years ago

Wouldn't it be better to calculate a covariance matrix for each cloud of points (each class)?

There is not really a better or worse, there's only a correct and an incorrect ;) Well, there are multiple 95% confidence regions, but there is one that is more natural than the rest. And I think that should be derived from the model in this case. I'm still confused by the definition though.

Hm this example was done by @fabianp. Maybe @agramfort or @GaelVaroquaux know about it? I find the "confidence ellipsoid" a bit strange.

ixjlyons commented 8 years ago

I would say so. But in that case, why use a single covariance matrix to plot all the ellipsoids? Wouldn't it be better to calculate a covariance matrix for each cloud of points (each class)?

Because the underlying model of LDA assumes the classes all have equal covariance -- I think that's the main point of this example. (Note that there was a recently-fixed bug in the example where the ellipses were hidden under the data. You can see them in the dev version of the docs). QDA has a more general model that allows for different covariance matrices for each class (hence the different ellipse shapes in the bottom right plot), and LDA is a special case where you assume all classes have equal covariance. If you calculate covariances for each class independently and plot those ellipses, you're not representing the underlying model of LDA.
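The modelling difference can be seen directly from the estimators' attributes; a small sketch on synthetic data (attribute names as in current scikit-learn; the blob parameters are arbitrary):

```python
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)

X, y = make_blobs(n_samples=300, centers=3, n_features=2, random_state=0)

# LDA estimates ONE covariance matrix, shared by all classes.
lda = LinearDiscriminantAnalysis(store_covariance=True).fit(X, y)
print(lda.covariance_.shape)   # (2, 2): a single shared matrix

# QDA estimates one covariance matrix PER class.
qda = QuadraticDiscriminantAnalysis(store_covariance=True).fit(X, y)
print(len(qda.covariance_))    # 3: one (2, 2) matrix per class
```

Plotting per-class sample covariances on the LDA panels would contradict the single shared matrix the model actually fits.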

Here, plotting the ellipse with an angle of 180 + angle is the same thing as plotting the ellipse with just angle, because rotating an ellipse by 180° will result in the same ellipse.

I'm not sure about this...it does seem odd. It's been that way since the example was first introduced.

JPFrancoia commented 8 years ago

OK, so could you clarify what the ellipsoids in the example represent? What do they "physically" mean?

Also, would it be wrong to represent the confidence intervals as I mentioned? (In this case, they would be the confidence intervals for the mean of the classes.)

EDIT:

Look at this paper, in chemistry: http://pubs.rsc.org/en/Content/ArticleLanding/2016/AN/C5AN02545A#!divAbstract

(You can download it here: http://moscow.sci-hub.cc/8fbf84cc53ec2a1ca2cc424c8fb3f766/chang2016.pdf)

The "confidence intervals" don't all have the same shape. The captions mention that "The ellipses indicate a 95% confidence level".

amueller commented 7 years ago

They represent the 95% mass of the per-class Gaussian. I think we should change the title of the example and explain what the ellipses mean. PR welcome.

JPFrancoia commented 7 years ago

I can change the title and the text, but I don't know what "95% mass of the per-class gaussian" means, so I can't really explain what the ellipses mean. If someone can quickly answer here, I'll make the PR.

amueller commented 7 years ago

Oh actually, it might be only one standard deviation of the Gaussian, sorry. But I'm not entirely sure what the best way to phrase it is. Either say it's visualizing the covariance or the standard deviation?

JPFrancoia commented 7 years ago

If I'm not wrong, they are plotting the standard deviation, because:

    ell = mpl.patches.Ellipse(mean, 2 * v[0] ** 0.5, 2 * v[1] ** 0.5,

and

    v, w = linalg.eigh(cov)

They are taking the square root here, so I think it's the std.

Oh actually, it might be only one standard deviation of the Gaussian, sorry

And yes, there is only one std of the gaussian (one on each side of the mean however).
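The square-root reading above can be checked numerically: the eigenvalues of a covariance matrix are the variances along the principal axes, so v ** 0.5 is indeed a standard deviation (a quick check on synthetic correlated data, not part of the doc's example):

```python
import numpy as np

rng = np.random.default_rng(0)
# A correlated 2-D Gaussian sample (parameters chosen arbitrarily).
X = rng.multivariate_normal([0.0, 0.0],
                            [[3.0, 1.2],
                             [1.2, 1.0]], size=100_000)

cov = np.cov(X, rowvar=False)
v, w = np.linalg.eigh(cov)

# Project the data onto the principal axes: the variance along each
# axis equals the corresponding eigenvalue, so sqrt(v) is the std
# along that axis -- hence the 2 * v ** 0.5 widths in the example.
proj = X @ w
assert np.allclose(proj.var(axis=0, ddof=1), v)
```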

dalmia commented 7 years ago

I feel we don't need to explicitly change the title of the example. Maybe explaining the meaning of the confidence ellipsoids should work.

amueller commented 7 years ago

I don't think it's a confidence ellipsoid in any meaningful way.

JPFrancoia commented 7 years ago

True. I think a confidence interval should represent the confidence you put in a value, an estimator. That's why, when chemists use LDA in their papers, they use a confidence interval for the mean of each class. While that seems more understandable to me, I'm not an expert in statistics.

dalmia commented 7 years ago

Using the conclusions from the discussion above and with the help of these links: 1 & 2, I feel the title should be 'Linear and Quadratic Discriminant Analysis with covariance ellipsoid', and that the ellipsoids display the double standard deviation for each class. I'll make a PR with these changes and we can carry the discussion forward there.

JPFrancoia commented 7 years ago

the ellipsoids display the double standard deviation for each class

Just to be sure we use the right terms: if I understood correctly, this is not accurate. With LDA at least, all the classes share the same covariance matrix. The standard deviation is therefore the same for all the classes, so all the confidence ellipsoids are identical for LDA (look at the example: same size, shape, and angle).

However, QDA allows each class to have its own covariance matrix, so each class has its own confidence ellipsoid (each ellipsoid has a particular shape, size, and angle).

So we should rather say:

the ellipsoids display the double standard deviation for each class. With LDA, the standard deviation is the same for all the classes. With QDA, each class has its own standard deviation.

The doc should clearly mention that.

dalmia commented 7 years ago

@JPFrancoia Yes, I agree with you that we should be more precise distinguishing between the values for LDA & QDA. I'll make that change.