Basically, LDA projects your X values (an n x p matrix, so n x 256 for the lab) onto a lower-dimensional space. With 5 classes for y, there are at most K - 1 = 4 discriminant directions, so the projected matrix is n x 4. The first two columns of that n x 4 matrix are LD1 and LD2.

The big picture from a data visualization perspective is that you cannot plot something in 256 dimensions (you can't even plot something in 4 dimensions!), but you can plot something that has just 2 dimensions. LD1 and LD2 are a reduced version of the large matrix of predictors, and they happen to span the best two-dimensional plane for visualizing the discriminant rule calculated from the model. I don't expect you to be able to calculate these "by hand" - we have only covered calculating the discriminant score, the probability of being in each class, and the discriminant rule in the simple 2-class, 1-predictor case.
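If it helps to see this concretely, here is a minimal sketch using scikit-learn's LinearDiscriminantAnalysis on synthetic data. The shapes mirror the lab (256 predictors, 5 classes), but the data itself is made up for illustration:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n, p, k = 500, 256, 5                       # n observations, p predictors, k classes
means = rng.normal(scale=0.3, size=(k, p))  # a distinct (made-up) mean vector per class
y = rng.integers(0, k, size=n)
X = rng.normal(size=(n, p)) + means[y]      # n x 256 predictor matrix

# With k = 5 classes, LDA yields at most k - 1 = 4 discriminant directions;
# requesting n_components=2 keeps just the first two, LD1 and LD2.
lda = LinearDiscriminantAnalysis(n_components=2)
Z = lda.fit_transform(X, y)

print(Z.shape)  # (500, 2): column 0 is LD1, column 1 is LD2
```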
Visualizing data like this can help you see, broadly, how well the model will discriminate between groups (e.g., if one group's points are tightly clustered away from the others, LDA will likely have no trouble identifying those points).
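For instance, continuing with the synthetic setup above (again, all names and data here are illustrative), a scatter of LD1 against LD2 colored by class, with each class centroid marked, gives the kind of picture shown in the slides:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# same made-up data as the sketch above
rng = np.random.default_rng(0)
n, p, k = 500, 256, 5
means = rng.normal(scale=0.3, size=(k, p))
y = rng.integers(0, k, size=n)
X = rng.normal(size=(n, p)) + means[y]

Z = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

for cls in range(k):
    pts = Z[y == cls]
    plt.scatter(pts[:, 0], pts[:, 1], s=10, label=f"class {cls}")
    plt.scatter(*pts.mean(axis=0), marker="x", c="k", s=80)  # class centroid
plt.xlabel("LD1")
plt.ylabel("LD2")
plt.legend()
plt.show()
```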
If you're interested in a little more explanation on the math side, I really like these slides.
Regarding the graph of LD1 vs LD2 in your slides, I'm not quite sure how to interpret LD1 and LD2 for a given LDA model. From the data in your slides, they seem to provide numeric coordinates for grouping different observations, but beyond that I'm not sure where they come from, or why they are able to separate the groups into these nice centroids. I didn't see a mention of them in the book, but perhaps I missed it.
My question is: what is our big takeaway from these values? The graph lets us view groupings for different observations, but of course the hat matrix would do that as well, so I'm not sure what exactly a large LD1 and a small LD2 value really tells me.