satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

Confusions about RunCCA result #535

Closed ysbioinfo closed 6 years ago

ysbioinfo commented 6 years ago

Hi, sorry for asking such a simple question. I am not good at statistics and am confused about the RunCCA process. I read your NBT paper (2018), and it seems that you run CCA on two matrices, X and Y, which have the same number of rows (genes), n, but different numbers of columns (cells), m and p. CCA then returns vectors u and v, from which you define the metagenes for downstream alignment. Based on my understanding of CCA, the vectors it returns should have different lengths: u should have length m and v should have length p, so each is a linear combination of the cells in one of the two groups. However, in the result of RunCCA (object@dr$cca), I see not only a weight for each cell but also a weight for each gene. So I am confused: why is there also a linear combination over genes? Where do these weights come from? Thank you very much!

andrewwbutler commented 6 years ago

We define the gene loadings for CCA by multiplying the scaled expression matrix by the cell embeddings. In the paper, this is described in the first equation under the "Identification of rare non-overlapping subpopulations" subsection in the Online Methods (A = Xu).
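To make the dimensions concrete, here is a minimal sketch of the A = Xu step in plain Python. It is not Seurat's code: the matrix X, the vector u, and the helper `matvec` are made-up illustrations, and it only assumes that X is genes by cells and u holds one weight per cell, as described above. Multiplying the two collapses the cell dimension and leaves one value per gene, which is where the per-gene weights come from.

```python
# Toy illustration of A = X u: projecting cell embeddings back onto genes.
# All values are hypothetical; X stands in for a scaled genes-by-cells matrix.

def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("each matrix row must match the vector length")
    return [sum(x * w for x, w in zip(row, vector)) for row in matrix]

# 3 genes (rows) x 4 cells (columns).
X = [
    [1.0, -1.0, 0.5, -0.5],
    [0.0, 2.0, -2.0, 0.0],
    [-1.0, 1.0, 1.0, -1.0],
]

# One canonical vector u: one weight per cell (length 4).
u = [0.5, -0.5, 0.5, -0.5]

# Gene loadings A = X u: one weight per gene (length 3).
gene_loadings = matvec(X, u)
print(gene_loadings)  # [1.5, -2.0, 0.0]
```

The same product applied to Y and v would give a length-n vector for the second dataset, so both sets of gene loadings share the gene dimension even though u and v have different lengths.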