satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

Difference between SelectIntegrationFeatures and FindIntegrationAnchors #3661

Closed Coolchong closed 4 years ago

Coolchong commented 4 years ago

Dear Seurat Team,

Congrats on your recent release of v4!

I have questions about what exactly SelectIntegrationFeatures and FindIntegrationAnchors are doing, how the features from SelectIntegrationFeatures are used in FindIntegrationAnchors, and how these functions map onto the steps described in your 2018 Nature Biotechnology paper.

From what I have read, SelectIntegrationFeatures "ranks features by the number of datasets they appear in, breaking ties by the median rank across datasets. It returns the highest features by this ranking." So this is different from the variable features that SCTransform finds for each dataset. Or could I understand it as the variable features most widely shared across the datasets in the list?
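To make the quoted ranking rule concrete, here is a minimal Python sketch of it. It is a simplification for illustration only, not Seurat's actual R implementation: the function name `select_integration_features`, the toy gene lists, and the alphabetical fallback for remaining ties are all assumptions. Each input list is taken to be one dataset's variable features, ordered from most to least variable.

```python
from collections import defaultdict
from statistics import median


def select_integration_features(variable_features, nfeatures):
    """Hypothetical helper: rank genes by how many datasets list them as
    variable, breaking ties by their median rank across those datasets."""
    ranks = defaultdict(list)
    for features in variable_features:
        # Position 1 is the most variable gene in that dataset.
        for rank, gene in enumerate(features, start=1):
            ranks[gene].append(rank)
    # Sort by dataset count (descending), then median rank (ascending),
    # then gene name so the result is deterministic (an assumption).
    ordered = sorted(
        ranks,
        key=lambda g: (-len(ranks[g]), median(ranks[g]), g),
    )
    return ordered[:nfeatures]


# Toy per-dataset variable feature lists (made up for illustration).
datasets = [
    ["CD3E", "MS4A1", "LYZ", "NKG7"],
    ["LYZ", "CD3E", "GNLY", "MS4A1"],
    ["CD3E", "LYZ", "PPBP", "NKG7"],
]

top = select_integration_features(datasets, nfeatures=3)
print(top)  # ['CD3E', 'LYZ', 'MS4A1']
```

In this toy input, CD3E and LYZ are variable in all three datasets and so outrank MS4A1 and NKG7, which appear in only two; within each group the lower median rank wins.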

Then how are these features used in the FindIntegrationAnchors function? If I understand correctly, FindIntegrationAnchors calculates canonical correlation vectors between each pair of datasets so as to maximize their correlation, but I am not sure how the shared features help in that calculation.

Also, in your 2018 Nature Biotechnology paper you described the steps as "gene selection for canonical correlation vector alignment" and "alignments of these vectors into a common aligned space". Is this gene selection the same as FindIntegrationAnchors, and is the alignment of the vectors included in the IntegrateData function?

Thank you!

andrewwbutler commented 4 years ago

Hi,

Just to clarify a bit, the workflow you are describing was published and described in the 2019 Cell paper here, not the 2018 NBT paper.

  1. Yes, the list from SelectIntegrationFeatures will likely differ from the variable feature list produced by running SCTransform on each dataset individually (usually the first step), and you can think of it as the top N shared variable features.

  2. These are the features used by default in FindIntegrationAnchors if none are provided manually (i.e. FindIntegrationAnchors will run SelectIntegrationFeatures itself). The CCA is then run on matrices subset to contain only the specified features.
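The behaviour described in point 2 can be sketched in plain Python as follows. This is only an illustration of the logic, not Seurat code: `resolve_anchor_features`, `subset_rows`, `top_shared`, and the toy expression values are hypothetical, and the sketch assumes the feature argument is either a count (meaning "select the top N for me") or an explicit list of gene names.

```python
def resolve_anchor_features(anchor_features, select_features):
    """Hypothetical helper: return an explicit list of gene names.

    anchor_features: an int N ("pick the top N") or a list of gene names.
    select_features: callable standing in for the feature-selection step.
    """
    if isinstance(anchor_features, int):
        # No features supplied manually, so run the selection step.
        return select_features(anchor_features)
    return list(anchor_features)


def subset_rows(matrix, features):
    """Keep only the rows (genes) named in `features`, in that order."""
    return {gene: matrix[gene] for gene in features if gene in matrix}


# Stand-in for a ranked list of shared variable features (made up).
ranked = ["CD3E", "LYZ", "MS4A1", "NKG7"]


def top_shared(n):
    return ranked[:n]


# Toy gene-by-cell matrices stored as {gene: [expression per cell]}.
dataset_a = {"CD3E": [1.2, 0.0], "LYZ": [0.0, 3.1], "ACTB": [5.0, 4.8]}
dataset_b = {"LYZ": [2.7, 0.1, 0.0], "CD3E": [0.0, 1.9, 2.2], "GNLY": [0.0, 0.0, 1.5]}

features = resolve_anchor_features(2, top_shared)  # default path: top 2
sub_a = subset_rows(dataset_a, features)
sub_b = subset_rows(dataset_b, features)
print(features, list(sub_a), list(sub_b))
```

After subsetting, both matrices contain the same genes in the same order, which is what allows a pairwise method such as CCA to compare them feature by feature.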