satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

Function "SelectIntegrationFeatures" For Which Data Assay? #9418

Closed ForrestGump618 closed 2 weeks ago

ForrestGump618 commented 1 month ago

Dear professors, Hi! In the examples demonstrating how to prepare a normalized object list with SCTransform for integration (https://satijalab.org/seurat/reference/prepsctintegration), the provided workflow includes the following code:

if (FALSE) {
# to install the SeuratData package see https://github.com/satijalab/seurat-data
library(SeuratData)
data("panc8")

# panc8 is a merged Seurat object containing 8 separate pancreas datasets
# split the object by dataset and take the first 2 to integrate
pancreas.list <- SplitObject(panc8, split.by = "tech")[1:2]

# perform SCTransform normalization
pancreas.list <- lapply(X = pancreas.list, FUN = SCTransform)

# select integration features and prep step
features <- SelectIntegrationFeatures(pancreas.list)
pancreas.list <- PrepSCTIntegration(
  pancreas.list,
  anchor.features = features
)

# downstream integration steps
anchors <- FindIntegrationAnchors(
  pancreas.list,
  normalization.method = "SCT",
  anchor.features = features
)
pancreas.integrated <- IntegrateData(anchors, normalization.method = "SCT")
}

I noticed that the line features <- SelectIntegrationFeatures(pancreas.list) occurs before normalization. Does this imply that SelectIntegrationFeatures utilizes only the raw data, without any normalization? Thanks!

Qing-yuan Zhuang

rharao commented 3 weeks ago

SCTransform includes sequencing depth normalization; after applying SCTransform to each object, the normalized data are stored in the "data" layer of each object. SCTransform also finds a subset of variable features. The variable features for each object are used by SelectIntegrationFeatures.

longmanz commented 2 weeks ago

Hi, pancreas.list <- lapply(X = pancreas.list, FUN = SCTransform) already normalized all the objects with SCTransform(). For each gene in each object, SCTransform() generates a residual_variance which will be used to rank the genes for top "variable features".

For integration, you will need to run SelectIntegrationFeatures() to get a list of the top "variable features" shared across these objects. These features/genes will be used for integration later. However, Pearson residuals have not yet been computed for some of these genes in every object, so you will need to run PrepSCTIntegration() to compute the Pearson residuals for these left-out genes in each object.
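To make the selection step more concrete, here is a toy sketch in base R of the ranking idea described in the SelectIntegrationFeatures() reference page: features are ranked by the number of datasets in which they are variable, with ties broken by their median rank across datasets. This is not Seurat's source code. The function name select_features_sketch, the gene lists, and the small nfeatures value are made up for illustration, and the input is assumed to be each object's variable features ordered from most to least variable.

```r
# Toy illustration only: NOT Seurat's implementation.
# select_features_sketch is a hypothetical helper name.
select_features_sketch <- function(var_feature_lists, nfeatures = 3) {
  all_genes <- unique(unlist(var_feature_lists))

  # Number of datasets in which each gene appears as a variable feature
  n_datasets <- vapply(
    all_genes,
    function(g) sum(vapply(var_feature_lists, function(v) g %in% v, logical(1))),
    integer(1)
  )

  # Median position of each gene within the lists that contain it
  median_rank <- vapply(
    all_genes,
    function(g) {
      ranks <- unlist(lapply(var_feature_lists, function(v) match(g, v)))
      median(ranks, na.rm = TRUE)
    },
    numeric(1)
  )

  # Most datasets first; a lower median rank breaks ties
  ordered <- all_genes[order(-n_datasets, median_rank)]
  head(ordered, nfeatures)
}

# Made-up variable features per dataset, most variable first
var_features <- list(
  celseq  = c("INS", "GCG", "SST", "PPY"),
  celseq2 = c("INS", "GCG", "KRT19", "SST")
)

selected <- select_features_sketch(var_features, nfeatures = 3)
print(selected)
```

In this toy input, PPY and KRT19 are each variable in only one dataset, so they rank below the genes shared by both. In the real workflow, the per-object lists come from the SCTransform-normalized objects, which is why SelectIntegrationFeatures() is called after the lapply() over SCTransform.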

For more details regarding SCTransform, you may refer to the publications (sctransform v1 and v2)

ForrestGump618 commented 2 weeks ago

Thanks for your patience!