niaid / dsb

Normalize CITEseq Data

Integrated multiple samples #44

Closed Aaron-sqw closed 11 months ago

Aaron-sqw commented 11 months ago

Hi, thanks for the excellent method for CITE-seq! The vignette at "https://cran.r-project.org/web/packages/dsb/vignettes/end_to_end_workflow.html" only deals with a single sample. If I have CITE-seq data from multiple samples (more than 3) and want to integrate them with anchors to correct for batch effects, how should I handle this? Can I use the dsb-normalized values directly?

immune.anchors_adt <- FindIntegrationAnchors(object.list = immune.combined_adt.list, anchor.features = features)
immune.combined_adt <- IntegrateData(anchorset = immune.anchors_adt, new.assay.name = "integrated.adt")

Thanks.

MattPM commented 11 months ago

Hi @Aaron-sqw, if by samples you mean you have 3 batches, you should first verify that you have a large batch effect before doing any computational integration or correction. You can do this by calculating the variance explained by batch. If you observe a large batch effect, then yes, you could do that. However, it would not be appropriate if you have only a few proteins measured. Methods like Seurat integration and Harmony compress the ADT data to find latent shared components in high-dimensional space, so they are more geared toward mRNA, where you have thousands of features (genes).

For protein, since there are fewer features, we have done a simple linear-model batch correction (with limma) after dsb on a few projects with 80-100 proteins, which also works well. If you used one of the larger protein panels currently available, with hundreds of proteins, you could try a less parsimonious correction like the one you're describing, as long as the comparison groups of your experiment are not confounded with batch.
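To illustrate the two steps mentioned above (checking variance explained by batch, then a limma linear-model correction), here is a minimal sketch. It is not the exact code used in the projects referred to. The matrix `adt`, the `batch` factor, and the helper `batch_r2()` are hypothetical names, and the data are simulated stand-ins for a proteins x cells dsb-normalized matrix from pooled batches. The per-protein R-squared from a one-way linear model is just one simple way to quantify variance explained by batch.

```r
# Simulated stand-in for a dsb-normalized matrix (proteins x cells).
# In practice this would be the dsb output for the pooled batches.
set.seed(1)
n_prot  <- 20
n_cells <- 300
batch <- factor(rep(c("b1", "b2", "b3"), each = n_cells / 3))
adt <- matrix(
  rnorm(n_prot * n_cells),
  nrow = n_prot,
  dimnames = list(paste0("prot", seq_len(n_prot)),
                  paste0("cell", seq_len(n_cells)))
)
# Artificial shift in batch b2, for illustration only.
adt[, batch == "b2"] <- adt[, batch == "b2"] + 1

# Hypothetical helper: per-protein fraction of variance explained by batch,
# taken as the R-squared of the one-way model protein ~ batch.
batch_r2 <- function(mat, batch) {
  apply(mat, 1, function(y) summary(lm(y ~ batch))$r.squared)
}

r2_before <- batch_r2(adt, batch)
print(summary(r2_before))

# Linear-model batch correction with limma (Bioconductor), if installed.
# removeBatchEffect() expects features in rows and cells in columns.
if (requireNamespace("limma", quietly = TRUE)) {
  adt_corrected <- limma::removeBatchEffect(adt, batch = batch)
  r2_after <- batch_r2(adt_corrected, batch)
  print(summary(r2_after))
}
```

If the experiment has comparison groups that should be preserved, `removeBatchEffect()` also accepts a `design` matrix describing them. This only helps when the groups are not confounded with batch.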