satijalab / sctransform

R package for modeling single cell UMI expression data using regularized negative binomial regression
GNU General Public License v3.0

SCTransform #41

Closed: MQMQ2018 closed this issue 4 years ago

MQMQ2018 commented 5 years ago

Dear Christoph,

Regarding the SCTransform function: do we still need to regress out batch effects after running SCTransform, or is SCTransform supposed to cover the removal of batch effects already when we apply it to the mixed libraries at the very beginning? If yes, will the batch-corrected matrix be saved automatically in the default assay for the analysis that follows CCA (e.g. detection of differentially expressed genes for each cluster)? Or do we need to run DefaultAssay(immune.combined) <- "SCT"? Thank you so much!

Best, Qi

MQMQ2018 commented 5 years ago

And should we run SCTransform sample by sample, or once on the mixed samples? Based on the Seurat reference-based integration guidance shown below, it looks like it should be done once on the mixed samples?

====

data("pbmcsca")
pbmc.list <- SplitObject(pbmcsca, split.by = "Method")
for (i in names(pbmc.list)) {
  pbmc.list[[i]] <- SCTransform(pbmc.list[[i]], verbose = FALSE)
}

====

Thank you so much. Best, Qi

JackieShen68 commented 5 years ago

Hi, Christoph,

One related question here: if we perform SCTransform on each sample from a list of libraries, as demonstrated by the Seurat 3.1 tutorial, how can we correct technical variation and/or batch effects among all the samples together? Also, could you please explain in detail how the celltype composition of a Seurat object affects the result of SCTransform?

To make the second question clearer, let's assume we have two libraries (to keep it simple) that are biological replicates but differ in cell composition due to technical variation and variation in the original samples. For instance, library 1 has the following composition: celltype A: 5%; B: 10%; C: 50%; D: 20%; E: 15%. For library 2, the composition changes to: A: 10%; B: 20%; C: 40%; D: 25%; E: 5%. Now let's also assume that the median number of genes per cell differs between these two replicates (caused by sequencing depth or technical variation). In this situation, how should we perform SCTransform properly? Could you please give me some advice or suggestions? Thank you very much! I am waiting for your reply :) Best, Jackie :)

MQMQ2018 commented 5 years ago

Asked the same question as above. Thank you so much.

ChristophH commented 5 years ago

Hi,

@MQMQ2018 If you have merged samples and observe a batch effect you can either 1) use the batch_var parameter passed to sctransform::vst via Seurat::SCTransform, or 2) run an integration as outlined here: https://satijalab.org/seurat/v3.1/integration.html

I'd recommend option 1 only if your samples have roughly same celltype compositions and the batch effects are characterized by simple shifts in mean expression.
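Option 1 might look like the following minimal sketch. It assumes a merged Seurat object whose cell metadata contains a column identifying the batch. The wrapper name `sct_with_batch` and the default column name `"Method"` (borrowed from the pbmcsca example earlier in the thread) are illustrative assumptions, not part of either package.

```r
# Sketch only: the wrapper name and the default column are assumptions made
# for illustration. Seurat and sctransform must be installed when it is called.
sct_with_batch <- function(merged_obj, batch_col = "Method") {
  # batch_var is forwarded through `...` of Seurat::SCTransform to
  # sctransform::vst, as described in the reply above. The value is the name
  # of a column in the object's cell metadata.
  Seurat::SCTransform(merged_obj, batch_var = batch_col, verbose = FALSE)
}

# Usage, assuming `pbmcsca` has been loaded as in the example above:
# pbmcsca <- sct_with_batch(pbmcsca, batch_col = "Method")
```

As noted in the reply, this is only advisable when celltype compositions are roughly the same across batches.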

@JackieShen68 Which Seurat tutorial do you refer to? Regarding your second question, sctransform does a relative normalization, with the result that a celltype-specific marker will appear lower in a sample with high abundance of that celltype compared to a sample with low abundance. In your example, the differences in celltype composition between samples are not drastic, but in more extreme cases this could be a problem. For that, an integration analysis that is based on co-variation of genes is more suited (see option 2 above).
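The abundance effect can be illustrated with a toy calculation in base R. This is a deliberately simplified stand-in, not the sctransform model: it uses Poisson Pearson residuals against the per-sample gene mean, with no sequencing-depth covariate and no regularized negative binomial fit. The function name `marker_residual` and all numbers are made up for illustration.

```r
# Toy stand-in for a relative normalization. NOT the sctransform model.
set.seed(1)

marker_residual <- function(frac_marker_cells, n_cells = 1000) {
  n_pos <- round(frac_marker_cells * n_cells)
  is_pos <- rep(c(TRUE, FALSE), times = c(n_pos, n_cells - n_pos))
  # Marker gene: mean of 20 UMIs in the marker celltype, mean of 1 elsewhere.
  counts <- rpois(n_cells, lambda = ifelse(is_pos, 20, 1))
  mu <- mean(counts)                 # expected count if all cells were alike
  resid <- (counts - mu) / sqrt(mu)  # Pearson residual with Poisson variance
  mean(resid[is_pos])                # average residual within the marker cells
}

low_abundance  <- marker_residual(0.10)  # celltype is 10% of the sample
high_abundance <- marker_residual(0.50)  # celltype is 50% of the sample

cat("marker residual, low abundance: ", round(low_abundance, 2), "\n")
cat("marker residual, high abundance:", round(high_abundance, 2), "\n")
```

In this toy setup the same marker, at the same expression level in the same kind of cell, gets a smaller residual in the sample where the celltype is more abundant, because the reference mean it is compared against is higher.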

For integration analyses with Seurat::SCTransform please consult the Seurat vignettes and the Seurat github issue page: https://github.com/satijalab/seurat/issues?utf8=%E2%9C%93&q=is%3Aissue+sctransform+integration
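For option 2, a rough sketch of the per-sample SCTransform plus integration workflow is given below. It follows the steps of the Seurat v3 integration vignette as I understand them. The wrapper name `integrate_sct`, its arguments, and the default of 3000 features are assumptions for illustration, and the Seurat vignettes should be checked for the version you have installed.

```r
# Sketch only: wrapper name, argument names and defaults are illustrative.
# Seurat must be installed when the function is called.
integrate_sct <- function(obj, split_col = "Method", n_features = 3000) {
  # Normalize each sample on its own, as in the pbmcsca example in this thread.
  obj_list <- Seurat::SplitObject(obj, split.by = split_col)
  obj_list <- lapply(obj_list, Seurat::SCTransform, verbose = FALSE)

  # Pick features shared across samples and prepare the SCT residuals for them.
  features <- Seurat::SelectIntegrationFeatures(object.list = obj_list,
                                                nfeatures = n_features)
  obj_list <- Seurat::PrepSCTIntegration(object.list = obj_list,
                                         anchor.features = features)

  # Find anchors between samples and integrate, using the SCT normalization.
  anchors <- Seurat::FindIntegrationAnchors(object.list = obj_list,
                                            normalization.method = "SCT",
                                            anchor.features = features)
  Seurat::IntegrateData(anchorset = anchors, normalization.method = "SCT")
}

# Usage, assuming `pbmcsca` has been loaded:
# pbmc.integrated <- integrate_sct(pbmcsca, split_col = "Method")
```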

MQMQ2018 commented 5 years ago

Dear Christoph, Thank you for your reply. It is very helpful.

Best, Qi

JackieShen68 commented 5 years ago

Dear Christoph,

Thanks for your reply! The tutorial I mentioned is described at https://satijalab.org/seurat/v3.1/integration.html, which performs SCTransform on each single library and then integrates all the libraries with a CCA-based method. Based on my understanding and your explanation, we should perform SCTransform on single libraries if there are large variations in gene expression among libraries due to technical effects or different species. If the celltype compositions are very similar across libraries that were processed with an identical technical procedure, we could simply merge all the libraries and perform SCTransform on the merged object. Am I right?

It would be greatly appreciated if you could explain in more detail when we should perform SCTransform on single libraries and when on the merged library. What are the potential differences between these two approaches? Is there any gold standard for us to consider when we apply SCTransform for normalization? Thank you very much! I am waiting for your reply :)

luluyadummy commented 5 years ago

Hi @ChristophH,

I wonder, if we only want to look at a subset of all cells, say only T-cells, should we normalize only on T-cells for downstream analysis? You said that sctransform does a relative normalization, so expression in other cell types will have an impact on the results. Highly expressed T-cell specific markers will have a lower normalized value if we take everything into account than if we normalize with respect to T-cells only. Am I understanding it right?

ChristophH commented 4 years ago

@luluyadummy Sorry for the late reply, but yes, that is correct. The output of the normalization is a relative measure of expression. Hence its interpretation depends on the set of cells that went into the normalization together.