satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

Determining Necessity of CCA #1153

lololiam closed this issue 5 years ago.

lololiam commented 5 years ago

Hi,

My issue is very similar to this one.

I am looking at rerunning some data from a study using Seurat. The study performed single-cell RNA sequencing on 10 samples from 2 conditions (6 healthy, 4 with type 2 diabetes).

Regarding the following answer in the aforementioned issue:

"We recommend first performing the analysis under the standard workflow. If the subtypes cluster together across batches, there is no need to run CCA or any integration workflow."

What exactly is meant by this? How should I go about determining whether batch effects are large enough to warrant running CCA?

Is it enough to check that, after running FindClusters(), every cluster contains cells from all samples and no cluster consists purely of cells from a single sample?
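The check described above can be sketched as a per-cluster composition table. This is a minimal, language-agnostic sketch that assumes you have exported two per-cell vectors from the analysis, one cluster label and one sample label per cell, in the same cell order; the function names and the 0.9 cut-off are hypothetical choices for illustration, not part of Seurat.

```python
from collections import Counter, defaultdict


def cluster_sample_composition(clusters, samples):
    """Return {cluster: {sample: fraction of that cluster's cells}}."""
    counts = defaultdict(Counter)
    for cluster, sample in zip(clusters, samples):
        counts[cluster][sample] += 1
    composition = {}
    for cluster, sample_counts in counts.items():
        total = sum(sample_counts.values())
        composition[cluster] = {s: n / total for s, n in sample_counts.items()}
    return composition


def flag_sample_specific_clusters(clusters, samples, max_fraction=0.9):
    """Flag clusters where a single sample contributes more than max_fraction
    of the cells. The default threshold is an arbitrary assumption."""
    flagged = {}
    for cluster, fractions in cluster_sample_composition(clusters, samples).items():
        top_sample, top_fraction = max(fractions.items(), key=lambda kv: kv[1])
        if top_fraction > max_fraction:
            flagged[cluster] = (top_sample, top_fraction)
    return flagged


# Toy labels (hypothetical): cluster "2" is made up entirely of sample s3.
clusters = ["0", "0", "0", "0", "1", "1", "1", "2", "2", "2"]
samples = ["s1", "s2", "s1", "s2", "s1", "s2", "s1", "s3", "s3", "s3"]
print(flag_sample_specific_clusters(clusters, samples))  # {'2': ('s3', 1.0)}
```

A flagged cluster is not automatically a batch artifact: a cell type that genuinely exists in only one donor or condition will also look sample-specific, so flagged clusters still need to be inspected by marker genes.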

So far I have used PlotPCA() coloured by condition and can see that the healthy and type 2 diabetes cells align quite well, but when I colour by sample instead, it is hard to see anything useful with 10 different colours.
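One way to avoid eyeballing 10 colours is to reduce the sample mix of each cluster to a single number. The sketch below uses normalized Shannon entropy of the sample labels within each cluster; this metric is a swapped-in choice for illustration, not something the thread or Seurat prescribes. It assumes the same exported per-cell cluster and sample labels as before, and the function name is hypothetical.

```python
import math
from collections import Counter, defaultdict


def sample_mixing_entropy(clusters, samples):
    """Normalized Shannon entropy of sample labels within each cluster.

    1.0 means a cluster's cells are spread evenly over every sample;
    0.0 means the cluster contains cells from a single sample only.
    """
    n_samples = len(set(samples))
    per_cluster = defaultdict(Counter)
    for cluster, sample in zip(clusters, samples):
        per_cluster[cluster][sample] += 1
    scores = {}
    for cluster, counts in per_cluster.items():
        total = sum(counts.values())
        probs = [n / total for n in counts.values()]
        entropy = sum(-p * math.log(p) for p in probs)
        # Divide by the maximum possible entropy so scores lie in [0, 1].
        scores[cluster] = entropy / math.log(n_samples) if n_samples > 1 else 0.0
    return scores


# Toy labels (hypothetical): cluster "a" mixes four samples, "b" is one sample.
clusters = ["a", "a", "a", "a", "b", "b"]
samples = ["s1", "s2", "s3", "s4", "s1", "s1"]
print(sample_mixing_entropy(clusters, samples))
```

If the samples contribute very different numbers of cells, even a perfectly mixed cluster will score below 1, so the scores are best compared between clusters rather than against a fixed cut-off.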

Any advice would be greatly appreciated.
Liam

ysu2015 commented 5 years ago

Hi, I have similar questions. Also, after running MetageneBicorPlot, how should I determine which CCs to use for downstream analysis, especially when one group is far from the other two? I have attached my plot below.

Yijing

satijalab commented 5 years ago

Thanks for your question. I think it's important to remember that the goal of the integration procedure is to make sure that cells in the same underlying biological state are grouped together. This would enable you to compare how cell states (for example, T cells) change across healthy and diabetic samples.

We would recommend running integration in this case. However, you asked whether you need to perform this procedure at all. One way of carefully addressing this would be to analyze, cluster, and annotate each dataset individually. You can then compare these cluster annotations to an analysis where all the data are included in a single clustering (without integration). If the results are consistent, you know that your batch effects are minimal and you do not need to perform integration.
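The comparison described above can be made quantitative. A minimal sketch, assuming you have one per-dataset annotation per cell (for example "T cell", taken from each dataset's own analysis) and the cluster ID of the same cell in the joint, non-integrated clustering, in the same cell order. The cross-tabulation shows which annotations a joint cluster merges or splits; the adjusted Rand index (ARI) is a swapped-in summary statistic, not one the Seurat authors specify, and all names here are hypothetical.

```python
from collections import Counter
from math import comb


def contingency(per_dataset_labels, joint_clusters):
    """Count cells for each (per-dataset annotation, joint cluster) pair."""
    return Counter(zip(per_dataset_labels, joint_clusters))


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same cells.

    1.0 = identical partitions (up to renaming); around 0 = chance-level agreement.
    """
    n = len(labels_a)
    sum_ij = sum(comb(c, 2) for c in contingency(labels_a, labels_b).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster everywhere
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Toy example (hypothetical labels).
annotations = ["T", "T", "B", "B", "Mono", "Mono"]
joint_same = ["0", "0", "1", "1", "2", "2"]    # joint clustering reproduces them
joint_merged = ["0", "0", "0", "0", "2", "2"]  # T and B cells merged into cluster 0
print(adjusted_rand_index(annotations, joint_same))
print(contingency(annotations, joint_merged))
```

How close to 1 counts as "consistent" is a judgment call; the contingency table is usually more informative than the single score because it shows which cell types the joint clustering fails to separate.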

pagarwal14 commented 4 years ago

Hi, I had a follow-up question just to make sure I understand correctly. When you say "You can compare these cluster annotations to an analysis where all the data is included in a single clustering (without integration)", does integration here mean CCA? Does single clustering mean taking the cells from Experiment/Treatment A and from Experiment/Treatment B, combining them into one expression matrix, and then following the standard workflow (without CCA)? Thanks.
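The combining step described in this question can be sketched at the data level. This is a toy illustration under stated assumptions, not Seurat's own merge code: each dataset is a hypothetical nested dict of {cell barcode: {gene: count}}, genes missing from one dataset are filled with 0, and cell barcodes are prefixed with the sample name so they stay unique. Real data would use sparse matrices.

```python
def merge_count_matrices(datasets):
    """Combine {sample: {barcode: {gene: count}}} into one genes x cells matrix.

    Returns (genes, cells, matrix, cell_sample); cell_sample keeps track of
    which sample each column came from for later per-sample checks.
    """
    # Union of genes across all datasets, sorted for a stable row order.
    genes = sorted({g for cells in datasets.values()
                    for counts in cells.values() for g in counts})
    cells, cell_sample, columns = [], [], []
    for sample, sample_cells in datasets.items():
        for barcode, counts in sample_cells.items():
            cells.append(f"{sample}_{barcode}")  # prefix keeps barcodes unique
            cell_sample.append(sample)
            # Genes not measured in this dataset are filled with 0.
            columns.append([counts.get(g, 0) for g in genes])
    matrix = [list(row) for row in zip(*columns)]  # transpose to genes x cells
    return genes, cells, matrix, cell_sample


# Toy input (hypothetical): the same barcode appears in both experiments.
datasets = {
    "A": {"AAAC": {"CD3E": 5, "INS": 0}},
    "B": {"AAAC": {"CD3E": 2, "GCG": 7}},
}
genes, cells, matrix, cell_sample = merge_count_matrices(datasets)
print(genes, cells, matrix, cell_sample)
```

The combined matrix, with the per-cell sample labels kept alongside, would then go through the standard normalization, PCA, and clustering steps as a single dataset.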