satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

How to evaluate differential HTO expression within one individual cluster, not in comparison to all other clusters #5379

Closed hannaminns closed 2 years ago

hannaminns commented 2 years ago

Hello!

I am trying to evaluate differential HTO expression within one individual cluster, not in comparison to all other clusters.

For our single cell experiment, we have four treatment groups that were tagged with four different HTO antibodies, #1, #2, #3, and #4. For the analysis, I therefore have two assays, an RNA assay and an HTO assay.

I had no trouble identifying clusters and using FindAllMarkers to get differential gene expression for each cluster.

However, I also want to figure out the differential HTO expression within a cluster to evaluate whether a cluster is "upregulated" or over-represented by cells from a certain hashtag (treatment group). From density plots, I can see differential representation of each hashtag/treatment group across clusters.

I ran FindAllMarkers() on the HTO assay, but, if I'm understanding the algorithm right, that gave me the differential HTO expression of the cells in a given cluster compared to all other clusters. For cluster 1, for example, my output looks like this: [screenshot: Capture]

I can see from the density plots that cluster 1 has more cells from hashtag antibody #3 than from the other three groups, so it makes sense that hashtag #3 has a less negative fold change than the other three. But I hope there's a better way to analyze this statistically, one that looks only at the cells within a specific cluster rather than comparing against all other clusters. If cluster 1 really is enriched for #3, then #3 should have a positive fold change relative to the others.

In sum, across only the cells in cluster 1, I want to statistically evaluate if there are more tagged with hashtag/treatment group #3 than groups #1, #2, and #4.

Is there a way to do that?

Thank you!

samuel-marsh commented 2 years ago

Hi,

Not a member of the dev team, but hopefully I can be helpful.

So if I understand correctly, you are looking for differential abundance between groups within a cluster? Is that right?

Also, is there a reason you are testing HTO expression rather than simply assigning each cell an identity based on which HTO it is tagged with?

Best, Sam

hannaminns commented 2 years ago

Hi Sam,

Yes, that's correct.

I have assigned a cell identity based on the HTO tag similarly to how I created new cell type identities for the clusters. So yes, I have two additional identities: "tx group" and "cell type" that I could call as well.

But I want the testing to assess differential abundance of cells from each tx group within a cluster, as you stated, not differential gene expression between the tx groups within a cluster (that is obviously also an important part of our analysis, but I know how to do that part).

Let me know if that makes sense. Thanks for your help!

Best, Hanna

samuel-marsh commented 2 years ago

Hi Hanna,

Yes, there are a number of approaches that have been developed for this, and they can be used with Seurat or with data extracted from Seurat objects, but Seurat itself doesn't have functions for them. There isn't necessarily a right way to do differential abundance, and this is an area with lots of recent development. A couple of examples of methods/packages for this kind of analysis are MASC (associated R package with the same name; https://pubmed.ncbi.nlm.nih.gov/30333237/), Speckle (https://github.com/Oshlack/speckle), and some very new ones like Milo and CNA (https://www.nature.com/articles/s41587-021-01033-z, https://www.nature.com/articles/s41587-021-01066-4).

Personally, I've used both MASC and Speckle before, but as I said, there are many more approaches than the ones I mentioned above, each with its own take on how best to model and assess the data. It's up to the end user to decide which model they think is best (or best for now) and apply it.
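For a quick sanity check before reaching for any of those packages, the question "is hashtag #3 over-represented in cluster 1?" can be sketched as a one-sided hypergeometric test on the per-cell labels. To be clear, this is not MASC, Speckle, Milo, or CNA. It is only a rough baseline that treats every cell as an independent draw and ignores replicate-level variation, so its p-values will tend to be optimistic. The sketch below uses Python's standard library on labels exported from the object metadata; the label values and counts are made-up toy data, and the function name is hypothetical.

```python
from math import comb


def hypergeom_enrichment(cluster_labels, group_labels, cluster, group):
    """One-sided hypergeometric test for over-representation of `group` in `cluster`.

    Returns (overlap, cluster_size, p_value), where p_value is the probability
    of seeing at least `overlap` cells of `group` in a random draw of
    `cluster_size` cells from the whole dataset.
    """
    total = len(cluster_labels)
    in_group = sum(1 for g in group_labels if g == group)
    in_cluster = sum(1 for c in cluster_labels if c == cluster)
    overlap = sum(
        1 for c, g in zip(cluster_labels, group_labels)
        if c == cluster and g == group
    )
    upper = min(in_group, in_cluster)
    # Sum the upper tail of the hypergeometric distribution exactly.
    tail = sum(
        comb(in_group, i) * comb(total - in_group, in_cluster - i)
        for i in range(overlap, upper + 1)
    )
    return overlap, in_cluster, tail / comb(total, in_cluster)


# Toy data (hypothetical): 40 cells, one cluster label and one hashtag label each.
# Cluster "1" holds 10 cells, 7 of them tagged HTO3; cluster "2" holds the rest.
cluster_labels = ["1"] * 10 + ["2"] * 30
group_labels = (
    ["HTO3"] * 7 + ["HTO1", "HTO2", "HTO4"]
    + ["HTO3"] * 3 + ["HTO1"] * 9 + ["HTO2"] * 9 + ["HTO4"] * 9
)

overlap, cluster_size, p_value = hypergeom_enrichment(
    cluster_labels, group_labels, cluster="1", group="HTO3"
)
print(f"{overlap}/{cluster_size} cells in cluster 1 are HTO3, p = {p_value:.3g}")
```

With real data, the two label lists would be the cluster and tx group columns for the same cells in the same order. If several clusters or hashtags are tested this way, the p-values would need a multiple-testing correction.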

Best of luck! Sam

torkencz commented 2 years ago

Hi Hanna. I've also used MASC before on this sort of problem, but like Sam said, it is really up to you to decide which model is best for your data.