satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat
Other
2.31k stars 920 forks source link

Selecting the reduction type in FindMarkers #7673

Closed emidalla closed 1 year ago

emidalla commented 1 year ago

Hello,

I can't find an example showing how to specify the 'reduction' parameter using the 'FindMarkers' function. I have batch corrected my data using Harmony and would like to use its embeddings to find the markers.

This is the code I am actually using:

FindMarkers(data, ident.1 = "Resistant", ident.2 = "Healthy", group.by = 'Type', reduction= "harmony", only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)

I get an output but gene names are replaced by 'harmony_' values ('Resistant' and 'Healthy' are two values that data in the 'Type' column can have).

                   p_val   avg_diff     p_val_adj
harmony_9  7.489589e-179 2.20149386 1.157965e-174
harmony_2  1.541575e-112 1.83666510 2.383429e-108
harmony_1  1.705531e-103 3.88721766  2.636922e-99

Best, Emiliano.

igrabski commented 1 year ago

Hi Emiliano,

When specifying the reduction parameter in FindMarkers, you are actually just testing for differences in the cell embeddings between the healthy and resistant groups. So, your current function call is actually testing for differences in each Harmony dimension, not for differences in gene expression, which is why each row of your output corresponds to a dimension rather than a gene. Running Harmony here only produces batch-corrected embeddings, not batch-corrected expression values, so you wouldn't be able to directly test for batch-corrected gene expression differences.

emidalla commented 1 year ago

Hello and thanks for your kind reply!

What would be, then, the recommended way of including Harmony-corrected data into Seurat? To use data_harmony <- RunUMAP(data, reduction = "harmony", dims = 1:15) instead of the usual data_umap<- RunUMAP(data, dims = 1:15) and then proceed with markers discovery?

Thank you very much again.

Best, Emiliano.

igrabski commented 1 year ago

Hi Emiliano, the RunUMAP line will produce a UMAP representation based on the Harmony-corrected embeddings, but the UMAP representation doesn't affect FindMarkers. Running Harmony in Seurat does not produce batch-corrected expression values; it only produces batch-corrected embeddings, which can be used for visualization (e.g. through the UMAP representation) and clustering, but not for differential expression in the way you are asking. In general, however, we don't recommend running differential expression on corrected expression values anyway -- this is because after running integration procedures, the corrected values no longer follow the underlying assumptions of differential expression tests. You can, however, input confounders like batch labels into the latent.vars parameter for some of the differential expression tests, such as the logistic regression test (test.use = 'LR' in FindMarkers).

emidalla commented 1 year ago

Thank you very much again, as you suggest I will introduce confounders through the latent.vars parameter.

Best, Emiliano.