theislab / single-cell-tutorial

Single-cell current best practices tutorial case study for the paper: Luecken and Theis, "Current best practices in single-cell RNA-seq analysis: a tutorial"

executing sc.tl.rank_genes_groups() function on tutorial data provides incorrect results #102

Closed: Monkeyninjaw closed this issue 2 years ago

Monkeyninjaw commented 2 years ago

Hello,

I'm running the tutorial in the Docker container environment and getting incorrect results from the sc.tl.rank_genes_groups() function: my results don't match those reported in the original tutorial .ipynb file. As a consequence, downstream functions such as sc.tl.marker_gene_overlap() and sb.heatmap() also return incorrect results.

For example, the heatmap should look something like this:

[image: expected heatmap from the original tutorial]

However, when I run the same tutorial pipeline on the same data, I get a different, erroneous result:

[image: heatmap from my run]

The marker gene rankings also differ. The expected result for cluster 0 is:

[image: expected marker genes for cluster 0]

But I get very different marker genes for cluster 0:

[image: marker genes for cluster 0 from my run]

Up to this point (the execution of the sc.tl.rank_genes_groups function), my output (mostly) matches the expected output, so I think the problem must be with the sc.tl.rank_genes_groups() function, as opposed to upstream code. Here is the only discrepancy that I can see:

In the original tutorial file, the text output of the sc.tl.rank_genes_groups() command only says "ranking genes, finished (0:00:08)":

[image: rank_genes_groups log output from the original tutorial]

When I try to replicate this, however, I get much more verbose output:

[image: rank_genes_groups log output from my run]

Regarding the "method" argument, I tried both method = 't-test' and method = 't-test_overestim_var' but the result doesn't seem to change.
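For context on the method argument: as I understand it (an assumption, not confirmed anywhere in this thread), both options score each gene with a Welch-style t statistic comparing a cluster against all other cells, and 't-test_overestim_var' inflates the standard error by dividing the rest-group variance by the group size instead of the rest size. The pure-Python sketch below illustrates that idea for a single gene. The function name t_score is hypothetical, and this is not scanpy's actual code.

```python
from math import sqrt
from statistics import mean, variance


def t_score(group, rest, overestimate_var=False):
    """Welch-style t statistic for one gene: cluster cells vs. all other cells.

    group, rest: expression values of the gene in the cluster and in the rest.
    overestimate_var: if True, divide the rest variance by the group size
    instead of the rest size (assumed to mirror the idea behind scanpy's
    't-test_overestim_var'; hypothetical, not scanpy's implementation).
    """
    n_group, n_rest = len(group), len(rest)
    # Denominator used for the rest-group variance term.
    denom_rest = n_group if overestimate_var else n_rest
    # Standard error of the difference in means (sample variances, ddof=1).
    se = sqrt(variance(group) / n_group + variance(rest) / denom_rest)
    return (mean(group) - mean(rest)) / se


if __name__ == "__main__":
    # Toy values for one gene: a small cluster vs. a larger rest group.
    cluster = [2.0, 3.0, 4.0]
    others = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
    print(t_score(cluster, others))
    print(t_score(cluster, others, overestimate_var=True))
```

When the cluster is smaller than the rest, the second variant yields a larger standard error and therefore a smaller score; because the rescaling differs from gene to gene, it can reorder genes somewhat, but it does not change what is being compared.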

Does anyone know what I might be doing wrong here? To clarify, I'm only trying to reproduce the output shown in the original tutorial file. I know previous issues mentioned that results are not always perfectly reproducible from system to system, even within the same Docker container, but as shown above, my results are not even close to the expected ones.

Any help would be appreciated!

Monkeyninjaw commented 2 years ago

It turns out that the problem was with the sc.tl.marker_gene_overlap() function, not the sc.tl.rank_genes_groups() function. Adding the top_n_markers argument fixed the problem: https://github.com/scverse/scanpy/issues/1411

Thank you!
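Why would a cutoff matter? One plausible reading, which is an assumption and not stated in this thread: if every cluster's stored ranking contains the full gene list, comparing the whole list against a reference marker set finds every marker in every cluster, so the overlap scores stop discriminating between clusters. The pure-Python sketch below illustrates this with a simple overlap count. The function marker_overlap, its top_n parameter, and the toy gene lists are hypothetical; this is not scanpy's implementation.

```python
def marker_overlap(ranked_genes, reference_markers, top_n=None):
    """Count shared genes between data-derived and reference marker sets.

    ranked_genes: dict mapping cluster -> list of gene names, best first.
    reference_markers: dict mapping cell type -> set of gene names.
    top_n: keep only the top_n genes per cluster; None keeps the whole list.
    Returns a dict mapping cell type -> {cluster: overlap count}.
    """
    result = {}
    for cell_type, markers in reference_markers.items():
        result[cell_type] = {}
        for cluster, genes in ranked_genes.items():
            # Truncate the ranking only when a cutoff is given.
            kept = genes if top_n is None else genes[:top_n]
            result[cell_type][cluster] = len(set(kept) & set(markers))
    return result


if __name__ == "__main__":
    # Toy rankings in which both clusters list every gene, in different order.
    ranked = {
        "0": ["CD3E", "IL7R", "LYZ", "CD14"],
        "1": ["LYZ", "CD14", "CD3E", "IL7R"],
    }
    refs = {"T cell": {"CD3E", "IL7R"}, "Monocyte": {"LYZ", "CD14"}}
    # Without a cutoff every count is 2, so clusters are indistinguishable.
    print(marker_overlap(ranked, refs))
    # With top_n=2, cluster "0" matches T cells and cluster "1" monocytes.
    print(marker_overlap(ranked, refs, top_n=2))
```

In sc.tl.marker_gene_overlap() itself, the corresponding knob is the top_n_markers argument mentioned above; check the scanpy documentation and the linked issue for its default behavior in your installed version.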