scverse / scanpy_usage

Scanpy use cases.

can't replicate 170505_seurat/seurat.ipynb notebook #6


fidelram commented 6 years ago

I tried to replicate the 170505_seurat/seurat.ipynb notebook using scanpy, but I got some different results (see my notebook here):

  1. The tSNE plot looks different (although similar groups can be seen).
  2. The clustering is different; in particular, the number of clusters is smaller.
  3. More concerning, the results of sc.tl.rank_genes_groups(adata, 'louvain', method='logreg') seem quite different from the results of the default method (which are similar to the original notebook for some groups). For example, for Louvain cluster '0', the top-ranking genes in the original notebook are LDHB and CD3D. I see these two genes using the default ranking method; however, for the 'logreg' method the list of top genes is quite different.

Would it be possible for you to re-run the notebook and see whether you get the same results that I get? Maybe the data you are using is different from mine (I downloaded the pbmc3k data from 10x)?
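
To make the comparison concrete, here is a minimal sketch of what I am comparing (assuming `adata` is the pbmc3k data, already preprocessed and Louvain-clustered as in the notebook):

```python
import pandas as pd
import scanpy as sc

# assumes `adata` has been preprocessed and clustered as in the notebook,
# so that adata.obs['louvain'] holds the cluster labels
sc.tl.rank_genes_groups(adata, 'louvain', method='t-test')
ttest_top = pd.DataFrame(adata.uns['rank_genes_groups']['names']).head(5)

sc.tl.rank_genes_groups(adata, 'louvain', method='logreg')
logreg_top = pd.DataFrame(adata.uns['rank_genes_groups']['names']).head(5)

# top-5 ranked genes per cluster, one column per Louvain group; for
# cluster '0' I would expect genes like LDHB and CD3D near the top
print(ttest_top['0'])
print(logreg_top['0'])
```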

falexwolf commented 6 years ago

Hi Fidel,

  1. Slight changes in the tSNE are likely due to different versions of the MulticoreTSNE package. The UMAPs, however, should be the same, as that computation happens entirely within Scanpy. The reason it looks different here is probably slight differences in the PCA used when computing the graph, caused either by scikit-learn 0.19.2 or by updated pandas and numpy versions, which can affect floating-point results.
  2. This equally affects the clustering result. As Louvain clustering is a greedy algorithm, the changes can be quite dramatic, which is what you are seeing.
  3. The rank_genes_groups results certainly reflect your new clusters, which are simply composed of different cells than in the reference. The function itself is tested and has been unchanged for a long time.
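
A quick way to compare our two environments side by side is to print the package versions in each (a minimal sketch, assuming a Scanpy version that ships the logging helper):

```python
import scanpy as sc

# prints the versions of scanpy and its core dependencies (anndata,
# numpy, scipy, pandas, scikit-learn, ...) for the current environment
sc.logging.print_versions()
```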

I'm rerunning this notebook all the time, of course, and the results are consistent with the uploaded ones, which were run with Scanpy 1.1. Since you came up with the idea of comparing figures, I wanted to write a test for each notebook based on the images, but I still haven't automated it.

Does that make sense?

fidelram commented 6 years ago

Thanks for the reply.

I like the idea of having tests based on notebooks! There are many functions that are not currently tested, and that would easily add many more tests.

However, for the notebook that I tried to replicate, the test would have failed for reasons that are not straightforward to identify. And beyond that, most of the images will certainly fail automatic tests. For the automatic plotting tests of scanpy, I had to save the images without any layout enhancement, otherwise the tests fail. Thus, the resulting test images have labels that are cut off, making them useful for tests but not for anything else.
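
For reference, an image-based test of the kind discussed here could be built on matplotlib's comparison helper; a minimal sketch with hypothetical file paths:

```python
from matplotlib.testing.compare import compare_images

# compare_images returns None when the two PNGs agree within `tol`
# (an RMS threshold), otherwise a message describing the mismatch;
# 'expected/tsne.png' and 'actual/tsne.png' are hypothetical paths
result = compare_images('expected/tsne.png', 'actual/tsne.png', tol=20)
assert result is None, result
```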

My suggestion would be to add, as part of any PR, an automated message that asks the developer to run the notebooks and manually check that they are OK before submitting the PR.

fidelram commented 6 years ago

About the rank_genes_groups:

I have noticed that the logreg method usually does not produce good results (tested with multiple samples). The identification of marker genes is something I have been investigating, and I have developed some visualization tools to aid me in validating the markers. For all the other methods, the visualizations show that the top-ranked genes are useful for discerning the clusters. For the logreg method, however, the results do not make any sense when visualized.

In contrast, in the original notebook the logreg method produces results that are comparable to the other methods. In my hands, however, using the same notebook, I get different results. That is why I would like to know whether this also happens to you on the master branch, to find out whether this is a problem with my installation.

falexwolf commented 6 years ago

OK, let me check this again. Indeed, there was a pull request for the logreg method a week ago, and I just noticed that we only cover the t-test and Wilcoxon rank-sum methods with tests; for the logreg method, we simply call scikit-learn. I'll add a test...
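
A sketch of the kind of test I have in mind, using the small preprocessed dataset that ships with Scanpy (the test name and assertion are hypothetical):

```python
import scanpy as sc

def test_rank_genes_groups_logreg():
    # pbmc68k_reduced ships with Scanpy and is already preprocessed;
    # its obs['bulk_labels'] column provides ready-made groups
    adata = sc.datasets.pbmc68k_reduced()
    sc.tl.rank_genes_groups(adata, 'bulk_labels', method='logreg')
    # a ranking should be produced without errors
    assert len(adata.uns['rank_genes_groups']['names']) > 0
```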

> I have noticed that the logreg method usually does not produce good results (tested with multiple samples).

But maybe this is something more general: it depends on what your notion of "good" is. I guess you mean: which gene, taken as a single predictor, gives the best discriminative (predictive) power for identifying a cluster? By that criterion, the logreg method will fail completely, as it is a multivariate method. If you instead ask for sets of genes that together give the best predictive power in a linear model, then logreg provides the answer. There are cases where this is meaningful.
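
A toy illustration of the difference (plain scikit-learn, not Scanpy code): two "genes" that are individually near-uninformative can jointly separate a group in a linear model, so a multivariate logistic regression weights them highly while any univariate ranking would not.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
n = 500
g1 = rng.normal(size=n)
g2 = g1 + rng.normal(scale=0.3, size=n)  # strongly correlated with g1
y = (g1 - g2 > 0).astype(int)            # label depends on the *difference*

X = np.column_stack([g1, g2])
# jointly, the two genes separate the groups almost perfectly
print(LogisticRegression().fit(X, y).score(X, y))
# individually, each gene predicts at close to chance level
for j in (0, 1):
    Xj = X[:, [j]]
    print(LogisticRegression().fit(Xj, y).score(Xj, y))
```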

falexwolf commented 6 years ago

Everything is fine with the current state of master of Scanpy; it perfectly recovers the standard clustering tutorial. Let me now create some tests for it.

falexwolf commented 6 years ago

Way too late, but here is the test for the notebook: https://github.com/theislab/scanpy/tree/master/scanpy/tests/notebooks

Your non-reproducibility issues are all due to the PCA. Calling the PCA with svd_solver='arpack' solves it. I'll update the notebook and output a warning.
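
For anyone rerunning the notebook, the fix amounts to pinning the PCA to the deterministic ARPACK solver before building the graph; a minimal sketch:

```python
import scanpy as sc

# ARPACK avoids the randomized SVD, whose output can vary between
# library versions; everything downstream of the PCA (neighbor graph,
# tSNE/UMAP, Louvain clustering, rank_genes_groups) then reproduces
sc.tl.pca(adata, svd_solver='arpack')
sc.pp.neighbors(adata)
```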