theislab / diffxpy

Differential expression analysis for single-cell RNA-seq data.
https://diffxpy.rtfd.io
BSD 3-Clause "New" or "Revised" License

Results changed from v0.6.13 to 0.7.1 #130

Open grst opened 5 years ago

grst commented 5 years ago

Hi,

while experimenting with diffxpy, I noticed that the results changed since I upgraded from v0.6.13 to v0.7.1. Is that intentional?

The setup:

I added diffxpy to the DE benchmark by Van den Berge 2019.

Under v0.6.13, the diffxpy wald_test with the nb noise model produces results highly comparable to edgeR or an NB model from Python statsmodels:

True positive rate (TPR) and false discovery rate (FDR) on simulated data at an FDR cutoff of 0.05:

| Method         | nDE | TPR (%) | FDR (%) |
|----------------|-----|---------|---------|
| edgeR          | 78  | 7.1     | 9.0     |
| diffxpy_wald   | 84  | 7.7     | 8.3     |
| statsmodels_nb | 85  | 7.7     | 9.4     |

However, under v0.7.1, diffxpy calls far more genes as DE (224 instead of 84) and its observed FDR rises from 8.3% to 13.8%:

| Method         | nDE | TPR (%) | FDR (%) |
|----------------|-----|---------|---------|
| edgeR          | 78  | 7.1     | 9.0     |
| diffxpy_wald   | 224 | 19.3    | 13.8    |
| statsmodels_nb | 85  | 7.7     | 9.4     |
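
For readers who want to check the columns, here is a minimal sketch of how nDE, TPR and FDR can be derived from per-gene p-values and the simulation ground truth. It is not the benchmark's own code: the function names are made up for illustration, and I am assuming Benjamini-Hochberg adjustment, a `<=` comparison against the cutoff, TPR = TP / (truly DE genes) and FDR = FP / nDE.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def de_metrics(pvals, is_true_de, alpha=0.05):
    """Compute nDE, TPR (%) and FDR (%) at an adjusted p-value cutoff (assumed <=)."""
    padj = benjamini_hochberg(pvals)
    called = [q <= alpha for q in padj]
    n_de = sum(called)                                   # genes called DE
    tp = sum(c and t for c, t in zip(called, is_true_de))
    fp = n_de - tp
    n_true = sum(is_true_de)
    tpr = 100.0 * tp / n_true if n_true else 0.0         # share of true DE genes found
    fdr = 100.0 * fp / n_de if n_de else 0.0             # share of calls that are false
    return {"nDE": n_de, "TPR": tpr, "FDR": fdr}


# Toy data (not from the benchmark): 6 genes, the first three are truly DE.
pvals = [0.001, 0.004, 0.30, 0.008, 0.50, 0.90]
truth = [True, True, True, False, False, False]
metrics = de_metrics(pvals, truth)
print(metrics)
```

In the toy data, three genes pass the cutoff, two of which are truly DE, so the sketch reports nDE = 3, TPR of about 66.7% and FDR of about 33.3%.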

Availability

The full analysis reports are available here:

The analysis is available at https://github.com/grst/benchmark-single-cell-de-analysis/. Everything is wrapped in a nextflow pipeline that uses conda envs. Simply running nextflow run ./benchmark.nf should reproduce the above reports.

davidsebfischer commented 5 years ago

Thanks @grst for the great problem description, I am looking into this. This is probably linked to us changing the default backend to numpy-based optimizers.

grst commented 5 years ago

Btw, the example data is also available as TSV files, which is probably easier for you than running the entire pipeline: https://github.com/grst/benchmark-single-cell-de-analysis/tree/master/diffxpy_test

Also, I have the impression that 0.7.1 runs significantly slower than 0.6.13. Is that something you can confirm?
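
To put a number on the slowdown, the same call could be timed once in an environment with 0.6.13 and once with 0.7.1. Below is a minimal sketch using only the standard library; `time_call` and `run_de` are hypothetical names, and the body of `run_de` is a stand-in workload that would need to be replaced by the actual diffxpy call.

```python
import statistics
import time


def time_call(fn, repeats=3):
    """Return the median wall-clock time in seconds over `repeats` calls of fn()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def run_de():
    # Hypothetical placeholder workload: replace with the diffxpy call
    # under test, run once per installed version.
    sum(i * i for i in range(100_000))


elapsed = time_call(run_de)
print(f"median runtime: {elapsed:.3f} s")
```

Taking the median over a few repeats reduces the influence of one-off effects such as import or warm-up overhead on the first call.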