Results changed from v0.6.13 to v0.7.1

grst commented 5 years ago

Hi,

while experimenting with diffxpy, I noticed that the results changed since I upgraded from v0.6.13 to v0.7.1. Is that intentional?

The setup:

I added diffxpy to the DE benchmark by Van den Berge 2019.

Under v0.6.13, diffxpy wald_test with nb noise-model produces results highly comparable to edgeR or a NB-model from python statsmodels:

True positive and false positive rate on simulated data at an FDR-cutoff of 0.05:

Method	nDE	TPR(%)	FDR(%)
edgeR	78	7.1	9.0
diffxpy_wald	84	7.7	8.3
statsmodels_nb	85	7.7	9.4

However, under v0.7.1, diffxpy calls far more genes as DE (224 instead of 84) and its FDR rises from 8.3% to 13.8%:

Method	nDE	TPR(%)	FDR(%)
edgeR	78	7.1	9.0
diffxpy_wald	224	19.3	13.8
statsmodels_nb	85	7.7	9.4
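
For readers who want to check the columns themselves, here is a minimal sketch of how nDE, TPR and FDR can be derived from adjusted p-values and the simulation ground truth. It assumes that nDE counts genes with an adjusted p-value at or below the cutoff, that TPR is the share of truly DE genes recovered, and that FDR is the share of false calls among all calls. The function name `de_metrics` and the toy inputs are hypothetical and are not taken from the benchmark code.

```python
import numpy as np


def de_metrics(qvals, is_true_de, cutoff=0.05):
    """Return (nDE, TPR in %, FDR in %) for one method.

    qvals      -- adjusted p-values, one per gene
    is_true_de -- ground truth from the simulation, one boolean per gene
    """
    qvals = np.asarray(qvals, dtype=float)
    is_true_de = np.asarray(is_true_de, dtype=bool)

    # Treat missing q-values (e.g. genes whose fit failed) as not significant.
    qvals = np.where(np.isnan(qvals), 1.0, qvals)
    called = qvals <= cutoff

    n_de = int(called.sum())
    tp = int((called & is_true_de).sum())
    fp = n_de - tp
    n_true = int(is_true_de.sum())

    tpr = 100.0 * tp / n_true if n_true else float("nan")
    fdr = 100.0 * fp / n_de if n_de else 0.0
    return n_de, tpr, fdr


# Toy input, purely illustrative (not benchmark data).
qvals = [0.001, 0.02, 0.04, 0.2, 0.6, 0.03, float("nan"), 0.9]
truth = [True, True, False, True, False, True, True, False]
n_de, tpr, fdr = de_metrics(qvals, truth)
print(n_de, tpr, fdr)
```

Computing the FDR on the observed calls, rather than trusting the nominal 0.05 cutoff, is what makes an inflation like the one above visible.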

Availability

The full analysis reports are available here:

The analysis is available at https://github.com/grst/benchmark-single-cell-de-analysis/. Everything is wrapped in a nextflow pipeline that uses conda envs. Simply running nextflow run ./benchmark.nf should reproduce the above reports.

davidsebfischer commented 5 years ago

Thanks @grst for the great problem description, I am looking into this. This is probably linked to us changing the default backend to numpy-based optimizers.

grst commented 5 years ago

Btw, the example data is also available as tsv files, which is probably easier for you than running the entire pipeline: https://github.com/grst/benchmark-single-cell-de-analysis/tree/master/diffxpy_test
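
One way to narrow down where the two versions disagree is to run both on the same tsv input and compare which genes each calls significant. The sketch below assumes each run's results were exported as a tab-separated table with a `gene` and a `qval` column. Those column names, the helper names and the inline toy tables are assumptions for illustration and do not describe the files in the repository.

```python
import csv
import io


def read_qvals(tsv_text, gene_col="gene", q_col="qval"):
    """Parse a tab-separated result table into {gene: q-value}."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[gene_col]: float(row[q_col]) for row in reader}


def compare_calls(q_old, q_new, cutoff=0.05):
    """Split genes tested in both runs by which run calls them significant."""
    shared = q_old.keys() & q_new.keys()
    sig_old = {g for g in shared if q_old[g] <= cutoff}
    sig_new = {g for g in shared if q_new[g] <= cutoff}
    return {
        "both": sig_old & sig_new,
        "only_old": sig_old - sig_new,
        "only_new": sig_new - sig_old,
    }


# Toy tables standing in for exported results of the two versions.
old_tsv = "gene\tqval\nA\t0.01\nB\t0.20\nC\t0.30\nD\t0.04\n"
new_tsv = "gene\tqval\nA\t0.02\nB\t0.03\nC\t0.01\nD\t0.60\n"

result = compare_calls(read_qvals(old_tsv), read_qvals(new_tsv))
for label, genes in result.items():
    print(label, sorted(genes))
```

Genes that land in `only_new` are the first candidates to inspect, since the additional calls under v0.7.1 are where the extra false positives would have to come from.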

Also, I have the impression that 0.7.1 runs significantly slower than 0.6.13. Is that something you can confirm?

