owkin / PyDESeq2

A Python implementation of the DESeq2 pipeline for bulk RNA-seq DEA.
https://pydeseq2.readthedocs.io/en/latest/
MIT License
573 stars 60 forks source link

NANs introduced for some genes, no warning, no error [BUG] #291

Closed dbdimitrov closed 2 months ago

dbdimitrov commented 3 months ago

Hi,

Thanks for developing this package and for improving it.

Describe the bug I noticed a bug where while all other stats are calculated, some p-values are assigned to NaN.

To Reproduce small adata: https://drive.google.com/file/d/1wwZDcsEVZD0ldBrtP63vqA1zTPpM05Q2/view?usp=drive_link

code

ctdata = sc.read("ctdata.h5ad")
dds = DeseqDataSet(
    adata=ctdata,
    design_factors=condition_key,
    ref_level=[condition_key, 'ctrl'], # set control as reference
    refit_cooks=True,
    quiet=True
)

dds.deseq2()
stat_res = DeseqStats(dds, contrast=[condition_key, 'stim', 'ctrl'], quiet=True)
stat_res.summary()
stat_res.lfc_shrink(coeff='condition_stim_vs_ctrl')
dea_df = stat_res.results_df

pydeseq2 version:

pydeseq2.__version__
'0.4.4'

Expected behavior Throw a warning? Is it expected to have NaNs here? The counts look like there is some variance, etc.

Desktop (please complete the following information):

Exception Thrown when using the problematic gene alone The first guess on the deviance function returned a nan. This could be a boundary problem and should be reported.

Let me know if I can provide any further info.

BorisMuzellec commented 2 months ago

Hi @dbdimitrov, do you confirm that condition_key is "condition"?

I also get a NaN pvalue for the AAED1 gene. This is a priori not a bug: pydeseq2 filters out p-values based on Cooks outliers (cf the docs).

You can turn this off by setting cooks_filter=False when initialising a DeseqStats object.

That being said, it seems like the ctdata anndata has floats as counts, whereas ints are expected. Converting counts to ints in the piece of code you provided changes the results:

With floats: Capture d’écran 2024-07-01 à 16 41 54

Converting to ints: Capture d’écran 2024-07-01 à 16 42 00

I'll dig a bit more into this but I think that we should raise an error (or at least a warning) in case counts are not ints (and / or maybe cast to int when possible).

dbdimitrov commented 2 months ago

Hi @BorisMuzellec,

Yes, indeed condition_key was "condition" - sorry.

Thanks a lot for the clarification!