Open BorisMuzellec opened 1 year ago
Just wanted to add that I encountered this error when trying to run on a sparse matrix. When I densify it, it works, but I am sure that users with large data matrices would appreciate being able to run without having to densify them.
Hopefully this gets implemented soon.
Hi, I'm also noticing that the default functionality breaks when the input data isn't densified ahead of time -- the internal validation functions assume that the input counts are dense numpy / pandas objects, despite the default AnnData behavior recasting these inputs into sparse matrices.
I'm not exactly sure how the tutorials on the main website are able to run in the first place -- I have not been able to run any of them (with new data) without recasting the counts to a dense array via:
import numpy as np
from pydeseq2.dds import DeseqDataSet
dds = DeseqDataSet(counts=df, metadata=md)
# Workaround: replace the sparse count matrix with a dense numpy array
dds.X = np.array(dds.X.todense())
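A slightly more defensive version of the workaround above, as a rough sketch: it only densifies when the matrix is actually sparse, so it is safe to call on inputs that are already dense. The helper name `to_dense_counts` is made up for illustration and is not part of PyDESeq2; it assumes the counts are a SciPy sparse matrix or something `numpy` can convert.

```python
import numpy as np
from scipy import sparse


def to_dense_counts(X):
    """Return X as a dense numpy array, densifying only if it is sparse.

    Hypothetical helper for illustration; not part of PyDESeq2.
    """
    if sparse.issparse(X):
        # SciPy sparse matrices expose toarray(), which returns an ndarray
        return X.toarray()
    # Already dense (ndarray, nested lists, ...): just coerce to ndarray
    return np.asarray(X)


# Small sparse example: 2 samples x 3 genes
counts = sparse.csr_matrix(np.array([[0, 3, 1], [2, 0, 5]]))
dense = to_dense_counts(counts)
```

With this in place the workaround would presumably become `dds.X = to_dense_counts(dds.X)`. Note that it still materialises the full matrix in memory, so it does not address the underlying request for native sparse support.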
Currently, PyDESeq2 throws an error when trying to initialise a DeseqDataSet with a count matrix that contains NaNs; this is to reproduce DESeq2's behaviour. As pointed out by @arthurPignetOwkin, it seems like it would make sense to simply raise a warning instead and carry on with the analysis, and return NaNs for dispersions, LFCs, and p-values of genes that have NaN counts (as we already do for genes whose counts are all-zero).
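To make the proposal concrete, here is a minimal sketch of the warn-and-carry-on behaviour using plain numpy. It is not PyDESeq2 code: the function names `split_nan_genes` and `expand_results` are hypothetical, the counts are assumed to be a samples x genes array, and a per-gene mean stands in for the real dispersion / LFC / p-value computations.

```python
import warnings

import numpy as np


def split_nan_genes(counts):
    """Return (clean_counts, nan_gene_mask) for a samples x genes array.

    Hypothetical helper: warns instead of raising when NaNs are present.
    """
    counts = np.asarray(counts, dtype=float)
    # A gene (column) is flagged if any sample has a NaN count for it
    nan_genes = np.isnan(counts).any(axis=0)
    if nan_genes.any():
        warnings.warn(
            f"{int(nan_genes.sum())} gene(s) contain NaN counts; "
            "their results will be set to NaN.",
            UserWarning,
        )
    return counts[:, ~nan_genes], nan_genes


def expand_results(values, nan_genes):
    """Re-insert NaN at the positions of the genes that were excluded."""
    out = np.full(nan_genes.shape, np.nan)
    out[~nan_genes] = values
    return out


# 2 samples x 3 genes; the second gene has a NaN count
counts = np.array([[5.0, np.nan, 2.0], [3.0, 1.0, 0.0]])
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the warning for this demo
    clean, mask = split_nan_genes(counts)

# Stand-in for the real per-gene statistics
per_gene_mean = expand_results(clean.mean(axis=0), mask)
```

The same mask-then-expand pattern could in principle be shared with the existing all-zero gene handling, though whether that fits the current code structure is an open question.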