This PR fixes outdated code left over from the time when PyDESeq2 only supported single-factor designs. That code evaluated the number of sample replicates from the last column of the design matrix only, instead of taking all factors into account.
As a result, Cook's filtering gave results that diverged from DESeq2. For example, the test case introduced in this PR
would previously return a NaN p-value for "gene1", whereas DESeq2 wouldn't.
What does your PR implement? Be specific.
This PR:
- Implements a function that checks how many sample replicates each factor combination has (utils.n_or_more_replicates, cf. DESeq2's nOrMoreInCell).
- Corrects the outdated single-factor code by:
  - using n_or_more_replicates in dds._replace_outliers and ds._cooks_filtering, so that all factors are taken into account instead of only the last column of the design matrix;
  - likewise, using n_or_more_replicates and grouping samples by unique row combinations in utils.robust_method_of_moments_disp, which is used to calculate Cook's distances (cf in DESeq2).
- Adds a new test case in which two outlier counts and a dummy "C" condition with a single replicate are introduced.
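For readers unfamiliar with the replicate check, here is a minimal sketch of the idea: count how many samples share each unique design-matrix row, then flag the samples whose row occurs often enough. The function name `n_or_more_replicates_sketch`, its signature, and the toy design matrix are assumptions made for illustration; this is not necessarily how utils.n_or_more_replicates is written in this PR.

```python
import numpy as np
import pandas as pd


def n_or_more_replicates_sketch(
    design_matrix: pd.DataFrame, min_replicates: int
) -> pd.Series:
    """Flag samples whose full design row occurs at least min_replicates times.

    Hypothetical helper for illustration only.
    """
    values = design_matrix.to_numpy()
    # Unique rows (factor combinations), the row -> unique-row mapping,
    # and the number of samples in each unique row.
    _, inverse, counts = np.unique(
        values, axis=0, return_inverse=True, return_counts=True
    )
    # reshape(-1) guards against NumPy versions returning a 2-D inverse.
    per_sample_counts = counts[np.asarray(inverse).reshape(-1)]
    return pd.Series(
        per_sample_counts >= min_replicates, index=design_matrix.index
    )


# Toy design (assumed): conditions A and B have three replicates each,
# while the dummy condition C has a single replicate.
design = pd.DataFrame(
    {
        "intercept": [1, 1, 1, 1, 1, 1, 1],
        "condition_B": [0, 0, 0, 1, 1, 1, 0],
        "condition_C": [0, 0, 0, 0, 0, 0, 1],
    },
    index=[f"sample{i}" for i in range(1, 8)],
)

mask = n_or_more_replicates_sketch(design, min_replicates=3)
print(mask)
```

In this toy example only the single "C" sample is flagged as lacking enough replicates, because the count is taken over whole rows rather than over the last column alone.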