owkin / PyDESeq2

A Python implementation of the DESeq2 pipeline for bulk RNA-seq DEA.
https://pydeseq2.readthedocs.io/en/latest/
MIT License
584 stars 62 forks source link

Why is replacement not done if it would result in all zero counts? #238

Closed chubukov closed 8 months ago

chubukov commented 8 months ago

here and here there is a filter for not doing outlier replacement or refitting if the result of the replacement would be all zero counts.

        # Only refit genes for which replacing outliers hasn't resulted in all zeroes
        new_all_zeroes = (self.counts_to_refit.X == 0).all(axis=0)
        # Only replace if genes are not all zeroes after outlier replacement
        to_replace[to_replace] = ~new_all_zeroes

Why is this the right thing to do? If I have zero counts in all but one sample, and a large count in the single outlier sample, why wouldn't the right result be to remove the outlier?

Is this consistent with what the R DESeq2 package does?

BorisMuzellec commented 8 months ago

Hi @chubukov, thanks for pointing this out.

I just tested the example you provide in #237 in the original R DESeq2 package. The resulting behavior is to replace outliers even if it leads to the gene only having zero counts, in which case the baseMean is then set to 0 and all columns in the results dataframe to NaN.