theislab / diffxpy

Differential expression analysis for single-cell RNA-seq data.
https://diffxpy.rtfd.io
BSD 3-Clause "New" or "Revised" License

QUESTION: the p-value for multiple partitions of a data set #168

Open jingxinfu opened 4 years ago

jingxinfu commented 4 years ago

Hi, I was trying to find DEGs between two conditions while controlling the sample-driven effect. Following the tutorial, I used this script to conduct my analysis.

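import diffxpy.api as de

# Split the data by sample so that each partition is tested separately.
part = de.test.partition(
    data=data_part,
    parts="sample"
)
# Wald test on the condition coefficient within each partition.
test_part = part.wald(
    formula_loc="~ 1 + condition",
    factor_loc_totest="condition"
)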

Next, I checked how diffxpy combines p-values from the different partitions and found this:

        res = pd.DataFrame({
            "gene": self.gene_ids,
            # return minimal pval by gene:
            "pval": np.min(self.pval.reshape(-1, self.pval.shape[-1]), axis=0),
            # return minimal qval by gene:
            "qval": np.min(self.qval.reshape(-1, self.qval.shape[-1]), axis=0),
            # return maximal logFC by gene:
            "log2fc": np.asarray(logfc),
            # return mean expression across all groups by gene:
            "mean": np.asarray(self.mean)
        })

        return res

Could you explain why the minimum p-value across groups is reported? I suspect that taking the minimum increases the number of genes called significant. Would other methods, such as Fisher's method (https://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.stats.combine_pvalues.html), be better?
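To make the concern concrete, here is a minimal sketch in plain Python. It assumes the per-partition tests are independent and that all p-values are strictly positive; the function names are hypothetical and not part of diffxpy. It shows how often the minimum of k null p-values falls below a threshold, a Šidák adjustment of that minimum, and Fisher's method (the approach behind the default of the linked `scipy.stats.combine_pvalues`).

```python
import math


def min_pvalue_false_positive_rate(alpha, k):
    """Probability that the smallest of k independent null p-values is <= alpha."""
    return 1.0 - (1.0 - alpha) ** k


def sidak_adjusted_min_pvalue(pvalues):
    """Adjust the minimum of k independent p-values for having picked the smallest."""
    k = len(pvalues)
    return 1.0 - (1.0 - min(pvalues)) ** k


def fisher_combined_pvalue(pvalues):
    """Fisher's method: -2 * sum(log p) is chi-square with 2k d.o.f. under the null.

    Assumes independent tests and p > 0 for every entry.
    """
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # Fisher statistic divided by 2
    # Closed-form chi-square survival function, valid for even d.o.f. (here 2k).
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


if __name__ == "__main__":
    # With 5 partitions, an unadjusted minimum crosses 0.05 far more often than 5%.
    print(round(min_pvalue_false_positive_rate(0.05, 5), 3))  # 0.226
    example = [0.01, 0.5, 0.9]  # made-up p-values for one gene across 3 partitions
    print(min(example), sidak_adjusted_min_pvalue(example), fisher_combined_pvalue(example))
```

Under these assumptions, the unadjusted minimum is anti-conservative as a single per-gene p-value, which is consistent with the worry above. Whether either adjustment suits a given data set depends on how independent the partitions really are.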

davidsebfischer commented 4 years ago

Hi @jingxinfu! Thanks for the comment! Two components here are

davidsebfischer commented 4 years ago

I just want to add that Fisher's method (https://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.stats.combine_pvalues.html) is not very suitable here: if you want a single p-value per gene, it is much cleaner to write a GLM that covers all of these tests and tests all of the relevant coefficients in a single test. We could still include it as an indication of where something is going on, though.
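One way to read "tests all of these coefficients in a single test" is as a joint Wald test. As a sketch only, assume the single GLM has k condition coefficients (for example, one per sample), collected in a vector β with estimated covariance matrix Σ. These symbols are introduced here for illustration and do not come from the thread or from diffxpy's code.

```latex
H_0:\ \beta_1 = \dots = \beta_k = 0, \qquad
W = \hat{\beta}^{\top} \hat{\Sigma}^{-1} \hat{\beta}
\ \overset{H_0}{\sim}\ \chi^2_k \ \text{(asymptotically)}, \qquad
p = \Pr\left(\chi^2_k \ge W\right)
```

This yields one p-value per gene directly from the fitted model, so per-partition p-values do not have to be combined after the fact.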