Issue32 - Githubissues

By definition, ct_mean should be >= 0 because counts are >= 0. Is this a numerical issue? What do you think about adding a small constant like 1e-6?

This issue occurs when both --adj-prop and --cov are on and some genes have extremely low expression. Because the regression includes a constant term, the transformed data transformed_X = adata.X + COV_MAT * COV_BETA + COV_GENE_MEAN has the same gene-level mean as the original data, i.e., COV_GENE_MEAN. However, when --adj-prop is on, ct_mean is a weighted average in which each cell is weighted by the reciprocal of its cell type proportion. The weighted mean of transformed_X may differ slightly from COV_GENE_MEAN, so the ct_mean of the transformed data differs slightly from that of the original data. The original ct_mean is always non-negative, but for genes whose mean is close to zero this small difference can push the transformed ct_mean below zero.
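The effect can be reproduced on toy data. The following is a minimal sketch, not the scDRS implementation: the ten cells, the single covariate, and all variable names are made up for illustration, and the sign of cov_beta is chosen only so that the correction can be written in the additive form quoted above.

```python
import numpy as np

# Toy data (hypothetical): 8 cells of type A, 2 cells of type B.
cell_type = np.array(["A"] * 8 + ["B"] * 2)
# A non-negative, lowly expressed gene and one covariate.
x = np.array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0], dtype=float)
cov = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1], dtype=float)

# Regress the gene on [constant, covariate] by least squares.
cov_mat = np.column_stack([np.ones_like(cov), cov])
beta, *_ = np.linalg.lstsq(cov_mat, x, rcond=None)

# Assumption: store the negated coefficients so the correction is added,
# mirroring transformed_X = X + COV_MAT * COV_BETA + COV_GENE_MEAN.
cov_beta = -beta
gene_mean = x.mean()
transformed_x = x + cov_mat @ cov_beta + gene_mean

# Weight each cell by the reciprocal of its cell type proportion.
types, counts = np.unique(cell_type, return_counts=True)
prop = dict(zip(types, counts / len(cell_type)))
weights = np.array([1.0 / prop[t] for t in cell_type])

plain_mean = transformed_x.mean()
weighted_mean_original = np.average(x, weights=weights)
weighted_mean_transformed = np.average(transformed_x, weights=weights)

print("unweighted mean, transformed:", plain_mean)
print("weighted mean, original:     ", weighted_mean_original)
print("weighted mean, transformed:  ", weighted_mean_transformed)
```

In this toy case the unweighted mean of the transformed data equals the original gene mean (0.1), the weighted mean of the original data stays non-negative (0.0625), and the weighted mean of the transformed data comes out slightly negative (about -0.025).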

This issue is not pervasive. In the PBMC3K example, 46 genes have negative ct_mean. Their magnitudes are around 1e-4, two orders of magnitude smaller than the median ct_mean, so I think removing these genes should be fine.
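Removing the affected genes could look roughly like the sketch below. The helper name drop_negative_ct_mean and the example values are hypothetical and are not part of scDRS; the sketch only shows the filtering idea on a plain array.

```python
import numpy as np

def drop_negative_ct_mean(gene_names, ct_mean):
    """Return the genes (and their ct_mean values) with ct_mean >= 0."""
    ct_mean = np.asarray(ct_mean, dtype=float)
    keep = ct_mean >= 0
    kept_genes = [g for g, k in zip(gene_names, keep) if k]
    return kept_genes, ct_mean[keep]

# Made-up values: one gene has a slightly negative ct_mean.
genes = ["g1", "g2", "g3", "g4"]
ct_mean = [2e-2, -1e-4, 0.0, 5e-3]

kept_genes, kept_ct_mean = drop_negative_ct_mean(genes, ct_mean)
n_removed = len(genes) - len(kept_genes)
print(f"removed {n_removed} gene(s); kept {kept_genes}")
```

Filtering avoids shifting every gene by an arbitrary constant, at the cost of dropping a small number of very lowly expressed genes.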

martinjzhang / scDRS

Issue32 #35