Closed martinjzhang closed 1 year ago
`ct_mean` should by definition be >= 0 because counts are >= 0. Is this caused by a numerical issue? What do you think about adding a small constant like 1e-6?
This issue occurs when both `--adj-prop` and `--cov` are on and there are genes with extremely low expression. Since we have included a constant term in the regression, the transformed data `transformed_X = adata.X + COV_MAT * COV_BETA + COV_GENE_MEAN` has the same gene-level mean as the original data, i.e., `COV_GENE_MEAN`. However, when `--adj-prop` is on, `ct_mean` is a weighted average, with each cell weighted by the reciprocal of its cell type proportion. The weighted mean of `transformed_X` may be slightly different from `COV_GENE_MEAN`. As a result, the `ct_mean` of the transformed data can differ slightly from that of the original data, which is strictly non-negative, and for genes whose mean is already close to zero it can fall just below zero.
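To make the mechanism concrete, here is a minimal NumPy sketch on toy data. It is not the scDRS implementation: the arrays `cell_type`, `cov`, and `x` are made up for illustration, and the covariate effect is removed with an ordinary least-squares fit that includes a constant term. It shows that the unweighted mean of the transformed data matches the original gene mean, while the proportion-adjusted weighted mean can drop below zero.

```python
import numpy as np

# Hypothetical toy data: 10 cells, one lowly expressed gene, one covariate.
# Cell type 0 is common (8 cells); cell type 1 is rare (2 cells).
cell_type = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
cov = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1], dtype=float)
x = np.array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0], dtype=float)  # non-negative counts

# Regress the gene on the covariate, including a constant term.
design = np.column_stack([np.ones_like(cov), cov])
beta, *_ = np.linalg.lstsq(design, x, rcond=None)

# Remove the fitted covariate effect, then add the gene mean back.
transformed = x - design @ beta + x.mean()

# Weight each cell by the reciprocal of its cell type proportion.
prop = np.bincount(cell_type) / cell_type.size
weights = 1.0 / prop[cell_type]

plain_mean = transformed.mean()                             # equals x.mean()
weighted_mean = np.average(transformed, weights=weights)    # can be negative
weighted_mean_original = np.average(x, weights=weights)     # always >= 0

print(plain_mean, weighted_mean, weighted_mean_original)
```

In this toy case the rare cell type has zero counts but a covariate value that predicts some expression, so its residuals are negative. Up-weighting that cell type pulls the weighted mean below zero even though the unweighted mean is unchanged.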
This issue is not pervasive. In the PBMC3K example, there are 46 genes with negative `ct_mean`. The magnitudes are around 1e-4, two orders of magnitude smaller than the median `ct_mean`. So I think removing them should be fine.