martinjzhang / scDRS

Single-cell disease relevance score (scDRS)
https://martinjzhang.github.io/scDRS/
MIT License

Issue32 #35

Closed martinjzhang closed 1 year ago

martinjzhang commented 1 year ago
  1. Remove genes with ct_mean<0 before calling loess, addressing https://github.com/martinjzhang/scDRS/issues/32#issuecomment-1263006523
  2. Print info for --adj-prop
martinjzhang commented 1 year ago

ct_mean should by definition be >= 0 because counts are >= 0. Is this due to a numerical issue? What do you think about adding a small constant like 1e-6?

This issue occurs when both --adj-prop and --cov are on and some genes have extremely low expression. Because the regression includes a constant term, the transformed data transformed_X = adata.X + COV_MAT * COV_BETA + COV_GENE_MEAN has the same gene-level mean as the original data, i.e., COV_GENE_MEAN. However, when --adj-prop is on, ct_mean is a weighted average with weights equal to the reciprocal of the cell type proportions, and the weighted mean of transformed_X may differ slightly from COV_GENE_MEAN. As a result, the ct_mean of the transformed data can differ slightly from that of the original data, which is always non-negative, and for genes whose mean is already close to zero it can fall below zero.
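To make the mechanism concrete, here is a minimal NumPy sketch, not the scDRS code itself. The toy data, the variable names, and the sign convention of the fitted coefficients are assumptions for illustration, and the effect is deliberately exaggerated compared with the magnitudes seen in real data. It regresses one covariate out of a non-negative gene with a constant term, adds the gene mean back, and compares the plain mean with the mean weighted by the reciprocal of cell type proportions.

```python
import numpy as np

# Toy data (hypothetical): 20 cells of type "A" and 1 cell of type "B".
cell_type = np.array(["A"] * 20 + ["B"])
cov = np.array([0.0] * 10 + [1.0] * 10 + [2.0])  # a single covariate
x = np.array([0.0] * 10 + [1.0] * 10 + [0.0])    # non-negative expression of one gene

# Regress out the covariate with a constant term, then add the gene mean back.
design = np.column_stack([np.ones_like(cov), cov])
beta, *_ = np.linalg.lstsq(design, x, rcond=None)
transformed = x - design @ beta + x.mean()

# Weight each cell by the reciprocal of its cell type proportion.
types, counts = np.unique(cell_type, return_counts=True)
prop = dict(zip(types, counts / counts.sum()))
weights = np.array([1.0 / prop[t] for t in cell_type])

plain_mean = transformed.mean()
weighted_mean = np.average(transformed, weights=weights)

print(plain_mean, x.mean())  # equal up to floating-point error
print(weighted_mean)         # below zero in this toy example
```

The plain mean is preserved because residuals from a regression with a constant term sum to zero; the weighted mean is not, since the residuals do not have to average to zero within each cell type.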

This issue is not pervasive. In the PBMC3K example, there are 46 genes with negative ct_mean. Their magnitudes are around 1e-4, two orders of magnitude smaller than the median ct_mean. So I think removing them should be fine.
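The removal step can be sketched as a simple boolean mask applied before the fit. The helper name, gene names, and values below are hypothetical and only illustrate the idea; this is not the code in this PR.

```python
import numpy as np

def keep_nonnegative_mean_genes(ct_mean, gene_names):
    """Return a boolean mask and the names of genes with ct_mean >= 0."""
    ct_mean = np.asarray(ct_mean, dtype=float)
    keep = ct_mean >= 0
    kept_genes = [g for g, k in zip(gene_names, keep) if k]
    return keep, kept_genes

# Hypothetical values: two genes have slightly negative ct_mean.
ct_mean = np.array([2e-2, -1e-4, 5e-3, -3e-4])
gene_names = ["g1", "g2", "g3", "g4"]

keep, kept_genes = keep_nonnegative_mean_genes(ct_mean, gene_names)
print(f"removed {int(np.sum(~keep))} of {len(keep)} genes with ct_mean < 0")
```

Only the genes selected by `keep` would then be passed on to the loess fit.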