martinjzhang / scDRS

Single-cell disease relevance score (scDRS)
https://martinjzhang.github.io/scDRS/
MIT License

gene weights for the control genes #69

Closed aandreas13 closed 10 months ago

aandreas13 commented 10 months ago

Hi!

Thanks a lot for making this wonderful tool. I am currently studying how exactly scDRS works. In the formula for the raw control score in the scDRS paper, the term w_g (the MAGMA gene weight) appears. Where does this control gene weight come from? As I understand it, the input gs file contains the disease gene set with its corresponding p-values or z-scores, but we never provide gene weights for the control genes. How, then, can the control genes have MAGMA gene weights as well?

Thanks

martinjzhang commented 10 months ago

Control genes use the same weights as their corresponding disease genes. We were very careful about this with the notation. Please see Eq.1

Each disease gene uses its own MAGMA weight w_g.

For each control gene g, let the corresponding disease gene be \pi(g). The control gene g uses the MAGMA weight of the corresponding disease gene \pi(g).
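The mapping described above can be illustrated with a minimal sketch. This is not scDRS code: the helper `weighted_score`, the dictionaries `magma_w` and `pi`, and the toy expression matrix are all hypothetical names and made-up values, and the sketch assumes a plain weighted average that leaves out any other term in Eq. 1 (such as technical noise).

```python
import numpy as np

def weighted_score(X, gene_idx, weights):
    """Weighted average expression per cell over the given genes (illustrative only)."""
    w = np.asarray(weights, dtype=float)
    return X[:, gene_idx] @ w / w.sum()

# Toy data: 3 cells x 6 genes (values are made up for illustration).
X = np.array([
    [1.0, 0.0, 2.0, 1.0, 0.5, 0.0],
    [0.0, 1.0, 1.0, 2.0, 0.0, 1.0],
    [2.0, 2.0, 0.0, 0.0, 1.0, 1.0],
])

disease_genes = [0, 1]        # column indices of the disease genes in G
magma_w = {0: 3.0, 1: 1.0}    # w_g, supplied only for disease genes

# pi maps each control gene to its matched disease gene (hypothetical matching).
pi = {2: 0, 3: 1}
control_genes = list(pi)

# Disease genes use their own weights w_g.
disease_score = weighted_score(X, disease_genes, [magma_w[g] for g in disease_genes])

# Control genes borrow the weight of their matched disease gene, w_{pi(g)},
# but the expression values still come from the control gene's own column.
control_score = weighted_score(X, control_genes, [magma_w[pi[g]] for g in control_genes])
```

No weights are ever looked up for the control genes themselves; `magma_w` is only indexed through `pi`, which is why the gs file needs weights for the disease genes alone.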

aandreas13 commented 10 months ago

Thanks for clarifying. That's exactly what I thought as well, but I just wanted to confirm that it's really the case.

In Eq. 1 of the paper, for the control score (s_cb_ctrl), g is a member of the set G_b_ctrl, not of the disease gene set G. So when the equation uses w_g in the control score, it should mean the MAGMA gene score of control gene g, which we don't have, because we only provide MAGMA scores for the disease genes. Additionally, I suppose that sigma_tech,g in the control score is supposed to mean the technical noise for control gene g, right? Not the technical noise of the corresponding disease gene. If that's the case, I hope you can understand where my confusion comes from: the subscript g here means two different things for two different variables.

martinjzhang commented 10 months ago

Hi @aandreas13,

The equation uses $w_{\pi(g)}$ instead of $w_g$ for the control score part. $g$ and $\pi(g)$ are two different genes.

> I suppose that sigma_tech,g in the control score is supposed to mean the technical noise for control gene g, right?

That's right
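Putting the two clarifications together, the indexing can be written out as follows. This is a sketch of the notation only: it assumes a weighted-average form in which each gene's weight is divided by its technical noise, with $X_{cg}$ denoting the expression of gene $g$ in cell $c$. The exact functional form should be checked against Eq. 1 in the published paper.

$$
s_c = \frac{\sum_{g \in G} w_g \, \sigma_{\text{tech},g}^{-1} \, X_{cg}}{\sum_{g \in G} w_g \, \sigma_{\text{tech},g}^{-1}},
\qquad
s_{cb}^{\text{ctrl}} = \frac{\sum_{g \in G_b^{\text{ctrl}}} w_{\pi(g)} \, \sigma_{\text{tech},g}^{-1} \, X_{cg}}{\sum_{g \in G_b^{\text{ctrl}}} w_{\pi(g)} \, \sigma_{\text{tech},g}^{-1}}
$$

In the control score, only the MAGMA weight is indexed by the matched disease gene $\pi(g)$. The technical noise $\sigma_{\text{tech},g}$ and the expression $X_{cg}$ belong to the control gene $g$ itself.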

aandreas13 commented 10 months ago

Oh, great to know that! Btw, I was reading the paper from this bioRxiv version, and the control score is shown in the image below.

[image]

This is probably not the latest version, is it?

martinjzhang commented 10 months ago

No. We fixed this typo in the final version in Nature Genetics (NG).