martinjzhang / scDRS

Single-cell disease relevance score (scDRS)
https://martinjzhang.github.io/scDRS/
MIT License

scRNA data input #63

Closed frucelee closed 1 year ago

frucelee commented 1 year ago

Hi, thank you so much for the very nice software. I have a fairly simple question about scRNA-seq data input for the software. We have scRNA-seq data from disease and normal conditions, and we would like to identify the disease-relevant cell subpopulations using GWAS data. Is it appropriate to use this software to infer these subpopulations from scRNA-seq data from normal conditions? It is well understood that using scRNA-seq data from the disease condition makes it challenging to distinguish the impact of the tissue's original genetic background from the impact of the examined GWAS signals. Or does this software allow us to use the scRNA-seq data from normal and disease conditions together? How would that affect the power? Thanks a lot. Best, Lee

martinjzhang commented 1 year ago

Hi @frucelee

Thank you for the insightful question. It is statistically appropriate to run scDRS on scRNA-seq data with both case and control cells. We haven't systematically looked into what results to expect. Some preliminary results showed that the disease-relevant cells from high BMI people have slightly higher scDRS disease scores (P<0.05) than those from low BMI people. So running scDRS on disease cells may produce different discoveries. We haven't investigated the power either.

frucelee commented 1 year ago

Super. Thanks a lot. In this case, do I need to include the disease condition (e.g., normal vs. disease) as a covariate in the scDRS analysis? Thanks in advance.

frucelee commented 1 year ago

One last thing: during the scDRS compute-score analysis, we found that different settings for --n_ctrl can produce distinct results. It seems that the lower the value of n_ctrl, the more "significant" results are obtained, for example when setting n_ctrl to 100 rather than 1000. How should we balance the choice of n_ctrl against the results? For example, our dataset has 5992 cells. Any suggestions? Thanks in advance.

martinjzhang commented 1 year ago

> Super. Thanks a lot. In this case, do I need to include the disease condition (e.g., normal vs. disease) as a covariate in the scDRS analysis? Thanks in advance.

Hi @frucelee, if you believe that the disease condition is tagging strong technical covariates, then it would be good to include it as a covariate. The results will not be drastically different with and without the covariate.
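For readers who want to try this, here is a minimal sketch of encoding the disease condition as a 0/1 covariate in a tab-separated table. The column names (`cell`, `const`, `condition_disease`), the cell IDs, and the exact layout are assumptions made for illustration, not something confirmed in this thread; please check the scDRS documentation for the covariate file format your version expects.

```python
import csv
import io


def build_cov_tsv(cell_conditions, disease_label="disease"):
    """Return TSV text with one row per cell.

    Columns (hypothetical names): cell ID, a constant term, and a 0/1
    indicator that is 1 when the cell comes from the disease condition.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["cell", "const", "condition_disease"])
    for cell_id, condition in cell_conditions.items():
        writer.writerow([cell_id, 1, int(condition == disease_label)])
    return buf.getvalue()


# Hypothetical cell IDs and condition labels, for illustration only.
cells = {"cell_1": "normal", "cell_2": "disease", "cell_3": "disease"}
cov_text = build_cov_tsv(cells)
print(cov_text)
```

Running the analysis once with and once without this column is a simple way to check how much the covariate changes the results on a given dataset.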

martinjzhang commented 1 year ago

> One last thing: during the scDRS compute-score analysis, we found that different settings for --n_ctrl can produce distinct results. It seems that the lower the value of n_ctrl, the more "significant" results are obtained, for example when setting n_ctrl to 100 rather than 1000. How should we balance the choice of n_ctrl against the results? For example, our dataset has 5992 cells. Any suggestions? Thanks in advance.

I wouldn't specify an n_ctrl below 500. In our unpublished simulations, scDRS starts to have inflated false positives when n_ctrl is below 500.
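As background on what n_ctrl controls: each cell's score is compared against scores from n_ctrl control gene sets. The sketch below is a generic Monte Carlo p-value, not scDRS's exact procedure, and the helper names `mc_pvalue` and `min_pvalue` are hypothetical. It only illustrates that the number of controls sets how finely the null distribution is sampled; it does not reproduce the inflation described above.

```python
import random


def mc_pvalue(observed, control_scores):
    """Generic Monte Carlo p-value: (1 + #controls >= observed) / (1 + n_ctrl)."""
    n_ge = sum(1 for s in control_scores if s >= observed)
    return (1 + n_ge) / (1 + len(control_scores))


def min_pvalue(n_ctrl):
    """Smallest p-value this estimator can return with n_ctrl controls."""
    return 1 / (1 + n_ctrl)


rng = random.Random(0)  # fixed seed so the simulated controls are reproducible
for n_ctrl in (100, 500, 1000):
    # Simulated null scores; real control scores would come from control gene sets.
    ctrl = [rng.gauss(0, 1) for _ in range(n_ctrl)]
    print(n_ctrl, min_pvalue(n_ctrl), mc_pvalue(2.0, ctrl))
```

With fewer controls the null is estimated from fewer samples, so each p-value is noisier. A practical check is to rerun the analysis with a larger n_ctrl and see whether the "significant" cells remain significant.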

frucelee commented 1 year ago

Super. Thanks a lot.