lanagarmire / Asgard

question about the calculation formula of the combined drug score #5

Open lishensuo opened 1 year ago

lishensuo commented 1 year ago

Hi, I have learned a lot from your package and the article; it is an awesome method for drug repurposing research. However, I am a little confused about the formula for the combined drug score. Here is the question: as the original paper describes it, for each cell cluster you take the product of (1) the cluster ratio, (2) the drug FDR p-value, and (3) the reversed gene ratio, and then sum these products over the clusters. However, I looked into the DrugScore.R script in your package. In line 68: Drug.coverage <- tapply(Drug.list$w.size, Drug.list$Drug,sum), you appear to first calculate the product of (1) the cluster ratio and (2) the drug FDR p-value for each cell cluster and then sum these products. Next, you find the genes shared by all clusters and intersect them with the genes affected by the corresponding drug. This does not seem consistent with the method described in the paper.

Num(DiseasedGenes)_k is the number of significantly deregulated genes in a cluster k, while Num(ReversedGenes)_k is the number of significantly deregulated genes in a cluster k that can be reversed by the drug.
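For reference, the formula as it is described in this comment can be written out as below. This is a reconstruction from the wording in this thread (with the FDR term taken as its negative log10, as the maintainer's reply states), not a copy of the paper's equation; the symbols Cell_k and Cell_total for the cluster ratio are assumed names.

```latex
\text{Drug score} = \sum_{k}
  \frac{\text{Cell}_k}{\text{Cell}_{\text{total}}}
  \times \bigl(-\log_{10} \text{FDR}_k\bigr)
  \times \frac{\text{Num}(\text{ReversedGenes})_k}{\text{Num}(\text{DiseasedGenes})_k}
```

Here the sum runs over the selected clusters k, and FDR_k is the drug's FDR in cluster k.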

I don't know if I misunderstood the code or the article. Thank you!

lishensuo commented 1 year ago

I read further into your code in ‘DrugScore.R’. I do not think it is reasonable to directly take the overlap of the DEGs across all clusters. Some cell types could show little transcriptomic change between the disease and healthy states. That would leave 0 overlapping genes, which could lead to a NaN value for Drug.therapeutic.score.
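The mechanism described here can be illustrated with a toy example. This is only a sketch of how an empty intersection can turn into NaN in R, assuming the score divides by the number of shared genes; it is not the actual code of DrugScore.R, and the gene and cluster names are invented.

```r
# Toy DEG sets per cluster (hypothetical gene names); C1 has no significant DEGs.
deg.by.cluster <- list(
  C1 = character(0),
  C2 = c("GENE_A", "GENE_B", "GENE_C"),
  C3 = c("GENE_B", "GENE_C", "GENE_D")
)

# Genes shared by every cluster: a single empty set makes the intersection empty.
shared.genes <- Reduce(intersect, deg.by.cluster)

# Genes that a hypothetical drug reverses.
drug.genes <- c("GENE_B", "GENE_D")
reversed <- intersect(shared.genes, drug.genes)

# Ratio of reversed genes to shared genes: 0 / 0 evaluates to NaN in R.
ratio <- length(reversed) / length(shared.genes)
print(is.nan(ratio))
```

Any score that multiplies by such a ratio would then be NaN for every drug, which matches the pattern in the output reported later in this thread.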

hebinghb commented 1 year ago

Hi, Thanks for the detailed review. The cluster ratio is multiplied with negative log10 transformed FDR but not summed. Then the values are summed by the drug. They are just intermediate temporary values for calculating the drug score. The final formulation for the drug score is at the end of this code. You will find all values mentioned in the paper are correctly put into the calculation. The multi-step calculation using intermediate temporary values is for saving computational resources. As to the gene overlap step, it's not overlapping genes over all clusters but for the selected clusters that may be correlated with the disease. If a cluster has no change between the disease and healthy status, it implies this cluster may not contribute to the disease. I would suggest not including it in the calculation of the drug score. Please let me know if you have a further question here or send me an email at hebinghb@gmail.com Thanks!
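A minimal sketch of the intermediate step described in this reply, using invented numbers. The column name Cluster.ratio is hypothetical, and the definition of w.size as cluster ratio times -log10(FDR) is inferred from the explanation above rather than taken from the package source; only the tapply line is quoted from DrugScore.R.

```r
# Toy per-cluster drug results (all values invented for illustration).
Drug.list <- data.frame(
  Drug          = c("drugA", "drugA", "drugB", "drugB"),
  Cluster       = c("C1", "C2", "C1", "C2"),
  Cluster.ratio = c(0.6, 0.4, 0.6, 0.4),   # hypothetical column name
  FDR           = c(0.01, 0.1, 0.001, 1),
  stringsAsFactors = FALSE
)

# Per-row weight: cluster ratio multiplied by -log10(FDR) (a product, not a sum).
Drug.list$w.size <- Drug.list$Cluster.ratio * -log10(Drug.list$FDR)

# Sum the per-cluster weights for each drug, as in the line quoted from DrugScore.R.
Drug.coverage <- tapply(Drug.list$w.size, Drug.list$Drug, sum)
print(Drug.coverage)
```

In this toy case the multiplication happens per cluster and the summation happens per drug, so the two operations are not interchangeable.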

lishensuo commented 1 year ago

Thank you for your response. But I still think your code seems problematic and is not consistent with the formula in your article. As I said above, you select the genes shared across clusters, not each cluster's DEGs. When some cluster has no significant DEGs, the set of intersected genes will be empty, which leads to a NaN value for Drug.therapeutic.score. For example, the following is slightly modified code from the README.md (step 5):

## (1) The normal case: running your code when every cluster has many DEGs
Drug.score<-DrugScore(SC.integrated=SC.integrated,
                     Gene.data=Gene.list,
                     Cell.type=NULL, 
                     Drug.data=Drug.ident.res,
                     FDA.drug.only=TRUE,
                     Case=Case, 
                     Tissue="breast",
                     GSE92742.gctx=GSE92742.gctx.path,
                     GSE70138.gctx=GSE70138.gctx.path)
head(Drug.score)
#             Drug.therapeutic.score   P.value       FDR
# abiraterone           9.056519e-07 0.5554614 1.0000000
# acamprosate           1.175191e-06 0.2618356 0.8065834
# acarbose              7.654915e-07 0.2343568 0.7639510
# acebutolol            1.045812e-06 0.9632485 1.0000000
# aceclidine            1.088939e-06 0.9999216 1.0000000
# aceclofenac           1.056594e-06 0.9958364 1.0000000

## (2) But if one cluster does not have any significant DEGs, the score will be NaN
Gene.list$C1$adj.P.Val=1
Gene.list$C1$P.Value=1
Drug.score2<-DrugScore(SC.integrated=SC.integrated,
                     Gene.data=Gene.list,
                     Cell.type=NULL, 
                     Drug.data=Drug.ident.res,
                     FDA.drug.only=TRUE,
                     Case=Case, 
                     Tissue="breast",
                     GSE92742.gctx=GSE92742.gctx.path,
                     GSE70138.gctx=GSE70138.gctx.path)
head(Drug.score2)
#             Drug.therapeutic.score   P.value       FDR
# abiraterone                    NaN 0.5554614 1.0000000
# acamprosate                    NaN 0.2618356 0.8065834
# acarbose                       NaN 0.2343568 0.7639510
# acebutolol                     NaN 0.9632485 1.0000000
# aceclidine                     NaN 0.9999216 1.0000000
# aceclofenac                    NaN 0.9958364 1.0000000

hebinghb commented 1 year ago

Hi there, as I replied above, the code is consistent with the formula. Regarding the next question, you may revise the code to skip this step if you want to repurpose a drug for a cluster that doesn't have differential genes between the disease and healthy sample. Please let me know if you need help.

lishensuo commented 1 year ago

Thanks, I will study your code carefully. May I ask whether this means that if one conserved cluster without DEGs exists, the combined Drug.therapeutic.score cannot be calculated?

hebinghb commented 1 year ago

This method is repurposing drugs that reverse the differential expressions in the disease. If there's no differential expression, there is no need for a drug score.

lishensuo commented 1 year ago

Sorry, I may not have explained this clearly. The combined score is based on the results from several clusters, as your method suggests. However, if one of the clusters is conserved and has no DEGs, will the combined score be NaN, as in the modified code above?

hebinghb commented 1 year ago

Same thing. The method finds drugs that affect ALL the clusters you added, but it's not a simple sum of drug scores from every selected cluster. If you put in a "NaN" cluster without DEGs, it will indeed report a NaN drug score, because it finds that no drug can affect ALL the clusters it received. I still suggest not adding a cluster without DEGs.
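One way to follow this suggestion is to drop clusters without significant DEGs before scoring. The sketch below uses a toy stand-in for Gene.list; the helper has.deg and the 0.05 cutoff are assumptions, not part of the package. The thread does not say whether Drug.data must be subset to the same clusters, so that is left open.

```r
# Toy stand-in for Gene.list: one table of differential expression results per cluster.
Gene.list <- list(
  C1 = data.frame(adj.P.Val = c(1, 1, 1),
                  row.names = c("GENE_A", "GENE_B", "GENE_C")),
  C2 = data.frame(adj.P.Val = c(0.001, 0.2, 0.03),
                  row.names = c("GENE_A", "GENE_B", "GENE_C"))
)

# Hypothetical helper: TRUE if a cluster has at least one significant DEG.
# The 0.05 cutoff is an assumption.
has.deg <- function(deg.table, cutoff = 0.05) {
  any(deg.table$adj.P.Val < cutoff, na.rm = TRUE)
}

# Keep only the clusters that show differential expression.
keep <- vapply(Gene.list, has.deg, logical(1))
Gene.list.filtered <- Gene.list[keep]
print(names(Gene.list.filtered))
```

The filtered list could then be passed as Gene.data in place of the full Gene.list.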

lishensuo commented 1 year ago

Thanks for your patient answer; I think I understand it better now. I will read the article and your response again. Thanks again!