Closed buniello closed 1 year ago
From SO: SO:0002315 SO:0002316 SO:0002314
hi @Juanmaria-rr did you make any progress re gathering numbers to scope this task?
I'm having a pass on the data with @ireneisdoomed. We should have a decision on the design today. @DSuveges is also in the loop
After discussion, @DSuveges, @d0choa and I have come to the conclusion that QTLs give us a significant signal when deriving loss of function, whereas gain-of-function cases seem to be aberrant: when we estimate that the effect is a gain of function, this is usually wrong. The QTL-derived effect is taken from the signal with the largest effect size.
The benchmarking has been done against the ChEMBL dataset provided by @Juanmaria-rr, where all disease/target associations are considered to be protective because they are derived from drugs that have an inhibitory effect on the target (meaning that presence of target -> presence of disease).
Code available at: https://gist.github.com/ireneisdoomed/5a452eda3cd7e58b1d0c987f5239fdbe
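As a rough illustration of the rule described above (this is not the code in the gist), here is a minimal sketch of picking the signal with the largest effect size and reading a direction from it. The `QtlSignal` schema, the example identifiers, and the mapping of a negative beta to LoF and a positive beta to GoF are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class QtlSignal:
    """One colocalising QTL signal for a study/locus/gene (hypothetical schema)."""
    gene_id: str
    qtl_study: str
    beta: float  # QTL effect size; its sign gives the direction on expression


def derive_effect(signals: Iterable[QtlSignal]) -> Optional[str]:
    """Return 'LoF', 'GoF' or None, using the signal with the largest |beta|."""
    signals = list(signals)
    if not signals:
        return None
    strongest = max(signals, key=lambda s: abs(s.beta))
    if strongest.beta < 0:
        return "LoF"  # assumption: decreased expression is read as loss of function
    if strongest.beta > 0:
        return "GoF"  # assumption: increased expression is read as gain of function
    return None  # a beta of exactly zero carries no direction


# Made-up example values, not taken from the production data.
signals = [
    QtlSignal("ENSG_EXAMPLE", "tissue_a", 0.12),
    QtlSignal("ENSG_EXAMPLE", "tissue_b", -0.45),
]
print(derive_effect(signals))  # LoF
```

Taking the single strongest signal keeps the rule simple, but it ignores conflicting directions across tissues; whether that matters is something the real data would have to show.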
We have 900k study/locus/gene distinct relationships by looking at regions that colocalise with QTLs (based on Genetics production data). The mechanism that can be established about the effect of the region on the gene/disease:
We can annotate the target/disease OTG evidence dataset with the QTL derived direction of effect of the variant in the gene in 28% of the cases.
This is what I observed when I joined the enriched evidence with the gold standard derived from ChEMBL:
| | | DRUG EFFECT | | |
|---|---|---|---|---|
| | | LoF | GoF | Total |
| COLOC EFFECT | LoF | 123 | 3 | 126 |
| | GoF | 107 | 17 | 124 |
| | Total | 230 | 20 | |
I have prepared a table with the results commented above on the validation set. It essentially describes how QTLs inform about the directionality of t/d associations compared with ChEMBL's, for which the role of the target is known.
We have rerun the analysis after @Juanmaria-rr identified a bug in the drug mechanisms that we were using as gold standard: agonists were inappropriately assigned. We considered them as LoF mechanisms when they actually describe that activation of the target is protective against the disease. The bottom line is that the LoF associations were overrepresented in the previous metrics. The LoF/GoF ratio is ~5, and not 11 as we were reporting before.
Having this in place, the numbers look much more favourable when we look at the "predictive" validity of the coloc-derived mechanism of action.

| | | DRUG EFFECT | | |
|---|---|---|---|---|
| | | LoF | GoF | Total |
| COLOC EFFECT | LoF | 113 | 13 | 126 |
| | GoF | 92 | 53 | 145 |
| | Total | 205 | 66 | |
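To make the two tables above easier to compare, here is a small sketch that computes, for each coloc call, the fraction of cases where the drug effect agrees. The counts are copied from the tables; the function name and the choice of per-row agreement as the summary are assumptions for illustration, not metrics reported in this thread.

```python
def agreement(matrix):
    """Summarise a 2x2 table keyed as (coloc_effect, drug_effect) -> count."""
    rates = {}
    for call in ("LoF", "GoF"):
        row_total = matrix[(call, "LoF")] + matrix[(call, "GoF")]
        # Share of coloc calls of this type that match the drug effect.
        rates[call] = matrix[(call, call)] / row_total
    total = sum(matrix.values())
    rates["overall"] = (matrix[("LoF", "LoF")] + matrix[("GoF", "GoF")]) / total
    return rates


# Counts from the table before the agonist fix.
before_fix = {("LoF", "LoF"): 123, ("LoF", "GoF"): 3,
              ("GoF", "LoF"): 107, ("GoF", "GoF"): 17}
# Counts from the table after the agonist fix.
after_fix = {("LoF", "LoF"): 113, ("LoF", "GoF"): 13,
             ("GoF", "LoF"): 92, ("GoF", "GoF"): 53}

for label, matrix in (("before fix", before_fix), ("after fix", after_fix)):
    print(label, {k: f"{v:.1%}" for k, v in agreement(matrix).items()})
```

With these counts, agreement for LoF calls goes from 97.6% to 89.7% and for GoF calls from 13.7% to 36.6%, with overall agreement moving from 56.0% to 61.3%.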
As @d0choa pointed out, it is interesting to look at the breakdown when max clinical phase is factored in. Here we see a trend: the higher the clinical validity of an association, the better we are at deriving its mechanism. This holds in both directions.
We would like to include info on direction of effect (already available in the Genetics API for the variant/QTL colocalisation table, through the `QTL beta` direction) in the L2G evidence in the platform. Add an `effect on expression` column to the Genetics Portal evidence table (side by side with the VEP/functional consequences column). The field should be populated by relevant SO terms (e.g. to capture increased, decreased or altered expression).

This task will facilitate future implementation of the Target Engine project in the platform. @Juanmaria-rr will check numbers re: