Inclusion of direction of effect (for coloc) to the L2G evidence in Platform

buniello commented 1 year ago

We would like to include info on direction of effect (already available in the Genetics API for the variant/QTL colocalisation table -- through the QTL beta direction) to the L2G evidence in platform.

[ ] addition of an evidence point in the platform API
[ ] UI visualisation in the Genetics Portal evidence table

Discussed in the office: this will probably involve adding an "effect on expression" column to the Genetics Portal evidence table (side by side with the VEP/functional consequences column). The field should be populated by relevant SO terms (e.g. to capture increased, decreased or altered expression).

This task will facilitate future implementation of the Target Engine project in the platform. @Juanmaria-rr will check numbers re:

- L2G evidence without colocalisation with QTLs (%)
- number of contradictions across tissues for the same L2G

buniello commented 1 year ago

From SO: SO:0002315 SO:0002316 SO:0002314
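
A minimal sketch of how the proposed "effect on expression" field could be populated from these terms. The SO identifiers are the three listed above; the term labels are quoted from memory of the Sequence Ontology and should be checked before use. The helper `expression_effect_term` and the rule of falling back to the "altered" term when tissues disagree are assumptions for illustration, not an agreed design.

```python
from typing import Iterable, Optional, Tuple

# SO IDs as listed in this thread; labels are assumed and should be verified.
SO_TERMS = {
    "increased": ("SO:0002315", "increased_gene_product_level"),
    "decreased": ("SO:0002316", "decreased_gene_product_level"),
    "altered": ("SO:0002314", "altered_gene_product_level"),
}


def expression_effect_term(qtl_betas: Iterable[float]) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: map QTL betas (one per tissue) to an SO term.

    All positive -> increased, all negative -> decreased,
    mixed signs -> altered, no usable beta -> None.
    """
    # Collect the set of signs, ignoring betas that are exactly zero.
    signs = {beta > 0 for beta in qtl_betas if beta != 0}
    if not signs:
        return None
    if len(signs) == 2:
        return SO_TERMS["altered"]
    return SO_TERMS["increased"] if True in signs else SO_TERMS["decreased"]
```

With this rule, a gene whose QTL betas contradict each other across tissues would be shown as "altered" rather than forced into one direction.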

buniello commented 1 year ago

hi @Juanmaria-rr did you make any progress re gathering numbers to scope this task?

- L2G evidence without colocalisation with QTLs (%)
- number of contradictions across tissues for the same L2G

d0choa commented 1 year ago

I'm having a pass on the data with @ireneisdoomed. We should have a decision on the design today. @DSuveges is also in the loop

ireneisdoomed commented 1 year ago

After discussion, @DSuveges @d0choa and I have come to the conclusion that QTLs give us a significant signal when deriving loss of function, whereas gain of function cases seem to be aberrant - meaning that when we estimate the effect is a gain of function, this is usually wrong. The QTL derived effect comes from the signal with the largest effect size.
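
To make the "largest effect size" rule concrete, here is a rough sketch in plain Python. It is not the code from the gist: the `Coloc` record, the `derive_mechanism` function and the convention that a negative QTL beta means LoF and a positive one means GoF are all assumptions for illustration.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class Coloc(NamedTuple):
    """Hypothetical colocalisation record (field names are assumptions)."""
    study_id: str
    locus_id: str
    gene_id: str
    tissue: str
    qtl_beta: Optional[float]  # None when no effect is available


Key = Tuple[str, str, str]


def derive_mechanism(colocs: Iterable[Coloc]) -> Dict[Key, str]:
    """Pick the beta with the largest absolute value per study/locus/gene."""
    best: Dict[Key, Optional[float]] = {}
    for c in colocs:
        key = (c.study_id, c.locus_id, c.gene_id)
        if c.qtl_beta is None:
            # Remember the key even if no beta is available for it.
            best.setdefault(key, None)
            continue
        current = best.get(key)
        if current is None or abs(c.qtl_beta) > abs(current):
            best[key] = c.qtl_beta

    def label(beta: Optional[float]) -> str:
        # Assumed sign convention: negative beta -> lof, positive -> gof.
        if beta is None or beta == 0:
            return "unknown"
        return "lof" if beta < 0 else "gof"

    return {key: label(beta) for key, beta in best.items()}
```

For example, a gene with betas of 0.2 in one tissue and -0.9 in another would be labelled `lof`, because -0.9 has the larger absolute value.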

The benchmarking has been done against the ChEMBL dataset provided by @Juanmaria-rr, where all disease/target associations are considered to be protective because they are derived from drugs that have an inhibitory effect on the target (meaning that presence of target -> presence of disease).

More notes on the analysis

Code available at: https://gist.github.com/ireneisdoomed/5a452eda3cd7e58b1d0c987f5239fdbe

We have ~900k distinct study/locus/gene relationships from regions that colocalise with QTLs (based on Genetics production data). The mechanism we can establish for the effect of the region on the gene/disease breaks down as:
- lof: 414787 (~45%)
- gof: 403533 (~44%)
- unknown: 94810 (~10%) - this situation accounts for the cases where an effect for the locus is not available in the colocalised study

We can annotate the target/disease OTG evidence dataset with the QTL-derived direction of effect of the variant on the gene in 28% of the cases.
- Among these, ~50% are tagged as GoF, ~50% as LoF

This is what I observed when I joined the enriched evidence with the gold standard derived from ChEMBL:
- In ChEMBL there are 11.5 times more LoF than GoF associations.
- 250 associations overlap with the Genetics results (enriched with coloc).
- For variants that are protective:
  - Most of the LoF are correct - however we only capture ~half of them
  - Most of the GoF are incorrect
- Are there differences in the L2G score?
  - Mean score: 0.55
  - When we are correct: 0.49
  - When we are incorrect: 0.64
- Are there any differences in the effect size?
  - Mean beta (absolute value): 0.57
  - When we are correct: 0.55
  - When we are incorrect: 0.6

| Coloc effect | Drug effect: LoF | Drug effect: GoF | Total |
|---|---|---|---|
| LoF | 123 | 3 | 126 |
| GoF | 107 | 17 | 124 |
| Total | 230 | 20 | 250 |
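
The statements above about LoF and GoF calls can be checked directly from the counts in this table. The sketch below treats the drug effect as the reference and the coloc effect as the prediction; the function names are mine, and only the four counts come from the analysis.

```python
# Counts keyed by (coloc effect, drug effect), copied from the table above.
counts = {
    ("lof", "lof"): 123,
    ("lof", "gof"): 3,
    ("gof", "lof"): 107,
    ("gof", "gof"): 17,
}


def precision(table, label):
    """Share of coloc calls of `label` that agree with the drug effect."""
    called = sum(n for (coloc, _), n in table.items() if coloc == label)
    return table[(label, label)] / called


def recall(table, label):
    """Share of drug-derived `label` associations that coloc also calls `label`."""
    actual = sum(n for (_, drug), n in table.items() if drug == label)
    return table[(label, label)] / actual


for label in ("lof", "gof"):
    print(label, round(precision(counts, label), 3), round(recall(counts, label), 3))
```

LoF precision is 123/126 and LoF recall is 123/230, which matches "most of the LoF are correct, however we only capture ~half of them". GoF precision is 17/124, which matches "most of the GoF are incorrect".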

ireneisdoomed commented 1 year ago

I have prepared a table with the results described above on the validation set. It essentially describes how well QTLs inform the directionality of target/disease associations compared with ChEMBL's, for which the role of the target is known.

https://docs.google.com/spreadsheets/d/1vNt8Yn9og0J1HqVblXIGaN8Wvn5ysmz02fR3rD9EgoU/edit#gid=1143667412

ireneisdoomed commented 1 year ago

We have rerun the analysis after @Juanmaria-rr identified a bug in the drug mechanisms that we were using as gold standard: agonists were inappropriately assigned. We considered them as LoF mechanisms when they actually describe that activation of the target is protective against the disease. The bottom line is that the LoF associations were overrepresented in the previous metrics. The LoF/GoF ratio is ~5, and not 11 as we were reporting before.

With this fix in place, the numbers look much more favourable when we look at the "predictive" validity of the coloc-derived mechanism of action.

| Coloc effect | Drug effect: LoF | Drug effect: GoF | Total |
|---|---|---|---|
| LoF | 113 | 13 | 126 |
| GoF | 92 | 53 | 145 |
| Total | 205 | 66 | 271 |
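
As a rough way to quantify the change, the sketch below compares overall agreement and GoF precision between the original run and the rerun. The counts are copied from the two tables in this thread; the function names and the choice of these two summary metrics are my own and not part of the original analysis.

```python
# Rows are coloc effect (LoF, GoF); columns are drug effect (LoF, GoF).
original_run = [[123, 3], [107, 17]]
rerun = [[113, 13], [92, 53]]


def agreement(matrix):
    """Fraction of associations where coloc and drug effect coincide."""
    total = sum(sum(row) for row in matrix)
    return (matrix[0][0] + matrix[1][1]) / total


def gof_precision(matrix):
    """Fraction of coloc GoF calls that the drug mechanism confirms."""
    return matrix[1][1] / sum(matrix[1])


for name, matrix in (("original", original_run), ("rerun", rerun)):
    print(name, round(agreement(matrix), 3), round(gof_precision(matrix), 3))
```

Under these definitions, overall agreement goes from 140/250 to 166/271 and GoF precision from 17/124 to 53/145.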

As @d0choa pointed out, it is interesting to look at the breakdown when max clinical phase is factored in. Here we see a trend where the higher the clinical validity of an association, the better we are at deriving that mechanism. This is true in both directions.

[Figure: mechanism_bar]

Inclusion of direction of effect (for coloc) to the L2G evidence in Platform #2831
