opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Extract V2G evidence from functional predictions #2789

Closed ireneisdoomed closed 1 year ago

ireneisdoomed commented 1 year ago

One of the sources for the V2G dataset we have in production is the relationship between a variant and the impact it is predicted to have on the transcript.

This information is predicted by VEP and is available in the variant annotation dataset that we extract from gnomAD. Beyond the most severe functional consequence, there is further functional annotation that we think is valuable to display. The new dataset will therefore include variant/gene information from several angles:

ireneisdoomed commented 1 year ago

Most severe functional consequence

The logic is:

The score is already normalised between 0 and 1.

Example of a V2G evidence:

 geneId                         | ENSG00000285823
 resourceScore                  | null
 datasourceId                   | variantConsequence
 datatypeId                     | vep
 pmid                           | null
 biofeature                     | null
 score                          | 1
 variantId                      | 1_25043903_G_A
 label                          | splice_donor_variant
 variantFunctionalConsequenceId | SO_0001575
 isHighQualityPlof              | null
 chromosome                     | 1

ireneisdoomed commented 1 year ago

Polyphen score

The logic consists of simply parsing the VEP object. The score is already normalised between 0 and 1.
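
As a rough illustration of what parsing the VEP object could look like, the sketch below builds one evidence row per transcript consequence that carries a Polyphen score. The input layout (`transcript_consequences`, `gene_id`, `polyphen_score`, `polyphen_prediction`) follows VEP's JSON output and is an assumption here; the helper name `polyphen_evidence` is hypothetical, and the production pipeline may structure this differently.

```python
from typing import Any, Dict, List


def polyphen_evidence(
    variant_id: str, chromosome: str, vep: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Build one V2G evidence row per transcript consequence with a Polyphen score."""
    rows = []
    for tc in vep.get("transcript_consequences", []):
        score = tc.get("polyphen_score")
        if score is None:
            continue  # no Polyphen annotation on this transcript
        rows.append({
            "variantId": variant_id,
            "chromosome": chromosome,
            "geneId": tc["gene_id"],
            "datatypeId": "vep",
            "datasourceId": "polyphen",
            "label": tc.get("polyphen_prediction"),
            "score": score,  # already in [0, 1], used as-is
        })
    return rows


# Hypothetical VEP annotation (assumed field names), mirroring the example evidence.
vep = {
    "transcript_consequences": [
        {
            "gene_id": "ENSG00000116675",
            "polyphen_score": 0.005,
            "polyphen_prediction": "benign",
        },
        {"gene_id": "ENSG00000116675"},  # no Polyphen annotation, dropped
    ]
}
evidence = polyphen_evidence("1_65401836_G_A", "1", vep)
```

Transcripts without a Polyphen score are skipped rather than emitted with a null score, which is one possible choice and not necessarily what the pipeline does.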

Example of a V2G evidence:

 geneId                         | ENSG00000116675
 resourceScore                  | null
 datasourceId                   | polyphen
 datatypeId                     | vep
 pmid                           | null
 biofeature                     | null
 score                          | 0.005
 variantId                      | 1_65401836_G_A
 label                          | benign
 variantFunctionalConsequenceId | null
 isHighQualityPlof              | null
 chromosome                     | 1

ireneisdoomed commented 1 year ago

SIFT score

The logic consists of simply parsing the VEP object. The score is already normalised between 0 and 1, but it must be interpreted inversely to Polyphen: the closer the score is to 0, the higher the probability that a substitution is damaging. So under resourceScore we will keep the actual SIFT score, and under score the inverted one, which will feed the aggregated V2G score.
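
A minimal sketch of the inversion, assuming it is the plain complement `1 - x` of a raw SIFT score in [0, 1]; the function name `sift_to_v2g_scores` is hypothetical and only used for illustration.

```python
from typing import Dict


def sift_to_v2g_scores(sift_score: float) -> Dict[str, float]:
    """Keep the raw SIFT score and derive the inverted score for V2G aggregation."""
    if not 0.0 <= sift_score <= 1.0:
        raise ValueError(f"SIFT score out of range: {sift_score}")
    return {
        "resourceScore": sift_score,  # raw SIFT score: low means damaging
        "score": 1.0 - sift_score,    # inverted (assumed 1 - x): high means damaging
    }


scores = sift_to_v2g_scores(1.0)  # a fully tolerated substitution
```

With a raw SIFT score of 1.0 this gives a resourceScore of 1.0 and a score of 0.0, consistent with the example evidence.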

Example of a V2G evidence:

 geneId                         | ENSG00000117724
 resourceScore                  | 1.0
 datasourceId                   | sift
 datatypeId                     | vep
 pmid                           | null
 biofeature                     | null
 score                          | 0.0
 variantId                      | 1_214637901_C_G
 label                          | tolerated
 variantFunctionalConsequenceId | null
 isHighQualityPlof              | null
 chromosome                     | 1

ireneisdoomed commented 1 year ago

pLOF assessment

The logic consists of:

Example of a V2G evidence:

 geneId                         | ENSG00000136536
 resourceScore                  | null
 datasourceId                   | loftee
 datatypeId                     | vep
 pmid                           | null
 biofeature                     | null
 score                          | 0
 variantId                      | 2_159714599_G_A
 label                          | null
 variantFunctionalConsequenceId | null
 isHighQualityPlof              | false
 chromosome                     | 2