Most severe functional consequence
The logic is:
gs://genetics-portal-data/lut/vep_consequences.tsv
The score is already normalised between 0 and 1.
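A minimal sketch of how a lookup table like the one above could be applied, assuming the most severe consequence label is mapped to a sequence ontology ID and a score. The column names (`label`, `id`, `score`) and the helper names are hypothetical; the real header of `vep_consequences.tsv` may differ. The single sample row is taken from the evidence example in this section.

```python
import csv
import io

# Hypothetical column names; the real vep_consequences.tsv header may differ.
# The single row reuses the values from the evidence example in this issue.
SAMPLE_TSV = "label\tid\tscore\nsplice_donor_variant\tSO_0001575\t1.0\n"


def load_consequence_lut(handle):
    """Read a tab-separated table into {label: (SO id, score)}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["label"]: (row["id"], float(row["score"])) for row in reader}


def build_evidence(variant_id, gene_id, most_severe_consequence, lut):
    """Build one V2G evidence record from the most severe consequence."""
    so_id, score = lut[most_severe_consequence]
    return {
        "variantId": variant_id,
        "geneId": gene_id,
        "datasourceId": "variantConsequence",
        "datatypeId": "vep",
        "label": most_severe_consequence,
        "variantFunctionalConsequenceId": so_id,
        "score": score,  # taken as-is: already between 0 and 1
    }


lut = load_consequence_lut(io.StringIO(SAMPLE_TSV))
evidence = build_evidence(
    "1_25043903_G_A", "ENSG00000285823", "splice_donor_variant", lut
)
```

In production the table would be read from the bucket path rather than from an in-memory string.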
Example of a V2G evidence:
geneId | ENSG00000285823
resourceScore | null
datasourceId | variantConsequence
datatypeId | vep
pmid | null
biofeature | null
score | 1
variantId | 1_25043903_G_A
label | splice_donor_variant
variantFunctionalConsequenceId | SO_0001575
isHighQualityPlof | null
chromosome | 1
Polyphen score
The logic consists of simply parsing the VEP object. The score is already normalised between 0 and 1.
Example of a V2G evidence:
geneId | ENSG00000116675
resourceScore | null
datasourceId | polyphen
datatypeId | vep
pmid | null
biofeature | null
score | 0.005
variantId | 1_65401836_G_A
label | benign
variantFunctionalConsequenceId | null
isHighQualityPlof | null
chromosome | 1
SIFT score
The logic consists of simply parsing the VEP object.
The score is already normalised between 0 and 1, but it must be interpreted inversely to Polyphen: the closer the score is to 0, the higher the probability that a substitution is damaging.
So under resourceScore we will keep the actual SIFT score, and under score the inverted one that will feed the aggregated V2G score.
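A minimal sketch of the inversion, assuming "inverted" means `1 - sift_score`. This matches the example below, where a resourceScore of 1.0 gives a score of 0.0. The function name is hypothetical.

```python
def sift_to_evidence_scores(sift_score: float) -> dict:
    """Keep the raw SIFT score and derive the inverted score for V2G.

    Assumption: the inversion is 1 - sift_score, so that a lower (more
    damaging) SIFT score yields a higher V2G score.
    """
    if not 0.0 <= sift_score <= 1.0:
        raise ValueError(f"SIFT score out of range [0, 1]: {sift_score}")
    return {
        "resourceScore": sift_score,  # actual SIFT score from VEP
        "score": 1.0 - sift_score,    # inverted score fed to the aggregate
    }


tolerated = sift_to_evidence_scores(1.0)
damaging = sift_to_evidence_scores(0.0)
```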
Example of a V2G evidence:
geneId | ENSG00000117724
resourceScore | 1.0
datasourceId | sift
datatypeId | vep
pmid | null
biofeature | null
score | 0.0
variantId | 1_214637901_C_G
label | tolerated
variantFunctionalConsequenceId | null
isHighQualityPlof | null
chromosome | 1
pLOF assessment
The logic consists of:
Example of a V2G evidence:
geneId | ENSG00000136536
resourceScore | null
datasourceId | loftee
datatypeId | vep
pmid | null
biofeature | null
score | 0
variantId | 2_159714599_G_A
label | null
variantFunctionalConsequenceId | null
isHighQualityPlof | false
chromosome | 2
One of the sources for the V2G dataset we have in production is the relationship between a variant and the impact it is predicted to have on the transcript.
This information is predicted by VEP and is available in the variant annotation dataset that we extract from gnomAD. On top of the most severe functional consequence, there is more functional annotation that we think is valuable to display. Therefore, the new dataset will include variant/gene information from different angles: