Open jdhayhurst opened 4 months ago
Anything in dbXrefs can be consumed as a synonym:
One example (10_114045297_G_C) from the API response:
"dbXrefs": [
  {
    "id": "rs1801253",
    "source": "ensemblVariation"
  },
  {
    "id": "109630#0001",
    "source": "omim"
  },
  {
    "id": "VCV000017746",
    "source": "clinVar"
  },
  {
    "id": "10_114045297_G_C",
    "source": "protVar"
  },
  {
    "id": "10-114045297-G-C",
    "source": "gnomad"
  }
],
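As a rough illustration of consuming dbXrefs as synonyms, the sketch below collects the distinct `id` values from a list shaped like the response above. The function name `synonyms_from_dbxrefs` is hypothetical and not part of the API or the ETL.

```python
from typing import Iterable


def synonyms_from_dbxrefs(db_xrefs: Iterable[dict]) -> list[str]:
    """Return the distinct `id` values of dbXrefs entries, in input order."""
    seen: set[str] = set()
    synonyms: list[str] = []
    for xref in db_xrefs:
        xref_id = xref.get("id")
        # Skip entries without an id and ids we have already collected.
        if xref_id and xref_id not in seen:
            seen.add(xref_id)
            synonyms.append(xref_id)
    return synonyms


# The dbXrefs of 10_114045297_G_C, copied from the API response above.
example_db_xrefs = [
    {"id": "rs1801253", "source": "ensemblVariation"},
    {"id": "109630#0001", "source": "omim"},
    {"id": "VCV000017746", "source": "clinVar"},
    {"id": "10_114045297_G_C", "source": "protVar"},
    {"id": "10-114045297-G-C", "source": "gnomad"},
]

print(synonyms_from_dbxrefs(example_db_xrefs))
```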
I have created the variant search index based on the data here: gs://genetics_etl_python_playground/ds_vep_0720/variant_index.json/
Variants are linked to targets by ranking their transcriptConsequences.transcriptIndex and taking the top 3. We then take the top 3 diseases for those targets from the indirect associations and add those disease labels to the variant index (NOTE: we should probably change the way we link from V -> D).
A relevance score is calculated by taking transcriptConsequences.consequenceScore + 1 and multiplying it by the target-disease association score.
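The linking and scoring described above could be sketched roughly as follows. This is not the actual ETL code: the input shapes, the function names, the placeholder IDs, and the assumption that a lower transcriptIndex ranks higher are all illustrative.

```python
def top_targets(transcript_consequences: list[dict], n: int = 3) -> list[dict]:
    """Keep the n best-ranked consequences (assumes lower transcriptIndex is better)."""
    return sorted(transcript_consequences, key=lambda tc: tc["transcriptIndex"])[:n]


def disease_rows(transcript_consequences, associations, n=3):
    """Build (targetId, diseaseLabel, relevance) rows for one variant.

    `associations` is a hypothetical lookup: targetId -> [(diseaseLabel, score), ...]
    holding the indirect target-disease association scores.
    """
    rows = []
    for tc in top_targets(transcript_consequences, n):
        best = sorted(associations.get(tc["targetId"], []),
                      key=lambda pair: pair[1], reverse=True)[:n]
        for label, assoc_score in best:
            # relevance = (consequenceScore + 1) * association score
            rows.append((tc["targetId"], label, (tc["consequenceScore"] + 1) * assoc_score))
    return rows


# Made-up inputs with placeholder target IDs and disease labels.
consequences = [
    {"targetId": "T1", "transcriptIndex": 2, "consequenceScore": 0.0},
    {"targetId": "T2", "transcriptIndex": 1, "consequenceScore": 0.5},
]
associations = {
    "T1": [("disease A", 0.25)],
    "T2": [("disease B", 0.5), ("disease C", 1.0)],
}
rows = disease_rows(consequences, associations)
print(rows)
```

Adding 1 to the consequence score means a consequence scored 0 still contributes the plain association score rather than zeroing out the relevance.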
TODO: add the variants to the disease and target search indices so that when you search for a variant, targets and diseases are also returned.
To extend the current search request to the API to include the variant search index, it would look like the following:
# Write your query or mutation here
query SearchQuery($queryString: String!) {
  topHit: search(
    queryString: $queryString
    entityNames: ["target", "disease", "drug", "variant"]
    page: {index: 0, size: 1}
  ) {
    hits {
      id
      entity
      score
      object {
        ... on VariantIndex {
          variantId
          rsIds
          __typename
        }
        ... on Target {
          id
          approvedSymbol
          approvedName
          functionDescriptions
          __typename
        }
        ... on Disease {
          id
          name
          description
          __typename
        }
        ... on Drug {
          id
          name
          description
          mechanismsOfAction {
            rows {
              mechanismOfAction
              __typename
            }
            __typename
          }
          __typename
        }
        __typename
      }
      __typename
    }
    __typename
  }
  variants: search(
    queryString: $queryString
    entityNames: ["variant"]
    page: {index: 0, size: 3}
  ) {
    hits {
      id
      entity
      score
      object {
        ... on VariantIndex {
          variantId
          __typename
        }
        __typename
      }
      __typename
    }
    __typename
  }
  targets: search(
    queryString: $queryString
    entityNames: ["target"]
    page: {index: 0, size: 3}
  ) {
    hits {
      id
      entity
      object {
        ... on Target {
          id
          approvedSymbol
          approvedName
          __typename
        }
        __typename
      }
      __typename
    }
    __typename
  }
  diseases: search(
    queryString: $queryString
    entityNames: ["disease"]
    page: {index: 0, size: 3}
  ) {
    hits {
      id
      entity
      object {
        ... on Disease {
          id
          name
          __typename
        }
        __typename
      }
      __typename
    }
    __typename
  }
  drugs: search(
    queryString: $queryString
    entityNames: ["drug"]
    page: {index: 0, size: 3}
  ) {
    hits {
      id
      entity
      object {
        ... on Drug {
          id
          name
          __typename
        }
        __typename
      }
      __typename
    }
    __typename
  }
}
This is using the VariantIndex type for variants, but we have discussed changing this to Variant, so that's something to bear in mind.
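For anyone exploring from a script rather than the playground, here is a minimal sketch of how the variants part of the query could be packaged as a GraphQL POST request. The endpoint URL is a placeholder and the helper name is hypothetical. The request is only built here, not sent, and the VariantIndex fragment would need renaming if the type changes to Variant.

```python
import json
import urllib.request

# Placeholder endpoint: replace with the API instance that serves the variant index.
GRAPHQL_URL = "https://example.org/api/v4/graphql"

# Trimmed to the `variants` alias of the full query above.
VARIANT_SEARCH_QUERY = """
query SearchQuery($queryString: String!) {
  variants: search(
    queryString: $queryString
    entityNames: ["variant"]
    page: {index: 0, size: 3}
  ) {
    hits {
      id
      entity
      score
      object {
        ... on VariantIndex {
          variantId
        }
      }
    }
  }
}
"""


def build_search_request(query_string: str, url: str = GRAPHQL_URL) -> urllib.request.Request:
    """Package the query and its variable as a JSON POST request."""
    payload = {"query": VARIANT_SEARCH_QUERY, "variables": {"queryString": query_string}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("rs1801253")
# To actually run it against a live endpoint:
# with urllib.request.urlopen(request) as response:
#     hits = json.load(response)["data"]["variants"]["hits"]
```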
@carcruz and FE team feel free to start exploring this dataset from your side (as discussed in the team leads meeting)
The current plan after the variant page meeting is that the disease and target labels should not be added to the variant search index. However, the variants should be added to the target search index.
So this means that when searching for a variant, we'll get targets and diseases suggested, but no variants are suggested when searching for a disease or target? I very much agree with this call.
Yes (except that searching for variants yields only targets, not diseases, at least for now).
Search for variant -> variant & target
Search for target | disease | drug -> target & disease & drug
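The routing above can be written down as a small lookup table. This is only a sketch of the agreed behaviour; the names are hypothetical and nothing here reflects how the API or FE actually implements it.

```python
# Which entity types are returned, keyed by the kind of entity being searched for.
RETURNED_ENTITIES = {
    "variant": ["variant", "target"],
    "target": ["target", "disease", "drug"],
    "disease": ["target", "disease", "drug"],
    "drug": ["target", "disease", "drug"],
}


def returned_entities(searched_entity: str) -> list[str]:
    """Entity types expected in the results for a given kind of search."""
    return RETURNED_ENTITIES[searched_entity]


print(returned_entities("variant"))
print(returned_entities("disease"))
```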
As discussed this week, @gjmcn and @chinmehta will start building a FE feature on top of the current API while James is on holiday. Tagging @prashantuniyal02 for reference over the next two weeks.
This may be out of scope for the MVP, but it would be great to be able to search by protein coordinates as well as genomic coordinates, as part of the variant synonyms. For coding variants, this is how people will recognise and remember them and want to look them up. This may already be covered, but I couldn't see a reference to it above.
Some examples:
rs121913530: KRAS G12C, NP_203524.1:p.Gly12Cys
rs186045772: CFTR F1074L, NP_000483.3:p.Phe1074Leu
The ProtVar team (contact: James Stephenson) has different kinds of mappings for genome > protein which may be useful.
This will also be relevant for pharmacogenetic star allele variants, where we would want users to be able to search by the star allele. Examples: CYP2D6*2, CYP2D6*1xN, CYP2D6*35 (seen in the pharmacogenetic widget here: https://platform.opentargets.org/target/ENSG00000100197). Currently we link to mapping information on PharmGKB (e.g. https://www.pharmgkb.org/haplotype/PA165816577)
As discussed on Aug 7th...
TODO: add two new variant identifiers when the amino acid change is given:
<approvedSymbol>_<aa∆>
<uniprot_accession>_<aa∆>
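A minimal sketch of building the two identifiers, assuming the amino acid change is already available as a short string such as G12C. The exact format of the change (one-letter vs. three-letter codes) was not specified above, and the function name is hypothetical.

```python
from typing import Optional


def protein_level_identifiers(
    approved_symbol: str, uniprot_accession: str, aa_change: Optional[str]
) -> list[str]:
    """Return <approvedSymbol>_<aa change> and <uniprot_accession>_<aa change>.

    Returns an empty list when no amino acid change is given.
    """
    if not aa_change:
        return []
    return [f"{approved_symbol}_{aa_change}", f"{uniprot_accession}_{aa_change}"]


# Illustrative input: KRAS (UniProt P01116) with the G12C change.
print(protein_level_identifiers("KRAS", "P01116", "G12C"))
print(protein_level_identifiers("KRAS", "P01116", None))
```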
We need to enable the search of variants in the OpenTargets platform search.
Background
It should be possible to search on
Data should come from the variant output from gentropy: https://github.com/opentargets/issues/issues/3350
What happens when you search?
Should we return top genes? Yes, this can be pre-computed and weighted from transcriptConsequences.distance and transcriptConsequences.targetId.
Tasks
Acceptance tests
How do we know the task is complete?