Closed ireneisdoomed closed 3 months ago
EVA will update the Pharmacogenetics dataset they provide to include star allele/gene associations so that we can display them in the widget. Our next meeting with them is on Jan 22nd.
For reference, the genes in the current pharmacogenetics dataset are the ones assigned by VEP, so the implicated variant needs to be normalised before VEP can consume it. This is not possible for star alleles; however, PharmGKB does have gene information. The intention is to show these associations as is, without the extra annotation from VEP.
Useful issue tracking the investigations made by @apriltuesday: https://github.com/EBIvariation/opentargets-pharmgkb/issues/28
After the discussion with EVA today, we have agreed that we want to maintain the star allele notation because it's well established in the field.
This means that we need to introduce a new ID field in the schema to accommodate this. I propose `haplotypeId`.
In the end, we'll have:
- `haplotypeId`: for example CYP2C9*3, the name of the haplotype that comprises multiple variants.
- `variantRsId`: one of the variants in that haplotype, for example rs1057910. This information will come from PharmGKB, so that we can extract the most severe consequence.
- `genotypeId`: each of the possible alleles in the variantId, for example 10_94981296_A_C

Hello! Some thoughts on these questions (and one question of my own)...
- `PA165816542`, from which you can construct the URL), and this ID isn't in the data tables I'm currently using. I will look into it and keep you posted.
- `variantRsId` and `genotypeId` can (actually must) be left blank here, we just need to remove the variant ID from the required fields in the schema...
- `alleleFunction`?

On a separate note, for the `genotype` field - we currently fill this with the first column in PGKB's annotation table (rsid example, star example), which is labelled `Allele` but can be either allele or genotype. We can continue to do this for named alleles, as in both cases it would document the raw text that we use to create the genotype OR haplotype ID, or we could do something else. Do you guys use this field at all? Do you have a preference?
Hi @emcdonagh-OT @ireneisdoomed @DSuveges - Getting the information to link to a haplotype page from the `haplotypeId` will not be possible in general, as sometimes the annotation is actually for a genotype and not a haplotype at all... (example here).
However it should be achievable for the majority of cases, so perhaps can be added as a non-required field. This would be PGKB's internal identifier for the haplotype - for example (field name pending):
{
...
"haplotypeId": "CYP2C9*1",
"internalHaplotypeId": "PA165816542"
...
}
from which you can get the URL as https://www.pharmgkb.org/haplotype/PA165816542.
Does this seem reasonable?
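The URL construction described above could be sketched roughly as follows. This is only an illustration: `haplotype_url` and its `id_field` parameter are hypothetical names, and the default field name is the placeholder from the example record, which was still pending at this point in the thread.

```python
from typing import Optional

# Base URL copied from the example in this thread.
PHARMGKB_HAPLOTYPE_URL = "https://www.pharmgkb.org/haplotype/"


def haplotype_url(record: dict, id_field: str = "internalHaplotypeId") -> Optional[str]:
    """Return the PharmGKB haplotype page URL for a record.

    Returns None when the record has no internal haplotype ID, which is
    expected when the annotation is for a genotype rather than a haplotype.
    `id_field` is a placeholder name (hypothetical, field name pending).
    """
    source_id = record.get(id_field)
    if not source_id:
        return None
    return PHARMGKB_HAPLOTYPE_URL + source_id
```

Because the field is non-required, callers would need to handle the `None` case rather than assume every record links to a haplotype page.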
@apriltuesday Apologies for the late reply, I'd missed this.
That is absolutely reasonable, we expect that haplotypes won't be available for many pieces of evidence.
Using this ID to link back to PharmGKB is a nice-to-have, but not essential since we already link the whole evidence. Is it complex to extract?
As for the field name, I'd suggest `haplotypeFromSourceId`. Let me know if this is OK and we'll update our schema.
@ireneisdoomed thanks, `haplotypeFromSourceId` sounds good, it's not complicated to extract for the majority of cases so should be fine.
The other schema changes I think are needed are:
- `variantRsId` needs to be not required - you could make it so that either `variantRsId` or `haplotypeId` is required though
- `alleleFunction` (or another name) to be added - if you want the "decreased function" etc. annotation to be included in this submission

Sorry for the delay @apriltuesday. We have a new schema version under the tag 2.6.1 with the changes you have requested:
- `variantRsId` is now optional (all IDs are now optional)
- `haplotypeFromSourceId` will contain PGKB's haplotype ID to link to
- `directionality` contains the allele function annotation

And one more:
- `drugFromSourceId` will contain the CHEBI identifiers. This used to be stored in `drugId`, but we want to leave this field blank. Is this possible?

Let me know if you have any issues, thank you so much!
Great, thank you! I'll make the necessary changes and let you know any issues.
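Since schema 2.6.1 makes all IDs optional, the "either `variantRsId` or `haplotypeId`" rule suggested earlier in the thread would have to be checked outside the schema. A minimal sketch of such a check, assuming records are plain dictionaries (`has_variant_or_haplotype` is a hypothetical helper, not part of the schema or the EVA pipeline):

```python
# Fields of which at least one is expected to be filled, per the suggestion
# in this thread. This rule is an assumption; schema 2.6.1 does not enforce it.
ID_FIELDS = ("variantRsId", "haplotypeId")


def has_variant_or_haplotype(record: dict) -> bool:
    """Return True when the record carries an rsID, a haplotype ID, or both."""
    return any(record.get(field) for field in ID_FIELDS)


def records_without_ids(records: list) -> list:
    """Collect the records that have neither identifier, for manual review."""
    return [record for record in records if not has_variant_or_haplotype(record)]
```

Empty strings and `None` are both treated as missing here, which matches how blank fields tend to appear in JSON exports.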
The latest EVA submission (`cttv012-2024-02-02_pgkb.json.gz`) contains the evidence from the haplotypes in the table above.
Gene | Chromosome | Position | Reference | Alternative | RsID | Legacy Name | Alleles | Star allele | Key drug association | In OT 23.12 | In OT 24.02 |
---|---|---|---|---|---|---|---|---|---|---|---|
DPYD | 1 | 97450058 | C | T | rs3918290 | c.1905+1G>A | *2A | *2A | capecitabine/fluorouracil | Yes | Yes |
DPYD | 1 | 97515787 | A | C | rs55886062 | c.1679T>G | *13 | *13 | capecitabine/fluorouracil | Yes | Yes |
DPYD | 1 | 97573863 | C | T | rs56038477 | c.1236G>A | HapB3 | HapB3 | capecitabine/fluorouracil | Yes | Yes |
DPYD | 1 | 97579893 | G | C | rs75017182 | c.1129-5923C>G | HapB3 | HapB3 | capecitabine/fluorouracil | Yes | Yes |
UGT1A1 | 2 | 233760233 | C | CAT | rs3064744 | NC_000002.11:g.234668881TA[6]>TA[7] | *28 | *28 | Irinotecan | No | Yes (1451206982) |
UGT1A1 | 2 | 233760498 | G | A | rs4148323 | 211G>A | *6 | *6 | Irinotecan | No | Yes (1451329460) |
TPMT | 6 | 18130687 | T | C | rs1142345 | 719A>G | *3C | 3A/3C | thiopurine/mercaptopurine/azathioprine | No | Yes (1451237326) |
TPMT | 6 | 18130781 | C | T | rs1800584 | 626-1G>A | *4 | *4 | thiopurine/mercaptopurine/azathioprine | No | Yes (1184648909) |
TPMT | 6 | 18138997 | C | T | rs1800460 | 460G>A | *3B | 3A/3B | thiopurine/mercaptopurine/azathioprine | No | Yes (1451237326) |
CYP2D6 | | | | | rs3892097 | | | *4 | codeine | No | Yes (1183616718) |
CYP2C19 | | | | | rs28399504 | | | *4 | clopidogrel | No | Yes (1043858794) |
HLA | | | | | any variants? | | | | abacavir/allopurinol | No | Yes (981419257/981419260) |
CYP2C9 | | | | | rs1799853 | | | *2 | phenytoin/warfarin | No | Yes (1447672988/982047500) |
This is an example of how the association between CYP2C9*2 and warfarin is represented:
-RECORD 0------------------------------------------------
datasourceId | pharmgkb
datasourceVersion | 2024-01-05
datatypeId | clinical_annotation
directionality | Decreased function
drugFromSource | warfarin
drugFromSourceId | CHEBI_10033
evidenceLevel | 1A
genotype | *2
genotypeAnnotationText | The CYP2C9*2 allele is assigned as a decreased function allele by CPIC. Patients with CYP2C9*2 in combination with another normal function allele, a decreased function allele, or a no function allele may have increased risk of bleeding when treated with warfarin as compared to patients with two normal function alleles. However, conflicting evidence has been reported. Other genetic and clinical factors may also influence the risk of warfarin-induced bleeding.
genotypeId | NULL
haplotypeFromSourceId | PA165816543
haplotypeId | CYP2C9*2
literature | [29432897, 10509530, 31186542, 29432897, 17653141, 17653141, 17653141, 17653141, 17653141, 18756910, 24503627, 24503627, 11926893, 10509530, 18574025, 24503627, 24503627, 24474498, 25001883, 25001883, 23932037, 23932037, 23774101, 23104259, 23104259, 25521356, 25521356, 24602049, 25769357, 25858232, 14676821, 17955230, 18690342, 22571356, 19297219, 23602689, 21148049, 25244877, 15714076, 23348161, 18570163, 26777610, 27488176, 27581200, 28033245, 28033245, 28689179]
pgxCategory | toxicity
phenotypeFromSourceId | MP_0001914
phenotypeText | Hemorrhage
studyId | 1447672988
targetFromSourceId | ENSG00000138109
variantFunctionalConsequenceId | NULL
variantRsId | NULL
Unless there are any comments, I consider this work done. Thanks again @apriltuesday for your effort!
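For anyone wanting to pull records like the one above out of the submission file, a rough sketch follows. It assumes the file is gzipped newline-delimited JSON with one evidence record per line, which is not stated in this thread; `iter_haplotype_evidence` is a hypothetical helper name.

```python
import gzip
import json
from typing import Iterator


def iter_haplotype_evidence(path: str) -> Iterator[dict]:
    """Yield evidence records that carry a non-empty haplotypeId.

    Assumes gzipped JSON Lines input (one JSON object per line).
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            if record.get("haplotypeId"):
                yield record
```

Usage would look like `for record in iter_haplotype_evidence("cttv012-2024-02-02_pgkb.json.gz"): ...`, with the path pointing at a local copy of the submission.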
As a developer I want to integrate and harmonise star allele data from PharmGKB into the Platform because these are clinically relevant associations that our users expect to see in our widgets.
Background
Star alleles are a widely used notation within the pharmacogenetics field. For 23.12 we won't show this information because of the complexities of standardising this notation to our variant IDs. @apriltuesday has explored its complexities in this notebook. For the next iteration, we want to integrate them as part of the pharmacogenetics dataset.
Tasks
Acceptance tests
Ellie has shared with us some key drug associations that we should use to validate the data. As can be seen, most of the key associations using star alleles are lost (see column In OT).
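One way to run this validation could be a small check that reports which expected associations are absent from the evidence. The sketch below assumes each key association can be expressed as a (`haplotypeId`, `drugFromSource`) pair, which will not hold for associations submitted by rsID only; `missing_associations` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple

Association = Tuple[str, str]  # (haplotypeId, drugFromSource), an assumed key


def missing_associations(
    records: Iterable[dict], expected: Iterable[Association]
) -> List[Association]:
    """Return the expected associations that no evidence record matches."""
    seen = {
        (record.get("haplotypeId"), record.get("drugFromSource"))
        for record in records
    }
    return [pair for pair in expected if pair not in seen]
```

An empty list means every expected pair was found; anything returned is a candidate for the "lost" associations mentioned above.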
Context
Metrics extracted by @apriltuesday and based on the 2023-10-05 data dump: