opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Harmonise pharmacogenetics associations that refer to star alleles #3164

Closed: ireneisdoomed closed this issue 3 months ago

ireneisdoomed commented 7 months ago

As a developer, I want to integrate and harmonise star allele data from PharmGKB into the Platform, because these are clinically relevant associations that our users expect to see in our widgets.

Background

Star alleles are a widely used notation in the pharmacogenetics field. For release 23.12, we won't show this information because standardising this notation to our variant IDs is complex. @apriltuesday has explored these complexities in this notebook. For the next iteration, we want to integrate them as part of the pharmacogenetics dataset.
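To make the mapping problem concrete, here is a minimal Python sketch (illustrative only, not pipeline code) built solely from star alleles and rsIDs listed in the acceptance-test table. A single star allele can be defined by several variants, and a single variant can belong to several star alleles, so neither direction maps one-to-one onto a variant ID. The lookup and function names are hypothetical.

```python
# Hypothetical lookup: star allele (or named haplotype) -> defining rsIDs.
# Entries are taken from the acceptance-test table in this issue.
STAR_ALLELE_VARIANTS: dict[str, set[str]] = {
    "DPYD*2A": {"rs3918290"},
    "DPYD*13": {"rs55886062"},
    "DPYD HapB3": {"rs56038477", "rs75017182"},
    "TPMT*3A": {"rs1800460", "rs1142345"},
    "TPMT*3B": {"rs1800460"},
    "TPMT*3C": {"rs1142345"},
}


def rsids_for(star_allele: str) -> set[str]:
    """Return the rsIDs defining a star allele (empty set if not in the lookup)."""
    return STAR_ALLELE_VARIANTS.get(star_allele, set())


def star_alleles_for(rsid: str) -> list[str]:
    """Return every star allele in the lookup that contains the rsID, sorted."""
    return sorted(
        name for name, rsids in STAR_ALLELE_VARIANTS.items() if rsid in rsids
    )


print(sorted(rsids_for("TPMT*3A")))    # one allele, two variants
print(star_alleles_for("rs1142345"))  # one variant, two alleles
```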

Tasks

Acceptance tests

Ellie has shared with us some key drug associations that we should use to validate the data. As can be seen, most of the key associations that use star alleles are missing (see the "In OT" column).

Gene | Chromosome | Position | Reference | Alternative | RsID | Legacy Name | Alleles | Star allele | Key drug association | In OT
---|---|---|---|---|---|---|---|---|---|---
DPYD | 1 | 97450058 | C | T | rs3918290 | c.1905+1G>A | *2A | *2A | capecitabine/fluorouracil | Yes
DPYD | 1 | 97515787 | A | C | rs55886062 | c.1679T>G | *13 | *13 | capecitabine/fluorouracil | Yes
DPYD | 1 | 97573863 | C | T | rs56038477 | c.1236G>A | HapB3 | HapB3 | capecitabine/fluorouracil | Yes
DPYD | 1 | 97579893 | G | C | rs75017182 | c.1129-5923C>G | HapB3 | HapB3 | capecitabine/fluorouracil | Yes
UGT1A1 | 2 | 233760233 | C | CAT | rs3064744 | NC_000002.11:g.234668881TA[6]>TA[7] | *28 | *28 | Irinotecan | No
UGT1A1 | 2 | 233760498 | G | A | rs4148323 | 211G>A | *6 | *6 | Irinotecan | No
TPMT | 6 | 18130687 | T | C | rs1142345 | 719A>G | *3C | 3A/3C | thiopurine/mercaptopurine/azathioprine | No
TPMT | 6 | 18130781 | C | T | rs1800584 | 626-1G>A | *4 | *4 | thiopurine/mercaptopurine/azathioprine | No
TPMT | 6 | 18138997 | C | T | rs1800460 | 460G>A | *3B | 3A/3B | thiopurine/mercaptopurine/azathioprine | No
CYP2D6 | | | | | rs3892097 | | | *4 | codeine | No
CYP2C19 | | | | | rs28399504 | | | *4 | clopidogrel | No
HLA | | | | | any variants? | | | | abacavir/allopurinol | No
CYP2C9 | | | | | rs1799853 | | | *2 | phenytoin/warfarin | No
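A check like the one behind the "In OT" column could be scripted along these lines. This is a hypothetical Python sketch, not the validation we actually run: it assumes each evidence record has already been annotated with a gene symbol under an illustrative `gene` key (the real evidence carries an Ensembl ID in `targetFromSourceId`), treats "a/b" in the drug column as "any of a or b", and matches "HLA" against any HLA-* symbol.

```python
# Key (gene, drugs) pairs from the acceptance-test table; "/" separates
# alternative drugs, any one of which satisfies the association.
KEY_ASSOCIATIONS: list[tuple[str, str]] = [
    ("DPYD", "capecitabine/fluorouracil"),
    ("UGT1A1", "irinotecan"),
    ("TPMT", "thiopurine/mercaptopurine/azathioprine"),
    ("CYP2D6", "codeine"),
    ("CYP2C19", "clopidogrel"),
    ("HLA", "abacavir/allopurinol"),
    ("CYP2C9", "phenytoin/warfarin"),
]


def gene_matches(symbol: str, key_gene: str) -> bool:
    """Exact match, or a family prefix such as HLA -> HLA-B."""
    return symbol == key_gene or symbol.startswith(key_gene + "-")


def missing_associations(evidence: list[dict]) -> list[tuple[str, str]]:
    """Return key associations with no evidence for the gene and any listed drug."""
    seen = {(e["gene"], e["drugFromSource"].lower()) for e in evidence}
    missing = []
    for key_gene, drugs in KEY_ASSOCIATIONS:
        wanted = set(drugs.split("/"))
        if not any(
            gene_matches(symbol, key_gene) and drug in wanted
            for symbol, drug in seen
        ):
            missing.append((key_gene, drugs))
    return missing


example = [
    {"gene": "DPYD", "drugFromSource": "fluorouracil"},
    {"gene": "HLA-B", "drugFromSource": "abacavir"},
    {"gene": "CYP2C9", "drugFromSource": "warfarin"},
]
print(missing_associations(example))
```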

Context

Metrics extracted by @apriltuesday and based on the 2023-10-05 data dump:

Total clinical annotations: 5073
        With RS: 4477 (88.25%)
                1. Exploded by allele: 13497 (3.0x)
                2. Exploded by drug: 18830 (1.4x)
                3. Exploded by phenotype: 23086 (1.2x)
Total evidence strings: 24641
        With CHEBI: 20451 (83.00%)
        With EFO phenotype: 8139 (33.03%)
        With functional consequence: 15633 (63.44%)
        With VEP gene: 15633 (63.44%)   
Gene comparisons per annotation
        With PGKB genes: 4220 (83.19%)  
        With VEP genes: 4099 (80.80%)   
        PGKB genes != VEP genes: 770 (15.18%)
Total RS: 2794
        With parsed alleles: 2771 (99.18%)
                With >2 alleles: 31 (1.12%)
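The "Exploded by" lines read as successive multipliers: each clinical annotation is expanded into one row per allele, then per drug, then per phenotype, and each ratio is relative to the previous step (13497 / 4477 ≈ 3.0, 18830 / 13497 ≈ 1.4, 23086 / 18830 ≈ 1.2). The Python sketch below illustrates that reading; the toy annotation and its field names are hypothetical, and an annotation without a mapped phenotype is assumed to still produce rows (with no phenotype).

```python
from itertools import product


def explode(annotation: dict) -> list[dict]:
    """One row per (allele, drug, phenotype) combination of a clinical annotation."""
    phenotypes = annotation["phenotypes"] or [None]  # keep rows with no phenotype
    return [
        {"annotation": annotation["id"], "allele": a, "drug": d, "phenotype": p}
        for a, d, p in product(annotation["alleles"], annotation["drugs"], phenotypes)
    ]


def step_multipliers(counts: list[int]) -> list[float]:
    """Ratio of each expansion step to the step before it, to one decimal place."""
    return [round(cur / prev, 1) for prev, cur in zip(counts, counts[1:])]


toy = {"id": "toy-1", "alleles": ["AA", "AG", "GG"], "drugs": ["warfarin"], "phenotypes": []}
print(len(explode(toy)))                              # 3
print(step_multipliers([4477, 13497, 18830, 23086]))  # [3.0, 1.4, 1.2]
```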
ireneisdoomed commented 5 months ago

EVA will update the Pharmacogenetics dataset they provide to include star allele/gene associations so that we can display them in the widget. Our next meeting with them is on Jan 22nd.

For reference, the genes in the current pharmacogenetics dataset are the ones assigned by VEP, so the implicated variant needs to be normalised before VEP can consume it. This is not possible for star alleles; however, PharmGKB does provide gene information. The intention is to show these associations as is, without the extra annotation from VEP.
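A minimal Python sketch of that rule, assuming the VEP gene and the PharmGKB gene are both available on each record under the hypothetical keys `vepGeneId` and `pgkbGeneId`: rsID-based evidence keeps the VEP-assigned gene, while star-allele evidence (no rsID) takes PharmGKB's gene as is.

```python
from typing import Optional


def assign_target(evidence: dict) -> Optional[str]:
    """Pick the gene for an evidence record (field names are hypothetical).

    rsID-based evidence keeps the VEP-assigned gene (which may be missing);
    star-allele evidence has no normalisable variant, so PharmGKB's gene is used.
    """
    if evidence.get("variantRsId"):
        return evidence.get("vepGeneId")
    return evidence.get("pgkbGeneId")


star = {"haplotypeId": "CYP2C9*2", "pgkbGeneId": "ENSG00000138109"}
print(assign_target(star))  # ENSG00000138109
```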

ireneisdoomed commented 5 months ago

Useful issue tracking the investigations made by @apriltuesday: https://github.com/EBIvariation/opentargets-pharmgkb/issues/28

ireneisdoomed commented 5 months ago

After today's discussion with EVA, we agreed to keep the star allele notation because it's well established in the field. This means we need to introduce a new ID field in the schema to accommodate it. I propose haplotypeId. In the end, we'll have:

emcdonagh-OT commented 5 months ago
apriltuesday commented 5 months ago

Hello! Some thoughts on these questions (and one question of my own)...

On a separate note, regarding the genotype field: we currently fill it with the first column of PGKB's annotation table (rsid example, star example), which is labelled Allele but can contain either an allele or a genotype. We can continue to do this for named alleles, since in both cases it records the raw text we use to build the genotype OR haplotype ID, or we could do something else. Do you use this field at all? Do you have a preference?
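For named alleles, the relationship between the raw Allele text and the two fields can be sketched as below. This is an assumption-laden Python illustration: it copies the raw text into `genotype` and, when the text looks like a star allele, prefixes the gene symbol to form `haplotypeId` (consistent with the CYP2C9*2 record later in this thread: genotype `*2`, haplotypeId `CYP2C9*2`). `build_ids` is a hypothetical helper; genotypeId construction for rsID genotypes is deliberately left out, and named haplotypes without a star (such as DPYD HapB3) are not handled.

```python
import re
from typing import Optional

# Star-allele text such as "*2", "*2A" or "*28" (assumed pattern).
STAR_ALLELE = re.compile(r"\*[A-Za-z0-9]+")


def build_ids(gene_symbol: str, raw_allele: str) -> dict[str, Optional[str]]:
    """Keep the raw Allele text as genotype; derive haplotypeId for star alleles."""
    haplotype_id = None
    if STAR_ALLELE.fullmatch(raw_allele):
        haplotype_id = f"{gene_symbol}{raw_allele}"
    return {"genotype": raw_allele, "haplotypeId": haplotype_id}


print(build_ids("CYP2C9", "*2"))  # {'genotype': '*2', 'haplotypeId': 'CYP2C9*2'}
print(build_ids("CYP2C9", "AC"))  # {'genotype': 'AC', 'haplotypeId': None}
```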

apriltuesday commented 5 months ago

Hi @emcdonagh-OT @ireneisdoomed @DSuveges, getting the information needed to link to a haplotype page from the haplotypeId will not be possible in general, as sometimes the annotation is actually for a genotype and not a haplotype at all (example here).

However, it should be achievable in most cases, so perhaps it can be added as an optional field. This would be PGKB's internal identifier for the haplotype, for example (field name pending):

{
    ...
    "haplotypeId": "CYP2C9*1",
    "internalHaplotypeId": "PA165816542"
    ...
}

from which you can get the URL as https://www.pharmgkb.org/haplotype/PA165816542.
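Building the link is a simple template fill; a Python sketch follows. It reads `haplotypeFromSourceId`, the name the thread later settles on for this field, and returns None when the field is absent, since the identifier is optional. The function name is hypothetical.

```python
from typing import Optional

PGKB_HAPLOTYPE_URL = "https://www.pharmgkb.org/haplotype/{}"


def haplotype_url(evidence: dict) -> Optional[str]:
    """Return the PharmGKB haplotype page URL, or None if no internal ID is present."""
    source_id = evidence.get("haplotypeFromSourceId")
    return PGKB_HAPLOTYPE_URL.format(source_id) if source_id else None


print(haplotype_url({"haplotypeId": "CYP2C9*1", "haplotypeFromSourceId": "PA165816542"}))
# https://www.pharmgkb.org/haplotype/PA165816542
```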

Does this seem reasonable?

ireneisdoomed commented 5 months ago

@apriltuesday Apologies for the late reply, I'd missed this. That is absolutely reasonable; we expect that haplotype identifiers won't be available for much of the evidence. Using this ID to link back to PharmGKB is a nice-to-have, but not essential, since we already link the whole evidence. Is it complex to extract? As for the field name, I'd suggest haplotypeFromSourceId. Let me know if this is OK and we'll update our schema.

apriltuesday commented 5 months ago

@ireneisdoomed Thanks, haplotypeFromSourceId sounds good. It's not complicated to extract in most cases, so it should be fine.

The other schema changes I think are needed are:

ireneisdoomed commented 5 months ago

Sorry for the delay, @apriltuesday. We have a new schema version under the tag 2.6.1 with the changes you requested:

And one more:

Let me know if you have any issues, thank you so much!

apriltuesday commented 5 months ago

Great, thank you! I'll make the necessary changes and let you know any issues.

ireneisdoomed commented 5 months ago

The latest EVA submission (cttv012-2024-02-02_pgkb.json.gz) contains the evidence for the haplotypes in the table above.

Gene | Chromosome | Position | Reference | Alternative | RsID | Legacy Name | Alleles | Star allele | Key drug association | In OT 23.12 | In OT 24.02
---|---|---|---|---|---|---|---|---|---|---|---
DPYD | 1 | 97450058 | C | T | rs3918290 | c.1905+1G>A | *2A | *2A | capecitabine/fluorouracil | Yes | Yes
DPYD | 1 | 97515787 | A | C | rs55886062 | c.1679T>G | *13 | *13 | capecitabine/fluorouracil | Yes | Yes
DPYD | 1 | 97573863 | C | T | rs56038477 | c.1236G>A | HapB3 | HapB3 | capecitabine/fluorouracil | Yes | Yes
DPYD | 1 | 97579893 | G | C | rs75017182 | c.1129-5923C>G | HapB3 | HapB3 | capecitabine/fluorouracil | Yes | Yes
UGT1A1 | 2 | 233760233 | C | CAT | rs3064744 | NC_000002.11:g.234668881TA[6]>TA[7] | *28 | *28 | Irinotecan | No | Yes (1451206982)
UGT1A1 | 2 | 233760498 | G | A | rs4148323 | 211G>A | *6 | *6 | Irinotecan | No | Yes (1451329460)
TPMT | 6 | 18130687 | T | C | rs1142345 | 719A>G | *3C | 3A/3C | thiopurine/mercaptopurine/azathioprine | No | Yes (1451237326)
TPMT | 6 | 18130781 | C | T | rs1800584 | 626-1G>A | *4 | *4 | thiopurine/mercaptopurine/azathioprine | No | Yes (1184648909)
TPMT | 6 | 18138997 | C | T | rs1800460 | 460G>A | *3B | 3A/3B | thiopurine/mercaptopurine/azathioprine | No | Yes (1451237326)
CYP2D6 | | | | | rs3892097 | | | *4 | codeine | No | Yes (1183616718)
CYP2C19 | | | | | rs28399504 | | | *4 | clopidogrel | No | Yes (1043858794)
HLA | | | | | any variants? | | | | abacavir/allopurinol | No | Yes (981419257/981419260)
CYP2C9 | | | | | rs1799853 | | | *2 | phenytoin/warfarin | No | Yes (1447672988/982047500)
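To check a submission such as cttv012-2024-02-02_pgkb.json.gz against the table, one option is to stream it and keep only the records that carry a haplotypeId. The Python sketch below assumes the file is gzip-compressed JSON Lines (one evidence object per line); the function names are hypothetical.

```python
import gzip
import json
from collections.abc import Iterable, Iterator


def haplotype_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed evidence objects that carry a non-null haplotypeId."""
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get("haplotypeId"):
            yield record


def haplotype_evidence(path: str) -> Iterator[dict]:
    """Stream haplotype evidence from a gzipped JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from haplotype_records(handle)
```

For example, `{r["haplotypeId"] for r in haplotype_evidence("cttv012-2024-02-02_pgkb.json.gz")}` would be expected to include CYP2C9*2.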

This is an example of how the CYP2C9*2 and warfarin association is represented:

-RECORD 0-----------------------------------------------------------------
 datasourceId                   | pharmgkb
 datasourceVersion              | 2024-01-05
 datatypeId                     | clinical_annotation
 directionality                 | Decreased function
 drugFromSource                 | warfarin
 drugFromSourceId               | CHEBI_10033
 evidenceLevel                  | 1A
 genotype                       | *2
 genotypeAnnotationText         | The CYP2C9*2 allele is assigned as a decreased function allele by CPIC. Patients with CYP2C9*2 in combination with another normal function allele, a decreased function allele, or a no function allele may have increased risk of bleeding when treated with warfarin as compared to patients with two normal function alleles. However, conflicting evidence has been reported. Other genetic and clinical factors may also influence the risk of warfarin-induced bleeding.
 genotypeId                     | NULL
 haplotypeFromSourceId          | PA165816543
 haplotypeId                    | CYP2C9*2
 literature                     | [29432897, 10509530, 31186542, 29432897, 17653141, 17653141, 17653141, 17653141, 17653141, 18756910, 24503627, 24503627, 11926893, 10509530, 18574025, 24503627, 24503627, 24474498, 25001883, 25001883, 23932037, 23932037, 23774101, 23104259, 23104259, 25521356, 25521356, 24602049, 25769357, 25858232, 14676821, 17955230, 18690342, 22571356, 19297219, 23602689, 21148049, 25244877, 15714076, 23348161, 18570163, 26777610, 27488176, 27581200, 28033245, 28033245, 28689179]
 pgxCategory                    | toxicity
 phenotypeFromSourceId          | MP_0001914
 phenotypeText                  | Hemorrhage
 studyId                        | 1447672988
 targetFromSourceId             | ENSG00000138109
 variantFunctionalConsequenceId | NULL
 variantRsId                    | NULL

Unless there are any comments, I consider this work done. Thanks again @apriltuesday for your effort!