oncokb / oncokb-annotator

Annotates variants in MAF with OncoKB annotation.
GNU Affero General Public License v3.0

Annotator falsely returns Likely Oncogenic for a splice region variant without HGVSp info #192

Closed ibrahimkurt closed 1 year ago

ibrahimkurt commented 1 year ago

Hi,

We use vcf2maf with -inhibit-vep. Our MAF file contains splice region variants that (naturally) have no HGVSp or HGVSp_Short value. When we call the mafannotator.py script with -q HGVSp or -q HGVSp_Short, these variants are falsely returned as Likely Oncogenic, even though they do not exist in the OncoKB database. The output even includes a PubMed reference that is claimed to be related to the variant (https://pubmed.ncbi.nlm.nih.gov/21896780/), but it is not. The script does work correctly if we use the -q Genomic_Change flag. We believe there could be a serious bug here for such variants.

The line of code:

${mafannotator} -i ${maffile} -o ${outputfile} -t NSCLC -q Genomic_Change -r GRCh38 -b ${token}

And here is the MAF file (changed to .txt to be able to upload to GitHub) showing only one of the problematic variants in it:

sample.txt
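
To see which rows are affected before annotating by protein change, the rows with no HGVSp and no HGVSp_Short value can be listed first. The following is a minimal sketch using only the Python standard library. The helper name `rows_missing_protein_change` and the two-row example MAF are made up for illustration (they are not part of oncokb-annotator or of the attached sample), and it assumes a tab-separated MAF with optional `#` comment lines at the top.

```python
import csv
import io

# Column names as used in this report; adjust if your MAF differs.
PROTEIN_COLUMNS = ("HGVSp", "HGVSp_Short")


def rows_missing_protein_change(maf_handle):
    """Yield (data_row_number, row) for rows where both protein columns are empty.

    Hypothetical helper for illustration; not part of oncokb-annotator.
    """
    # Skip leading comment lines such as '#version 2.4'.
    data_lines = (line for line in maf_handle if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    for index, row in enumerate(reader, start=1):
        if all(not (row.get(col) or "").strip() for col in PROTEIN_COLUMNS):
            yield index, row


# Made-up two-row MAF, only to show the behaviour.
example = (
    "#version 2.4\n"
    "Hugo_Symbol\tVariant_Classification\tHGVSc\tHGVSp\tHGVSp_Short\n"
    "GENE1\tSplice_Region\tc.100+5G>T\t\t\n"
    "GENE2\tMissense_Mutation\tc.200A>T\tp.Lys67Met\tp.K67M\n"
)

flagged = [
    (number, row["Hugo_Symbol"])
    for number, row in rows_missing_protein_change(io.StringIO(example))
]
print(flagged)
```

Rows flagged this way are the ones for which a -q HGVSp or -q HGVSp_Short run has no protein change to query with.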

The interesting thing is that if you call the URL/API manually for the variant, as follows, it works correctly:

curl -X GET "https://www.oncokb.org/api/v1/annotate/mutations/byGenomicChange?genomicLocation=17,12107914,12107914,G,T&tumorType=NSCLC&referenceGenome=GRCh38" -H "accept: application/json" -H "Authorization: Bearer ***"
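
For reproducing this call from a script, the same request URL can be assembled in Python. This is only a sketch: the endpoint and the three query parameters are copied from the curl command above, while the function name `genomic_change_url` is made up for illustration. It builds the URL but does not send the request, so no token is needed.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl command above.
BASE_URL = "https://www.oncokb.org/api/v1/annotate/mutations/byGenomicChange"


def genomic_change_url(chromosome, start, end, ref, alt, tumor_type, reference_genome):
    """Build the GET URL for one genomic change (hypothetical helper)."""
    genomic_location = ",".join(str(part) for part in (chromosome, start, end, ref, alt))
    params = {
        "genomicLocation": genomic_location,
        "tumorType": tumor_type,
        "referenceGenome": reference_genome,
    }
    # safe="," keeps the commas literal so the URL matches the curl form.
    return BASE_URL + "?" + urlencode(params, safe=",")


url = genomic_change_url(17, 12107914, 12107914, "G", "T", "NSCLC", "GRCh38")
print(url)
```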

To understand where the bug might be, we added a print call to show the actual URL submitted to OncoKB by the AnnotatorCore.py script. But for some reason it never prints anything for any of the -q parameters.

def pull_genomic_change_info(queries, annotate_hotspot):
    url = oncokb_annotation_api_url + '/annotate/mutations/byGenomicChange'
    response = makeoncokbpostrequest(url, queries)
    if response.status_code == 401:
        raise Exception('unauthorized')
    annotation = []
    if response.status_code == 200:
        annotation = response.json()
    else:
        for query in queries:
            geturl = url + '?'
            geturl += 'genomicLocation=' + query.genomicLocation
            geturl += '&tumorType=' + query.tumorType
            print(geturl)
            getresponse = makeoncokbgetrequest(geturl)
            if getresponse.status_code == 200:
                annotation.append(getresponse.json())
            else:
                # if the api call fails, we should still push a None into the list
                # to keep the same length of the queries
                print('Error on annotating the url ' + geturl)
                annotation.append(None)

    processed_annotation = []
    for query_annotation in annotation:
        processed_annotation.append(process_oncokb_annotation(query_annotation, annotate_hotspot))
    return processed_annotation
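
Judging only from the function as quoted above, the added print sits inside the else branch, which runs only when the batch POST returns something other than 200 (or 401). If the POST succeeds, the per-query GET URLs are never built, so nothing is printed. The quoted fallback URL is also built from genomicLocation and tumorType only. The sketch below illustrates that control flow and logs before the POST instead. Everything in it is a stand-in for illustration: `FakeResponse`, `pull_with_logging`, and the injected request functions are hypothetical, queries are plain dicts rather than the annotator's query objects, and no real request is sent.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("annotator_debug")


class FakeResponse:
    """Stand-in for an HTTP response (hypothetical, for illustration only)."""

    def __init__(self, status_code, payload=None):
        self.status_code = status_code
        self._payload = payload

    def json(self):
        return self._payload


def pull_with_logging(url, queries, post_request, get_request):
    """Same branching as the quoted function, with logging moved before the POST."""
    # Logged unconditionally, so it appears whether or not the POST succeeds.
    log.info("POST %s with %d queries", url, len(queries))
    response = post_request(url, queries)
    if response.status_code == 401:
        raise Exception("unauthorized")
    if response.status_code == 200:
        return response.json()

    # Fallback: reached only when the batch POST did not return 200.
    annotation = []
    for query in queries:
        geturl = (url + "?genomicLocation=" + query["genomicLocation"]
                  + "&tumorType=" + query["tumorType"])
        log.info("GET %s", geturl)
        getresponse = get_request(geturl)
        # Keep the list the same length as the queries, even on failure.
        annotation.append(getresponse.json() if getresponse.status_code == 200 else None)
    return annotation


get_calls = []


def fake_post_ok(url, queries):
    # Simulates a successful batch POST.
    return FakeResponse(200, [{"query": q} for q in queries])


def fake_get(url):
    get_calls.append(url)
    return FakeResponse(200, {})


queries = [{"genomicLocation": "17,12107914,12107914,G,T", "tumorType": "NSCLC"}]
result = pull_with_logging("https://example.invalid/byGenomicChange", queries, fake_post_ok, fake_get)
print(len(result), len(get_calls))
```

With a successful POST the GET path is never taken, which would be consistent with the print producing no output.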

ibrahimkurt commented 1 year ago

A related but different serious problem:

An intronic splice-site variant with LOW effect: PTCH1 c.3306+5G>T

It is returned as Likely Oncogenic, along with an FDA-approved drug recommendation, even with the -q Genomic_Change flag.

Interestingly, even the manual URL/API call shown below returns the same false information:

curl -X GET "https://www.oncokb.org/api/v1/annotate/mutations/byGenomicChange?genomicLocation=9,95456271,95456271,C,A&tumorType=BREAST&referenceGenome=GRCh38" -H "accept: application/json" -H "Authorization: Bearer ***"

The really problematic thing is that the OncoKB webpage shows the correct information, Unknown: https://www.oncokb.org/gene/PTCH1/c.3306+5G%3ET

We believe this shows that both the API and the Python-based annotator scripts could be problematic.

ibrahimkurt commented 1 year ago

Even a missense mutation annotated with -q Genomic_Change gives the wrong output, Likely Oncogenic, which conflicts with the webpage:

BRAF c.1633C>T L545F

https://www.oncokb.org/gene/BRAF/c.1633

zhx828 commented 1 year ago

@ibrahimkurt a few points on your questions

I hope these address your concerns.

We take annotation quality seriously. We are happy to look into other variants if you think they have been interpreted incorrectly.

ibrahimkurt commented 1 year ago

Thanks for the replies. In that case, I will close this issue and open another one for the HGVSp vs. HGVSp_Short inconsistencies.