mskcc / ACCESS-Pipeline

cfDNA Sequencing Pipeline with UMI
MIT License

TERT promoter mutations not able to be tagged through tag_hotspots #236

Open ionox0 opened 4 years ago

ionox0 commented 4 years ago

Ronak found that for the TERT promoter mutations that were added to the hotspots list, the hotspot_whitelist column does not get updated. The string "Promoter_1295250" is not a valid HGVSp annotation, so it does not match the regex "^p\.\D+(\d+)" that the code applies to HGVSp_Short.
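To illustrate the failure, here is a minimal sketch that applies the same pattern (copied from the tag_hotspots snippet quoted further down in this issue) to a few strings. The helper name amino_acid_position and the example "p.V600E" are made up for illustration and are not part of the pipeline.

```python
import re

# Pattern copied from the tag_hotspots snippet quoted in this issue.
HGVSP_PATTERN = re.compile(r'^p\.\D+(\d+)')


def amino_acid_position(hgvsp_short):
    """Hypothetical helper: return the captured position, or None if no match."""
    match = HGVSP_PATTERN.match(hgvsp_short)
    return match.group(1) if match else None


# A typical protein-level annotation matches and yields the position.
print(amino_acid_position('p.V600E'))           # 600

# The promoter strings do not start with "p.", so there is no match
# and the hotspot is silently skipped.
print(amino_acid_position('Promoter_1295250'))  # None
print(amino_acid_position('Promoter_1295228'))  # None
```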

We would like to change the HGVSp_Short for the two TERT mutations in the hotspots list from:

Promoter_1295250
Promoter_1295228

To:

p.0    rs1561215364
p.0    rs1242535815,CA557858711

p.0 is the suggested annotation for non-coding promoter mutations, as designated by the HGVS (https://www.hgvs.org/mutnomen/recs-prot.html, see "changes which affect the promoter of a gene").

In addition to changing this file, the actual fix for the bug would be to remove the aa_pos check from this section of tag_hotspots and tag all hotspots from the file based on "chr", "position", "ref", and "alt", regardless of whether they have a valid HGVSp annotation.

Change:

aa_pos = re.match(r'^p\.\D+(\d+)', row['HGVSp_Short'])
if aa_pos:
    hotspot[key] = aa_pos.group(1)

To this (make the hotspot variable a set instead of a dictionary):

hotspot.add(tuple(key))

The first two lines prevent us from tagging hotspots that don't have a protein-coding annotation, and they don't seem to serve any other purpose. Even "p.0" would not match, because the regex requires at least one non-digit character between "p." and the position. If there is some reason we need to validate the HGVSp with a regex, please let me know before I make this change.
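A rough sketch of what the set-based tagging could look like is below. It is not the actual tag_hotspots code: the function names (hotspot_key, build_hotspot_set, is_hotspot) are hypothetical, the rows are plain dicts rather than whatever the pipeline reads from the hotspots file, and the chromosome and ref/alt values are placeholders for illustration only. Only the column names "chr", "position", "ref", and "alt" come from this issue.

```python
# Columns that identify a hotspot, as named in this issue.
HOTSPOT_FIELDS = ('chr', 'position', 'ref', 'alt')


def hotspot_key(row):
    """Build a hashable key; values are cast to str so 1295250 and '1295250' compare equal."""
    return tuple(str(row[field]) for field in HOTSPOT_FIELDS)


def build_hotspot_set(hotspot_rows):
    """Collect every hotspot row, with no check on HGVSp_Short."""
    return {hotspot_key(row) for row in hotspot_rows}


def is_hotspot(variant, hotspots):
    """Return True if the variant matches a hotspot on all four fields."""
    return hotspot_key(variant) in hotspots


# Placeholder rows: chromosome and ref/alt values are assumed for the example.
hotspot_rows = [
    {'chr': '5', 'position': 1295250, 'ref': 'G', 'alt': 'A', 'HGVSp_Short': 'p.0'},
    {'chr': '5', 'position': 1295228, 'ref': 'G', 'alt': 'A', 'HGVSp_Short': 'p.0'},
]
hotspots = build_hotspot_set(hotspot_rows)

variant = {'chr': '5', 'position': '1295250', 'ref': 'G', 'alt': 'A'}
print(is_hotspot(variant, hotspots))  # True
```

Because the lookup depends only on the four coordinate fields, rows annotated "p.0" (or with any other HGVSp_Short value) are tagged the same way as protein-coding hotspots.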

ionox0 commented 4 years ago

@andurill @rhshah @maysunh