varfish-org / mehari

VEP-like tool for sequence ontology and HGVS annotation of VCF files
MIT License

Mehari needs better handling of transcripts without stop codon in CDS, e.g., NM_001291324 #383

Open holtgrewe opened 4 months ago

holtgrewe commented 4 months ago

Describe the bug: Currently, mehari db build skips the transcript NM_001291324.3 in the GRCh37 transcript alignments because the alignment has no stop codon in the CDS. However, the transcript is still mentioned in the built file. We need to handle this more elegantly.

To Reproduce: N/A

Expected behavior: Such transcripts should not be written to the output file.
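
One way to read the expected behavior is as a filter applied before anything is written to the output. The following is a minimal Python sketch of that idea, not mehari's actual implementation. The names STOP_CODONS, has_terminal_stop_codon and partition_transcripts, as well as the toy sequences and transcript IDs, are hypothetical. As a simplification, it inspects the last codon of the nucleotide CDS instead of translating the CDS and checking the resulting protein.

```python
from __future__ import annotations

# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_terminal_stop_codon(cds: str) -> bool:
    """Return True if the CDS consists of whole codons and the last one is a stop."""
    cds = cds.upper()
    return len(cds) >= 3 and len(cds) % 3 == 0 and cds[-3:] in STOP_CODONS


def partition_transcripts(cds_by_tx: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Split transcripts into those to write and the IDs of those to skip.

    Skipped transcripts are only reported and never end up in the kept mapping,
    so a writer that consumes `kept` cannot mention them in the output.
    """
    kept: dict[str, str] = {}
    skipped: list[str] = []
    for tx_id, cds in cds_by_tx.items():
        if has_terminal_stop_codon(cds):
            kept[tx_id] = cds
        else:
            skipped.append(tx_id)
    return kept, skipped


if __name__ == "__main__":
    # Toy input: made-up IDs and sequences, for illustration only.
    toy = {"TX_WITH_STOP": "ATGAAATAA", "TX_NO_STOP": "ATGAAAGGG"}
    kept, skipped = partition_transcripts(toy)
    for tx_id in skipped:
        print(f"Skipping transcript {tx_id} because of missing stop codon in translated CDS")
    print("Transcripts to write:", sorted(kept))
```

Keeping the skipped IDs in a separate list means the report can still name them, while the output is built only from the kept transcripts.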

Screenshots: N/A

Additional context

holtgrewe commented 4 months ago

Curious: all transcripts of HGNC:29279 are affected by this, but UCSC shows RefSeq transcript alignments.

holtgrewe commented 4 months ago

OK, this appears to be legitimate according to the data:

# grep -E 'NM_001377448|NM_001291324' ~/mehari-data-tx/pass-1/txs.bin.zst.report
Skipping transcript NM_001291324.3 because of missing stop codon in translated CDS
Skipping transcript NM_001377448.1 because of missing stop codon in translated CDS

holtgrewe commented 4 months ago

There are 139 such transcripts, see below. Most probably, we cannot easily resolve this for GRCh37.

# grep 'because of missing stop codon' ~/mehari-data-tx/pass-1/txs.bin.zst.report | wc -l
139
# grep 'because of missing stop codon' ~/mehari-data-tx/pass-1/txs.bin.zst.report
Skipping transcript NM_000442.5 because of missing stop codon in translated CDS
Skipping transcript NM_001012288.3 because of missing stop codon in translated CDS
Skipping transcript NM_001017915.3 because of missing stop codon in translated CDS
Skipping transcript NM_001039127.6 because of missing stop codon in translated CDS
Skipping transcript NM_001040627.2 because of missing stop codon in translated CDS
Skipping transcript NM_001077693.4 because of missing stop codon in translated CDS
Skipping transcript NM_001085423.2 because of missing stop codon in translated CDS
Skipping transcript NM_001085474.2 because of missing stop codon in translated CDS
Skipping transcript NM_001137667.2 because of missing stop codon in translated CDS
Skipping transcript NM_001137668.2 because of missing stop codon in translated CDS
Skipping transcript NM_001143962.2 because of missing stop codon in translated CDS
Skipping transcript NM_001145026.2 because of missing stop codon in translated CDS
Skipping transcript NM_001145064.3 because of missing stop codon in translated CDS
Skipping transcript NM_001170637.4 because of missing stop codon in translated CDS
Skipping transcript NM_001174092.3 because of missing stop codon in translated CDS
Skipping transcript NM_001201380.3 because of missing stop codon in translated CDS
Skipping transcript NM_001271872.3 because of missing stop codon in translated CDS
Skipping transcript NM_001282302.2 because of missing stop codon in translated CDS
Skipping transcript NM_001289930.2 because of missing stop codon in translated CDS
Skipping transcript NM_001289931.2 because of missing stop codon in translated CDS
Skipping transcript NM_001290033.2 because of missing stop codon in translated CDS
Skipping transcript NM_001290047.2 because of missing stop codon in translated CDS
Skipping transcript NM_001290097.2 because of missing stop codon in translated CDS
Skipping transcript NM_001290098.1 because of missing stop codon in translated CDS
Skipping transcript NM_001291281.3 because of missing stop codon in translated CDS
Skipping transcript NM_001291310.2 because of missing stop codon in translated CDS
Skipping transcript NM_001291316.2 because of missing stop codon in translated CDS
Skipping transcript NM_001291317.2 because of missing stop codon in translated CDS
Skipping transcript NM_001291324.3 because of missing stop codon in translated CDS
Skipping transcript NM_001291345.2 because of missing stop codon in translated CDS
Skipping transcript NM_001291815.2 because of missing stop codon in translated CDS
Skipping transcript NM_001293739.2 because of missing stop codon in translated CDS
Skipping transcript NM_001300952.2 because of missing stop codon in translated CDS
Skipping transcript NM_001302371.3 because of missing stop codon in translated CDS
Skipping transcript NM_001303486.3 because of missing stop codon in translated CDS
Skipping transcript NM_001303489.3 because of missing stop codon in translated CDS
Skipping transcript NM_001304359.2 because of missing stop codon in translated CDS
Skipping transcript NM_001324381.3 because of missing stop codon in translated CDS
Skipping transcript NM_001329984.2 because of missing stop codon in translated CDS
Skipping transcript NM_001349168.2 because of missing stop codon in translated CDS
Skipping transcript NM_001349169.2 because of missing stop codon in translated CDS
Skipping transcript NM_001349170.2 because of missing stop codon in translated CDS
Skipping transcript NM_001349171.2 because of missing stop codon in translated CDS
Skipping transcript NM_001350319.2 because of missing stop codon in translated CDS
Skipping transcript NM_001350451.2 because of missing stop codon in translated CDS
Skipping transcript NM_001350453.2 because of missing stop codon in translated CDS
Skipping transcript NM_001354346.2 because of missing stop codon in translated CDS
Skipping transcript NM_001359228.2 because of missing stop codon in translated CDS
Skipping transcript NM_001359229.2 because of missing stop codon in translated CDS
Skipping transcript NM_001359230.2 because of missing stop codon in translated CDS
Skipping transcript NM_001359231.2 because of missing stop codon in translated CDS
Skipping transcript NM_001365455.2 because of missing stop codon in translated CDS
Skipping transcript NM_001366028.2 because of missing stop codon in translated CDS
Skipping transcript NM_001366280.2 because of missing stop codon in translated CDS
Skipping transcript NM_001369493.1 because of missing stop codon in translated CDS
Skipping transcript NM_001372044.2 because of missing stop codon in translated CDS
Skipping transcript NM_001377444.1 because of missing stop codon in translated CDS
Skipping transcript NM_001377445.1 because of missing stop codon in translated CDS
Skipping transcript NM_001377446.1 because of missing stop codon in translated CDS
Skipping transcript NM_001377447.1 because of missing stop codon in translated CDS
Skipping transcript NM_001377448.1 because of missing stop codon in translated CDS
Skipping transcript NM_001378188.1 because of missing stop codon in translated CDS
Skipping transcript NM_001382323.2 because of missing stop codon in translated CDS
Skipping transcript NM_001382324.1 because of missing stop codon in translated CDS
Skipping transcript NM_001382325.1 because of missing stop codon in translated CDS
Skipping transcript NM_001382326.1 because of missing stop codon in translated CDS
Skipping transcript NM_001382327.1 because of missing stop codon in translated CDS
Skipping transcript NM_001382328.1 because of missing stop codon in translated CDS
Skipping transcript NM_001382329.1 because of missing stop codon in translated CDS
Skipping transcript NM_001382330.1 because of missing stop codon in translated CDS
Skipping transcript NM_001382331.1 because of missing stop codon in translated CDS
Skipping transcript NM_001382332.1 because of missing stop codon in translated CDS
Skipping transcript NM_001382334.1 because of missing stop codon in translated CDS
Skipping transcript NM_001382335.1 because of missing stop codon in translated CDS
Skipping transcript NM_001382336.1 because of missing stop codon in translated CDS
Skipping transcript NM_001382337.1 because of missing stop codon in translated CDS
Skipping transcript NM_001382338.1 because of missing stop codon in translated CDS
Skipping transcript NM_001382339.1 because of missing stop codon in translated CDS
Skipping transcript NM_001382340.1 because of missing stop codon in translated CDS
Skipping transcript NM_001382341.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385227.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385228.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385804.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385805.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385806.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385809.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385813.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385814.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385815.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385816.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385817.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385818.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385819.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385820.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385821.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385822.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385823.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385824.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385825.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385827.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385828.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385830.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385831.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385836.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385839.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385840.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385841.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385842.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385843.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385845.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385846.1 because of missing stop codon in translated CDS
Skipping transcript NM_001385847.1 because of missing stop codon in translated CDS
Skipping transcript NM_001733.7 because of missing stop codon in translated CDS
Skipping transcript NM_001787.3 because of missing stop codon in translated CDS
Skipping transcript NM_003585.5 because of missing stop codon in translated CDS
Skipping transcript NM_003631.5 because of missing stop codon in translated CDS
Skipping transcript NM_003715.4 because of missing stop codon in translated CDS
Skipping transcript NM_003868.3 because of missing stop codon in translated CDS
Skipping transcript NM_005541.5 because of missing stop codon in translated CDS
Skipping transcript NM_005960.2 because of missing stop codon in translated CDS
Skipping transcript NM_012115.4 because of missing stop codon in translated CDS
Skipping transcript NM_012133.6 because of missing stop codon in translated CDS
Skipping transcript NM_012309.5 because of missing stop codon in translated CDS
Skipping transcript NM_014703.3 because of missing stop codon in translated CDS
Skipping transcript NM_015326.5 because of missing stop codon in translated CDS
Skipping transcript NM_018461.5 because of missing stop codon in translated CDS
Skipping transcript NM_018711.5 because of missing stop codon in translated CDS
Skipping transcript NM_022148.4 because of missing stop codon in translated CDS
Skipping transcript NM_031308.4 because of missing stop codon in translated CDS
Skipping transcript NM_031421.5 because of missing stop codon in translated CDS
Skipping transcript NM_032508.4 because of missing stop codon in translated CDS
Skipping transcript NM_033486.3 because of missing stop codon in translated CDS
Skipping transcript NM_033487.3 because of missing stop codon in translated CDS
Skipping transcript NM_033489.3 because of missing stop codon in translated CDS
Skipping transcript NM_033490.3 because of missing stop codon in translated CDS
Skipping transcript NM_053005.5 because of missing stop codon in translated CDS
Skipping transcript NM_138352.3 because of missing stop codon in translated CDS
Skipping transcript NM_173600.2 because of missing stop codon in translated CDS
Skipping transcript NM_178562.5 because of missing stop codon in translated CDS
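
The listing above can also be collected programmatically, for example to cross-check the count or to compare the skipped accessions against the contents of the built file. The sketch below assumes only the report line format shown in this thread. The names SKIP_RE and skipped_transcripts are hypothetical, and the report is assumed to be plain text with one message per line.

```python
import re

# Matches the report lines quoted in this thread and captures the accession with version.
SKIP_RE = re.compile(
    r"^Skipping transcript (?P<tx>\S+) because of missing stop codon in translated CDS$"
)


def skipped_transcripts(lines):
    """Return the versioned accessions from matching report lines, in input order."""
    result = []
    for line in lines:
        match = SKIP_RE.match(line.strip())
        if match:
            result.append(match.group("tx"))
    return result


if __name__ == "__main__":
    # Two lines copied from the report excerpt above, plus one unrelated line.
    sample = [
        "Skipping transcript NM_001291324.3 because of missing stop codon in translated CDS",
        "some unrelated report line",
        "Skipping transcript NM_001377448.1 because of missing stop codon in translated CDS",
    ]
    accessions = skipped_transcripts(sample)
    print(len(accessions), accessions)
```

To run this on the real report, the sample list could be replaced by an open file handle, since the function accepts any iterable of lines.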