ndaniel / fusioncatcher

Finder of Somatic Fusion Genes in RNA-seq data
GNU General Public License v3.0

Correct fusion reported but wrong positions for MYC gene and IGH@ locus #198

Open · pkerbs opened this issue 2 years ago

pkerbs commented 2 years ago

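Hi Daniel, I have noticed the same issue in version 1.33 for the IGH@ locus and MYC: the fusions themselves are detected correctly, but the reported breakpoint positions are wrong. Could you please have a look at it? Thank you.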

IGH@

Expected location (hg38): chr14:105,556,000-106,883,700; FusionCatcher reports: chr14:63187193

BCL2    IGH@    known,oncogene,chimer2,cancer,tumor,oncokb,mitelman,ccle    0   34  19  104 BOWTIE+SPOTLIGHT    18:63126237:-   14:63187193:-   ENSG00000171791 ENSG09000001014         TCAAATCTATGGTGGTTTGACCTTTAGAGAGTTGCTTTACGTGGCCTGTTTCAACACAGACCCACCCAGAGCCCTCCTGCCCTCCTTCCGCGGGGGCTTTCTCATGGCTGTCCTTCAGGGTCTTCCTGAAATGCAGTGGTGCTTACGCTCCACCAAGAAAGCAGGAAACCT*CATAACGACTTTACTACTACTACTACATGGACGTCTGGGGCAAAGGGACCACGGTCACCGTCTCCTCAGCC UTR/---

MYC

Expected location (hg38): chr8:127,735,434-127,742,951; FusionCatcher reports: chr8:105556959

IGH@    MYC known,oncogene,chimer2,cancer,tumor,m342,oncokb,mitelman,ccle,t23   0   430 191 133 BOWTIE+SPOTLIGHT    14:105647945:+  8:105556959:+   ENSG09000000013 ENSG00000136997         TGCTGTCCTTGGTCCTGGCTGAGAGAGGGCCCCACGGCCAGCACTGCTGACCCTGCCCTGGGCTCCAGTGATGCTGCTGGCCTGGACAAGCCCCTCCGTTCACCTGGGGCCTCTCCTCCTCCCTCGTTCTACTGCCT*CGCCCCTCCCGGGTTCCCAAAGCAGAGGGCGTGGGGGAAAAGAAAAAAGATCCTCTCTCGCTAATCTCCGCCCACCGGCCCTTTAT    ---/intergenic

Originally posted by @pkerbs in https://github.com/ndaniel/fusioncatcher/issues/169#issuecomment-1167353883
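To make the mismatch easy to reproduce, here is a minimal Python sketch that checks whether a reported breakpoint falls inside the expected hg38 interval quoted above. It assumes breakpoints are written as chromosome:position:strand, as in the two rows above; `EXPECTED_HG38`, `parse_breakpoint` and `in_expected_locus` are hypothetical helpers, not part of FusionCatcher.

```python
from __future__ import annotations

# Expected hg38 intervals quoted in this issue: gene -> (chromosome, start, end), inclusive.
EXPECTED_HG38 = {
    "IGH@": ("14", 105_556_000, 106_883_700),
    "MYC": ("8", 127_735_434, 127_742_951),
}


def parse_breakpoint(bp: str) -> tuple[str, int, str]:
    """Split a breakpoint such as '14:63187193:-' into (chromosome, position, strand).

    A leading 'chr' is stripped so that '14' and 'chr14' compare equal.
    """
    chrom, pos, strand = bp.split(":")
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return chrom, int(pos), strand


def in_expected_locus(gene: str, bp: str) -> bool:
    """True if the breakpoint is on the expected chromosome and inside the interval."""
    chrom, pos, _strand = parse_breakpoint(bp)
    exp_chrom, start, end = EXPECTED_HG38[gene]
    return chrom == exp_chrom and start <= pos <= end


# 3' breakpoints of the two BOWTIE+SPOTLIGHT calls reported above
print(in_expected_locus("IGH@", "14:63187193:-"))   # BCL2--IGH@
print(in_expected_locus("MYC", "8:105556959:+"))    # IGH@--MYC
# 5' IGH@ breakpoint of the IGH@--MYC call, for comparison
print(in_expected_locus("IGH@", "14:105647945:+"))
```

With the coordinates from this issue the first two checks print False and the last prints True, i.e. only the 3' breakpoints fall outside their expected loci.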

pkerbs commented 2 years ago

Hey Daniel, this bug does not seem to be restricted to certain genes. All reported fusion events that have "SPOTLIGHT" in the "Fusion_finding_method" column show wrong positions for the 3' partner gene. Moreover, the reported position of the 3' gene (ignoring the chromosome) is close to the reported position of the 5' gene (ignoring the chromosome); in the examples below the two numbers differ by at most about 155 kb, even when the partners are on different chromosomes. Here are some examples from samples of the LL-100 panel:

With SPOTLIGHT:

sample  gene1   gene2   Fusion_description  Counts_of_common_mapping_reads  Spanning_pairs  Spanning_unique_reads   Longest_anchor_found    Fusion_finding_method   break5prime break3prime gene_id1    gene_id2    Exon_1_id(5end_fusion_partner)  Exon_2_id(3end_fusion_partner)  Fusion_sequence Predicted_effect
697 TCF3    PBX1    known,oncogene,cosmic,chimer2,cgp,ticdb,chimer4kb,chimer4pub,chimer4seq,cancer,tumor,oncokb,mitelman,ccle,t411  24  2342    641 135 BOWTIE+SPOTLIGHT    chr19:1619112:- chr1:1465091:+  ENSG00000071564 ENSG00000185630         TGCACAACCACGCGGCCCTCCCCAGCCAGCCAGGCACCCTCCCTGACCTGTCTCGGCCTCCCGACTCCTACAGT*GTTTTGAGTATCCGAGGAGCCCAGGAGGAGGAACCCACAGACCCCCAGCTGATGCGGCTGGACAACATGCTGTTAGCGGAAGGCGTGGCGGGGCCTGAGAA    CDS(truncated)/intergenic
DND-41  TAF15   TCF3    known,oncogene,ambiguous,fragments,cancer,tumor,t1  165 5   4   114 BOWTIE+SPOTLIGHT    chr17:35820075:+    chr19:35874574:-    ENSG00000270647 ENSG00000071564         GGGGGTGAGCAGCAAAGTTATTCTACCTATGGAAATCCAGGCAGCCAAGGCTATGGACAAGCATCACAA*GCAAGAGCGGTGAGCGGGGCGCCTATGCCTCCTTCGGGAGAGACGCAGGCGTGGGCGGCCTGACTCAGGCTGGCTTCCTGTCAGGCGAGCTGGCCCTCAACAGCCCCGGGCCCC    CDS(truncated)/intergenic
REC-1   IGKV3-11    CCND1   known,oncogene,chimer2,fragments,cancer,tumor,m0,multi,mitelman 0   3   2   118 BOWTIE+SPOTLIGHT    chr2:89027386:- chr11:89020700:+    ENSG00000241351 ENSG00000110092         GGAAGCCCCAGCTCAGCTTCTCTTCCTCCTGCTACTCTGGCTCCCAGATACCACCGGAGAAATTGTGTTGACACAGTCTCCAGCCACCCTGTCTTTGTCTCCAGGGGAAAGAGCCACCCTCTCCTGCA*ATGACGGCCGAGAAGCTGTGCATCTACACCGACAACTCCATCCGGCCCGAGGAGCTGCTGCAAATGGAG  CDS(truncated)/intergenic
RI-1    MBD3    TCF3    known,oncogene,fragments,cancer,tumor,ccle,t1,10K<gap<100K,short_repeats,sr1.92 16  5   4   108 BOWTIE+SPOTLIGHT    chr19:1592620:- chr19:1541154:- ENSG00000071655 ENSG00000071564         CCGGGCGGCGGCGGCGGGCCCGGCGGCGGGCCGAGGAGCCGGGCGCAATGGAGCGGAAG*AGGGTTTCCAGGCCTGAGGTGCCCGCCCTGGCCCCAGGAGAATGAACCAGCCGCAGAGGATGGCGCCTGTGGGCACAGACAAGGAGCTCAGTGACCTCCTGGACTTCAGCATGATGTTCCCGCTGCCTGTCACCAA    CDS(truncated)/intergenic
SEM PAN3    FLT3    known,oncogene,tcga,chimer4seq,cancer,tumor,tcga-cancer,tcga2,mitelman,ccle,tcga3,reciprocal    0   16  3   103 BOWTIE+SPOTLIGHT    chr13:28261455:+    chr13:28210807:-    ENSG00000152520 ENSG00000122025         AGCTGATCAACAGACATTTAATAACAATGGCTCAAATTGATCAAGCAGATATGC*CAGAAAAAGCAGACAGCTCTGAAAGAGAGGCACTCATGTCAGAACTCAAGATGATGACCCAGCTGGGAAGCCACGAGAATATTGTGAACCTGCTGGGGGCGTGCACACTGTCAG   CDS(truncated)/intergenic
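The "close position numbers" pattern can be quantified directly from the rows above. A minimal Python sketch, assuming the break5prime/break3prime format chromosome:position:strand; `SPOTLIGHT_CALLS` simply re-types the five rows above, and `position` and `position_gap` are hypothetical helpers:

```python
from __future__ import annotations


def position(bp: str) -> int:
    """Return the numeric position from a breakpoint such as 'chr19:1619112:-'."""
    return int(bp.split(":")[1])


def position_gap(break5: str, break3: str) -> int:
    """Absolute difference of the two positions, deliberately ignoring chromosomes."""
    return abs(position(break5) - position(break3))


# (sample, fusion, break5prime, break3prime) copied from the SPOTLIGHT rows above
SPOTLIGHT_CALLS = [
    ("697", "TCF3--PBX1", "chr19:1619112:-", "chr1:1465091:+"),
    ("DND-41", "TAF15--TCF3", "chr17:35820075:+", "chr19:35874574:-"),
    ("REC-1", "IGKV3-11--CCND1", "chr2:89027386:-", "chr11:89020700:+"),
    ("RI-1", "MBD3--TCF3", "chr19:1592620:-", "chr19:1541154:-"),
    ("SEM", "PAN3--FLT3", "chr13:28261455:+", "chr13:28210807:-"),
]

for sample, fusion, b5, b3 in SPOTLIGHT_CALLS:
    print(f"{sample}\t{fusion}\t{position_gap(b5, b3)}")
```

For these five rows the gaps are 154,021, 54,499, 6,686, 51,466 and 50,648, so every 3' position lies within about 155 kb of its 5' position.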

Without SPOTLIGHT (found by STAR or BLAT instead):

sample  gene1   gene2   Fusion_description  Counts_of_common_mapping_reads  Spanning_pairs  Spanning_unique_reads   Longest_anchor_found    Fusion_finding_method   break5prime break3prime gene_id1    gene_id2    Exon_1_id(5end_fusion_partner)  Exon_2_id(3end_fusion_partner)  Fusion_sequence Predicted_effect
697 RP1-29K1.7  IGH@    lncrna,m11  0   19  4   34  BOWTIE+STAR chr6:28491647:+ chr14:106591638:-   ENSG00000286819 ENSG09000001017         TGCTTCAGTGGTCACACTCCTAGTCCGCCTTCATGTTCCATCCTGTACAC*CTGGCTCTGCCTTCTAGATAGCAGTAGCAAATCAGTGAAAGTACTAACAG   exonic(no-known-CDS)/---
BC-3    IGH@    NBEA    t8,reciprocal   0   11  3   28  BOWTIE+BLAT chr14:106453527:-   chr13:35110810:+    ENSG09000001017 ENSG00000172915         CCTGTCTCCTGGCTTCACTGCCTCAGCCTCCCGAGTAGCTGGGATTACAG*GTTCAGCTTTCCCTATACACATATTTGTCTGCTGAATTTATTGGAACTGC   ---/CDS(truncated)
BC-3    IGH@    NBEA    t8,reciprocal   0   11  2   23  BOWTIE+STAR chr14:106589130:-   chr13:35110808:+    ENSG09000001017 ENSG00000172915         CCGGGTTCAAGCGATTCTCCTGCCTCGGCCTCCCGAGTAGCTGGGATTAC*AGGTTCAGCTTTCCCTATACACATATTTGTCTGCTGAATTTATTGGAACT   ---/intronic
BC-3    NBEA    IGH@    t8,reciprocal   0   11  14  38  BOWTIE+STAR chr13:35110978:+    chr14:106533871:-   ENSG00000172915 ENSG09000001017         CTGGGTTATTAATCCTGCTGACAGTAGTGGCATTACACCTAAAGGATTAG*CCAAAGATTCCTGAAGACAGAGCTGATGTGACGTACTCATAGGTGGATCT   CDS(truncated)/---
BC-3    NBEAP1  IGH@    known,pseudogene,chimer2,fragments,cancer,m0,multi  0   3   2   34  BOWTIE+STAR chr15:20656183:-    chr14:106429433:+   ENSG00000258590 ENSG09000000016         GTTTCTGAAGATGAGAGGCCCACCTTAAATTTCAAAAATAGATCATTTCT*AAAACAAATAACCCCATCAAAAAGTGGGCAAAGGACATGAACAGACACTT   intergenic/---
BCBL-1  IGH@    CLEC2D  banned,known,hpa,m12,t4 0   13  9   41  BOWTIE+STAR chr14:106438593:+   chr12:9697544:+ ENSG09000000016 ENSG00000069493         CTCTGAAGGCTGTGAGACCCCTGATTTCCCACTTCACACCTCTATATTTC*TGTGTGTGTCTTTAGTTCCTCTGGCGCTGCTGGGTTAGGATCTACCCGAC   ---/UTR

So I think the bug is specific to fusion events found by SPOTLIGHT, and only the positions of the 3' partners are reported wrongly. It would be nice if you could fix this :) Thank you in advance.
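Until this is fixed, one downstream workaround is to keep SPOTLIGHT calls but treat their break3prime value as unreliable. A minimal Python sketch of that filtering step, assuming the candidate list is tab-separated with a header line using the column names shown above; `spotlight_calls` and the cut-down two-row example table are hypothetical:

```python
from __future__ import annotations

import csv
import io


def spotlight_calls(tsv_text: str) -> list[dict[str, str]]:
    """Return the rows whose Fusion_finding_method mentions SPOTLIGHT.

    Assumes tab-separated text with a header line that uses the column
    names shown in this issue (e.g. Fusion_finding_method, break3prime).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if "SPOTLIGHT" in row["Fusion_finding_method"]]


# Cut-down example: two rows from this issue, keeping only a few columns
header = ["gene1", "gene2", "Fusion_finding_method", "break5prime", "break3prime"]
rows = [
    ["TCF3", "PBX1", "BOWTIE+SPOTLIGHT", "chr19:1619112:-", "chr1:1465091:+"],
    ["IGH@", "NBEA", "BOWTIE+BLAT", "chr14:106453527:-", "chr13:35110810:+"],
]
tsv = "\n".join("\t".join(fields) for fields in [header, *rows]) + "\n"

for row in spotlight_calls(tsv):
    # Keep the call, but do not trust its 3' coordinate until the bug is fixed
    print(row["gene1"], row["gene2"], row["break3prime"], "(3' breakpoint unreliable)")
```

Filtering on the column name rather than a fixed column index keeps the sketch working if the real output has more columns than this cut-down example.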

Best wishes, Paul