yhoogstrate / dr-disco

:mask::loudspeaker: Dr. Disco: detecting genomic breakpoints of fusion transcripts in random hexamer RNA-seq data
GNU General Public License v3.0

add chim overhang balance to output #89

Closed yhoogstrate closed 7 years ago

yhoogstrate commented 7 years ago

Many of the false positives that are 'alignment artefacts' are aligned very asymmetrically:

25S101M <-> 101S19M6S

It might be ideal to add the minimum/mean/median/maximum M value per node, and the maximum over all nodes (for spliced):

arc1 (node-A - node-K):
25S101M <-> 101S19M6S
25S101M <-> 101S19M6S

arc2 (node-A - node-L):
50S76M <-> 76S50M
50S76M <-> 76S50M
50S76M <-> 76S50M

arc1 -> (101, 19) << biggest imbalance
arc2 -> (76, 50)

return smallest imbalance (76, 50), likely okay
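
A rough sketch of the calculation above, not taken from the dr-disco code base. It assumes the first CIGAR in each pair is meant as `<n>S<m>M` (so 25 soft-clipped and 101 matched bases), takes the median M length on each side of an arc, and scores imbalance as |a - b| / (a + b). The names `matched_length`, `arc_balance` and `imbalance`, and the scoring formula itself, are hypothetical choices for illustration only.

```python
import re
from statistics import median

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def matched_length(cigar):
    """Sum the lengths of all M operations in a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op == "M")


def arc_balance(pairs):
    """Median matched length on each side of an arc, as (left, right)."""
    left = median(matched_length(a) for a, _ in pairs)
    right = median(matched_length(b) for _, b in pairs)
    return left, right


def imbalance(balance):
    """Hypothetical score: 0.0 is perfectly balanced, values near 1.0 are lopsided."""
    a, b = balance
    total = a + b
    return abs(a - b) / total if total else 0.0


# The two arcs from the example above.
arcs = {
    "arc1": [("25S101M", "101S19M6S")] * 2,
    "arc2": [("50S76M", "76S50M")] * 3,
}

balances = {name: arc_balance(pairs) for name, pairs in arcs.items()}
# Report the arc with the smallest imbalance.
best = min(balances, key=lambda name: imbalance(balances[name]))
print(best, balances[best])
```

With these inputs arc1 scores 82/120 and arc2 scores 26/126, so arc2 is the one that would be reported. Swapping `median` for `min`, `max` or `mean` gives the other per-node statistics mentioned above.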

When this is incorporated into classification, keep the following covariates in mind: