suhrig / arriba

Fast and accurate gene fusion detection from RNA-Seq data

Issue with draw_fusions.R #118

Closed: SusanneLipp closed this issue 3 years ago

SusanneLipp commented 3 years ago

Hi,

I tried your tool and it works quite nicely. I think there is a small problem with the draw_fusions.R script. In lines 194-196 you first assign geneID and then geneName in the exons data.frame, whereas in lines 225-226 you assign them in the opposite order. In the case of intergenic fusions, the rbind command does not work.

Regards, Susanne

suhrig commented 3 years ago

Hi Susanne,

Thanks for taking the time to report the issue and for the preliminary debugging. Intergenic breakpoints should work in principle. But I recently came across issues with some edge cases myself, which is why I redesigned the assignment of gene names/IDs to intergenic breakpoints. Could you give the new version a try and tell me whether it works? If it doesn't, would you kindly send me an output line of Arriba that causes the error you describe?

Regards, Sebastian

suhrig commented 3 years ago

Did you have a chance to give this a try yet?

aennecken commented 3 years ago

Hi, I have used Arriba in the past and think it is a great tool. I have a question regarding the split reads. In Arriba v1.2.0 the split reads were shown for each gene separately in the visualization. In Arriba v2.1.0 the reads from each gene (split_reads1 and split_reads2) are summed up and the sum is depicted in the visualization. Is there a specific reason for summing up the reads? And could I still use the old version of draw_fusions.R (v1.2.0) if I prefer the split reads separated? Thanks a lot for your reply, Anne

suhrig commented 3 years ago

The reason is that the new script is compatible with output from STAR-Fusion. STAR-Fusion does not report split reads separately for the two breakpoints. As I wanted to keep things consistent, I had to merge them.

You are not the first one to ask this. I guess it would be better if the script made a distinction between Arriba and STAR-Fusion, and reported separate split reads for Arriba and merged split reads for STAR-Fusion. I will implement this soon and notify you.

You cannot use draw_fusions.R from v1.2.0 with output from Arriba 2. They are incompatible.

aennecken commented 3 years ago

Okay, thanks a lot for the clarification.

suhrig commented 3 years ago

Hi Anne,

I have modified the script to display the split reads separately as before (if you supply output from Arriba). You can download the modified version from here:

https://github.com/suhrig/arriba/blob/develop/draw_fusions.R

There is one difference from the old behavior: instead of being labeled with gene names, the numbers are now labeled "split reads at breakpoint1/2". I hope that is okay. I chose generic labels over gene names, since gene names can be quite lengthy and protrude beyond the printable area, especially when the breakpoints are intergenic. They can also be empty in the case of viral integration, which would be uninformative.

Regards, Sebastian
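
Editor's illustration: the labeling behavior described in this comment can be sketched roughly as follows. This is Python rather than the script's R, and it is not the code in draw_fusions.R. The function name `split_read_labels`, the idea of detecting the input type by which columns are present, and the merged label "split reads" are all assumptions made for the example. Only the Arriba column names `split_reads1`/`split_reads2` and the "split reads at breakpoint1/2" labels come from the thread; `JunctionReadCount` is the STAR-Fusion column assumed to hold the merged count.

```python
def split_read_labels(row):
    """Return (label, count) pairs for one fusion record.

    `row` is a dict mapping column names to values, e.g. one line of a
    tab-separated fusion file parsed with csv.DictReader.
    Hypothetical helper, not part of Arriba.
    """
    # Assumption: Arriba output is recognized by its two per-breakpoint columns.
    if "split_reads1" in row and "split_reads2" in row:
        # Generic labels instead of gene names, as described in the comment above.
        return [
            ("split reads at breakpoint1", int(row["split_reads1"])),
            ("split reads at breakpoint2", int(row["split_reads2"])),
        ]
    # Assumption: otherwise the record is STAR-Fusion-like, which reports
    # a single junction read count and no per-breakpoint split.
    return [("split reads", int(row["JunctionReadCount"]))]


# Usage example with made-up counts
arriba_row = {"split_reads1": "3", "split_reads2": "5"}
star_fusion_row = {"JunctionReadCount": "8"}
for label, count in split_read_labels(arriba_row) + split_read_labels(star_fusion_row):
    print(f"{label} = {count}")
```

Keeping the labels generic means the text width is fixed regardless of how long the gene names are, or whether there are any at all.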

aennecken commented 3 years ago

Dear Sebastian, Thanks a lot for providing the modified script. I will give it a try right away :) Kind regards, Anne