suhrig / arriba

Fast and accurate gene fusion detection from RNA-Seq data

Using draw_fusions.R with input from other callers? #43

Closed Ephedria closed 3 years ago

Ephedria commented 4 years ago

Hello, First, thanks for this tool, it's really quick and useful.

I'm working on diagnostic fusion calling. I've been asked to be as sensitive as possible, even if that means getting false positives. To that end, I use different callers (STAR-Fusion and FusionCatcher). I was wondering whether it is possible to use draw_fusions.R with other inputs? I know the quality won't be the same, since you add a lot of information to your fusion calls that the others don't. Or maybe the other way around: add a list of fusions already called, a little bit like the "-k" argument.

Best,

suhrig commented 4 years ago

Hi @Ephedria,

Another user also wanted to visualize STAR-Fusion results using Arriba's draw_fusions.R. He wrote a conversion script, see issue #26. I have not had a chance to test his script extensively, yet, but you might want to use it as a starting point. If this is such a popular request, I will make the draw_fusions.R script compatible with STAR-Fusion in the future. However, FusionCatcher cannot be used as input, because it lacks the strand information, if I remember correctly. It does specify some strand in the coordinate, but I recall there being some issue with that.
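
For anyone attempting such a conversion, here is a minimal sketch of one core step: splitting breakpoint strings into contig, position and strand. It assumes that STAR-Fusion reports breakpoints as `contig:position:strand` in columns named `LeftBreakpoint` and `RightBreakpoint`. Those column names, the helper names and the toy row are assumptions for illustration, not taken from the script in issue #26, so please check them against your own output.

```python
import csv
import io
from typing import NamedTuple


class Breakpoint(NamedTuple):
    contig: str
    position: int
    strand: str


def parse_breakpoint(text: str) -> Breakpoint:
    """Split 'contig:position:strand' into its parts (assumed layout)."""
    # rsplit keeps contig names that themselves contain ':' intact
    contig, position, strand = text.rsplit(":", 2)
    if strand not in {"+", "-"}:
        raise ValueError(f"unexpected strand in {text!r}")
    return Breakpoint(contig, int(position), strand)


def read_breakpoints(handle):
    """Yield (left, right) breakpoints from a tab-separated table."""
    for row in csv.DictReader(handle, delimiter="\t"):
        yield (parse_breakpoint(row["LeftBreakpoint"]),
               parse_breakpoint(row["RightBreakpoint"]))


# Toy input with hypothetical gene names and coordinates
example = io.StringIO(
    "#FusionName\tLeftBreakpoint\tRightBreakpoint\n"
    "GENE_A--GENE_B\tchr1:1000:+\tchr2:2000:-\n"
)
pairs = list(read_breakpoints(example))
```

The strand check is there because strand information is exactly what a drawing script depends on; a caller whose strands are unreliable would need more than a format conversion.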

Or maybe the other way around: add a list of fusions already called, a little bit like the "-k" argument.

I am obviously biased, but I am almost certain that this approach has higher sensitivity than STAR-Fusion+FusionCatcher combined. Arriba is only inferior when it comes to detecting fusions that are supported by multi-mapping reads (e.g., CIC-DUX4). I'm working on improving this. In all other situations the results of STAR-Fusion are usually a subset of FusionCatcher, which in turn is a subset of Arriba with -k turned on. I encourage you to run your own benchmark and convince yourself. Lastly, if you want maximum sensitivity, you could even fish for discarded fusions in Arriba's discarded fusions file (parameter -O).
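
As a rough illustration of fishing in the discarded fusions file, the sketch below keeps only the rows that involve a gene of interest. It assumes the file is tab-separated with header columns `#gene1` and `gene2`; the `filters` column, the filter values and the gene names in the toy input are placeholders. Rows where a gene column holds several names (e.g. intergenic breakpoints) are not handled here.

```python
import csv
import io


def rescue_discarded(handle, genes_of_interest):
    """Return rows of a discarded-fusions table that involve a gene of interest."""
    wanted = set(genes_of_interest)
    rescued = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["#gene1"] in wanted or row["gene2"] in wanted:
            rescued.append(row)
    return rescued


# Toy input; gene names and filter values are hypothetical
example = io.StringIO(
    "#gene1\tgene2\tfilters\n"
    "GENE_A\tGENE_B\tend_to_end\n"
    "GENE_X\tGENE_Y\tmin_support\n"
)
hits = rescue_discarded(example, ["GENE_B"])
```

Anything rescued this way was discarded for a reason, so it should be treated as a candidate for manual review rather than as a call.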

Regards, Sebastian

Ephedria commented 4 years ago

Hi Sebastian,

Thanks for your quick response. I will try this script; hopefully it will work. I've done some benchmarking on a known dataset, and indeed Arriba was able to find every single fusion, while STAR-Fusion and FusionCatcher were missing some. But the main reason I don't want to drop STAR-Fusion is FusionInspector, a supplementary module offered by STAR-Fusion that allows the consolidation of the fusions.

My goal was to merge the output of Arriba / STAR-Fusion / FusionCatcher into an input for FusionInspector, and then use the output of FusionInspector (which has the same format as STAR-Fusion's) to draw the fusions.
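
A minimal sketch of such a merge step, assuming each caller's results have already been reduced to (5' gene, 3' gene) pairs. The `GENE1--GENE2` naming used for the combined list is an assumption about what FusionInspector expects, and all gene names are hypothetical.

```python
from collections import defaultdict


def merge_calls(calls_by_caller):
    """Map each gene pair to the set of callers that reported it."""
    support = defaultdict(set)
    for caller, pairs in calls_by_caller.items():
        for gene5, gene3 in pairs:
            support[(gene5, gene3)].add(caller)
    return dict(support)


def to_fusion_names(support):
    """Format the merged pairs as 'GENE1--GENE2' strings, sorted for stable output."""
    return sorted(f"{gene5}--{gene3}" for gene5, gene3 in support)


# Hypothetical calls from three callers
calls = {
    "arriba": [("GENE_A", "GENE_B"), ("GENE_C", "GENE_D")],
    "star_fusion": [("GENE_A", "GENE_B")],
    "fusioncatcher": [("GENE_E", "GENE_F")],
}
support = merge_calls(calls)
names = to_fusion_names(support)
```

Keeping the set of supporting callers per pair makes it possible to see later which calls came from a single tool only.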

About the sensitivity, I'm convinced that we have all the fusions we need, but it's really hard to convince my coworkers that a single caller can call all the fusions. This is why I need to add some callers.

Best, Mario

DarioS commented 4 years ago

I dislike people who use unions of variant callers. They tend not to understand the intricacies of each algorithm and why some variant is called by one software and not the other. They also tend to run algorithms on default settings without thinking about parameter values much. They also make the false equivalence that more results mean better results.

It seems to be possible to create beautiful graphics with arriba. I have used FusionInspector and the quality of the igv.js plot isn't impressive.

FusionCatcher was never published in a peer-reviewed journal. There's only a bioRxiv pre-print from 2014 available describing it. I wouldn't want to be relying on a half-finished product to analyse my clinical data.

Ephedria commented 4 years ago

I do appreciate your opinion and I can understand part of it, especially when you see metacallers (like nf-core/rnafusion), but I use different callers exactly because of their specificity. FusionCatcher is way more sensitive on difficult fusions involving genes like IGH. I did contact the creator of STAR-Fusion about it, and he confirmed that FusionCatcher is more sensitive for IGH fusions.

While FusionCatcher is not finished, some papers using it are, like this one, which we used as a starting point for our pipeline: https://www.nature.com/articles/s41467-019-09374-9

We don't blindly trust in silico analysis for our clinical data; interesting fusions are most of the time confirmed by additional analyses, such as RT-PCR or Sanger sequencing.

About the graphics, this is my point. I wanted to use FusionInspector output as an input for Arriba.

suhrig commented 4 years ago

FusionCatcher is way more sensitive on difficult fusions involving genes like IGH.

I am curious from what benchmarking data you drew this conclusion. FusionCatcher is more sensitive than STAR-Fusion, but according to my tests Arriba is dramatically better than FusionCatcher, especially for IG rearrangements. I ran all tools on the TCGA-DLBC cohort and Arriba detected more than twice as many IG-BCL2/BCL6/MYC rearrangements as FusionCatcher. When you add these fusion partners to a list of known fusions (Arriba parameter -k), Arriba detects even three times as many fusions as FusionCatcher. My tests may be a bit dated, though (Arriba version 1.0.0 and FusionCatcher version 1.00). Maybe things are different with a more recent version of FusionCatcher. Anyhow, if you know of an IG-rearranged dataset where Arriba performs badly, I would be very grateful if you could point me to it, so I could improve the filters. Just recently, I made an enhancement to one of Arriba's filters which was particularly prone to discarding IG rearrangements (the code is not yet public), so things should improve even more in favor of Arriba in the future. Until then, you might want to disable the end_to_end filter of Arriba, if you are particularly interested in IG* rearrangements. On most samples, it hardly hurts specificity when this filter is disabled.
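
For illustration, a sketch of how a run with a known fusions list, a discarded fusions file and the end_to_end filter disabled might be assembled. Only `-k` and `-O` are named in this thread; the other flags (`-x`, `-g`, `-a`, `-o`, and `-f` for disabling filters) and all file names are assumptions that should be checked against the help output of your Arriba version. The helper only builds the argument list and does not run anything.

```python
import shlex


def build_arriba_command(bam, gtf, assembly, output,
                         known_fusions=None, discarded=None,
                         disabled_filters=()):
    """Assemble an Arriba command line as a list of arguments (nothing is executed)."""
    # Flag names other than -k and -O are assumed, see the note above
    cmd = ["arriba", "-x", bam, "-g", gtf, "-a", assembly, "-o", output]
    if discarded:
        cmd += ["-O", discarded]
    if known_fusions:
        cmd += ["-k", known_fusions]
    if disabled_filters:
        cmd += ["-f", ",".join(disabled_filters)]
    return cmd


# Placeholder file names
command = build_arriba_command(
    "Aligned.out.bam", "annotation.gtf", "assembly.fa", "fusions.tsv",
    known_fusions="known_fusions.tsv",
    discarded="fusions.discarded.tsv",
    disabled_filters=["end_to_end"],
)
print(shlex.join(command))
```

Building the list first and printing it makes it easy to review the exact command before handing it to a job scheduler or `subprocess.run`.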

Ephedria commented 4 years ago

Sorry for the delay.

My wording was confusing, sorry. I was talking about FusionCatcher and STAR-Fusion; Arriba is indeed really sensitive with IGH rearrangements. Strangely enough, I have a case where Arriba finds a CRLF2-IGH fusion but not the reciprocal fusion IGH-CRLF2 (it's found in the discarded file), while STAR-Fusion finds only the IGH-CRLF2 fusion and FusionCatcher finds both. I can send you the reports of the callers if you like.

I'm thinking about using the -k parameter with the outputs of STAR-Fusion and FusionCatcher in order to improve detection; I haven't tried it yet. It would have helped me rescue the IGH-CRLF2 fusion in the case above.

I'm working on different types of cancer, so it would be really detrimental to disable the end_to_end filter of Arriba, especially if it hurts the other samples.

Best,

suhrig commented 4 years ago

Strangely enough, I have a case where Arriba finds a CRLF2-IGH fusion but not the reciprocal fusion IGH-CRLF2

With IG* fusions it's not really clear what is the 5' end of the fusion and what is the 3' end. After all, these translocations often do not produce chimeric proteins, since they exert an effect by means of enhancer hijacking rather than by fusing one protein-coding gene to another. The breakpoints of such enhancer hijacking rearrangements are often intergenic and then there is no well-defined 5' gene and 3' gene. The designation is a bit arbitrary and it could very well be that Arriba chooses one representation for one pair of breakpoints and another for other breakpoints.
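
One practical consequence when comparing callers: for such rearrangements it can help to treat a gene pair as unordered, so that CRLF2-IGH and IGH-CRLF2 count as the same event. A minimal sketch, assuming the calls have been reduced to gene-name pairs (the function name is hypothetical):

```python
def missing_from(reference_calls, other_calls):
    """Pairs in other_calls with no counterpart in reference_calls, ignoring gene order."""
    seen = {frozenset(pair) for pair in reference_calls}
    return [pair for pair in other_calls if frozenset(pair) not in seen]


# The case described above, reduced to gene pairs
arriba_calls = [("CRLF2", "IGH")]
star_fusion_calls = [("IGH", "CRLF2")]

# Order-sensitive comparison flags the reciprocal designation as missing
strict_missing = [pair for pair in star_fusion_calls if pair not in arriba_calls]

# Order-insensitive comparison treats both designations as one event
missing = missing_from(arriba_calls, star_fusion_calls)
```

This only removes the arbitrary 5'/3' designation from the comparison; it says nothing about whether the breakpoints themselves agree.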

I can send you the reports of the callers if you like.

If anything, I would need the raw reads, but in this case it's not necessary. As long as Arriba reports one of them, I am satisfied. But thanks for the offer!

I'm thinking about using the -k parameter with the outputs of STAR-Fusion and FusionCatcher

You could do this, but this is not how the -k parameter is supposed to be used. It would be better to provide Arriba with a static list of known fusions, e.g., from Cancer Gene Census. If you only pass the results of STAR-Fusion or FusionCatcher to Arriba, then Arriba's sensitivity will only improve if one of the other tools has already found the fusion. Since the other tools are less sensitive than Arriba, you will only gain something in the rather unlikely event that STAR-Fusion or FusionCatcher report a fusion that Arriba otherwise would have missed.
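
A sketch of preparing such a static list, assuming the known fusions file is a plain two-column, tab-separated list of gene pairs. That format, the helper name and the choice to write both orientations of each pair are assumptions to verify against the Arriba documentation. The pairs are illustrative, based on the IG-BCL2/BCL6/MYC rearrangements mentioned earlier with IGH standing in for the IG loci.

```python
import io


def write_known_fusions(pairs, handle, both_orientations=True):
    """Write gene pairs as two tab-separated columns, one pair per line."""
    written = set()
    for gene_a, gene_b in pairs:
        candidates = [(gene_a, gene_b)]
        if both_orientations:
            candidates.append((gene_b, gene_a))
        for pair in candidates:
            if pair not in written:  # skip duplicates
                written.add(pair)
                handle.write(f"{pair[0]}\t{pair[1]}\n")
    return len(written)


# Illustrative static list, not a curated resource
static_pairs = [("IGH", "BCL2"), ("IGH", "BCL6"), ("IGH", "MYC")]
buffer = io.StringIO()
n_lines = write_known_fusions(static_pairs, buffer)
```

Unlike a list derived from another caller's output for the same sample, a static list like this is the same for every sample and does not depend on any tool having found the fusion first.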

I'm working on different types of cancer, so it would be really detrimental to disable the end_to_end filter of Arriba, especially if it hurts the other samples.

To be clear: in most cases, disabling the end_to_end filter does not hurt accuracy. The results are often identical whether the filter is enabled or not. I have only seen it make a difference on some samples that have one or the other quality issue. It is a purely technical issue and is not influenced by the type of cancer that you analyze (in my experience). Or you can just wait for the next release, in which the filter has been tweaked so that it won't discard IG* rearrangements anymore (or at least much more rarely than before).

Regards, Sebastian

suhrig commented 4 years ago

I just uploaded an enhanced version of draw_fusions.R to the development branch of the repository that should also be compatible with STAR-Fusion output (both with and without FusionInspector extra columns). You can download it from here:

https://raw.githubusercontent.com/suhrig/arriba/develop/draw_fusions.R

Please give it a try and let me know if it worked.

suhrig commented 3 years ago

Hi Ephedria, the new version of Arriba is out, which natively supports drawing fusions based on output from STAR-Fusion. So I'm closing this issue as resolved.