suhrig / arriba

Fast and accurate gene fusion detection from RNA-Seq data

The CBFB::MYH11 fusion gene is discarded by the 'blacklist' filter, but it is not found in the blacklist file. #215

Closed bioPG closed 9 months ago

bioPG commented 9 months ago

Problem Description: I am using the Arriba software for fusion gene detection, and I have encountered an issue where a genuine fusion gene was discarded by the blacklist filter. However, upon checking the blacklist file, I couldn't find this particular fusion gene, which is causing confusion.

Steps to Reproduce:

  1. Run Arriba software for fusion gene detection.
  2. Observe that a genuine fusion gene is discarded during the analysis.
  3. Examine the blacklist file to check if the fusion gene is listed, but it's not found in the blacklist.

Actual Behavior: A genuine fusion gene is being discarded, even though it's not present in the blacklist file.

Please help investigate this issue and let me know why the blacklist filter is incorrectly discarding this fusion gene when it's not present in the blacklist. Thank you!

[image]

suhrig commented 9 months ago

Thank you for the detailed report.

The blacklist operates mostly on coordinates, not gene names. If you grep the blacklist for the breakpoints, you will find the following hit (among others):

16:15814908     filter_spliced

This entry is probably responsible for discarding the event. It means that the fusion is discarded by the blacklist filter when three conditions hold: one of the fusion breakpoints is 16:15814908, the breakpoint is at a splice site (which it is), and the fusion has poor support (which it has: only 2 supporting reads despite a coverage of >5000).
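The lookup and the rule described above can be sketched in Python as follows. This is only an illustration, not Arriba's actual code: it assumes the blacklist is a whitespace-separated, two-column text file (optionally gzip-compressed) with `#` comment lines, and that a leading `+` or `-` on a coordinate is a strand prefix. The function names, the second example line, and the way "poor support" is passed in as a ready-made boolean are all hypothetical; the real support criterion is not stated in this thread.

```python
import gzip


def read_lines(path):
    """Yield text lines from a plain or gzip-compressed file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            yield line.rstrip("\n")


def find_blacklist_hits(lines, breakpoint):
    """Return (column1, column2) pairs in which either column equals the breakpoint."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        columns = line.split()
        if len(columns) < 2:
            continue  # not a two-column entry
        # Assumption: a leading +/- is a strand prefix and is ignored for matching.
        if breakpoint in (column.lstrip("+-") for column in columns[:2]):
            hits.append((columns[0], columns[1]))
    return hits


def filter_spliced_discards(breakpoint_matches, at_splice_site, poor_support):
    """Model of the rule as described: all three conditions must hold."""
    return breakpoint_matches and at_splice_site and poor_support


# Hypothetical excerpt; only the first entry is taken from this thread.
example_lines = [
    "# hypothetical excerpt",
    "16:15814908\tfilter_spliced",
    "1:12345\tany",
]
hits = find_blacklist_hits(example_lines, "16:15814908")

# 2 supporting reads at a coverage of >5000 is a fraction of at most 0.04%.
support_fraction = 2 / 5000

discarded = filter_spliced_discards(
    breakpoint_matches=bool(hits),
    at_splice_site=True,
    poor_support=True,  # judged by hand here, not by Arriba's criterion
)
```

To search a real file, pass `read_lines(path_to_blacklist)` as the first argument instead of `example_lines`.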

The reasoning behind this rule is that some genes tend to generate splice variants which are all over the place. When you analyze a cohort, you will find this breakpoint involved in many fusion candidates across many samples, which may indicate that the breakpoint attracts artifacts and should generally be filtered more stringently.
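The kind of cohort observation described above can be sketched as a simple recurrence count. This is a hypothetical illustration, not how the blacklist was actually built: the function name, the `(sample, breakpoint1, breakpoint2)` input shape, and the placeholder sample names and coordinates are all made up for the example.

```python
from collections import defaultdict


def breakpoint_recurrence(candidates):
    """Map each breakpoint to (number of samples, number of distinct partner breakpoints)."""
    samples = defaultdict(set)
    partners = defaultdict(set)
    for sample, breakpoint1, breakpoint2 in candidates:
        # Count each candidate from the point of view of both of its breakpoints.
        for breakpoint, other in ((breakpoint1, breakpoint2), (breakpoint2, breakpoint1)):
            samples[breakpoint].add(sample)
            partners[breakpoint].add(other)
    return {bp: (len(samples[bp]), len(partners[bp])) for bp in samples}


# Placeholder data: "A:100" recurs in three samples with two different partners.
candidates = [
    ("S1", "A:100", "B:200"),
    ("S2", "A:100", "C:300"),
    ("S3", "A:100", "B:200"),
    ("S3", "D:400", "E:500"),
]
recurrence = breakpoint_recurrence(candidates)
```

A breakpoint that shows up in many samples with many different partners, like `A:100` here, is the pattern that would suggest stricter filtering.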

Given the poor read support, I tend to agree with Arriba's judgement. Even if Arriba is wrong here, the fusion would be very subclonal and near the detection limit, so missing it would be unfortunate but understandable.

What disease is this sample from? Is it AML? The fusion has been observed predominantly in AML. If it is not leukemia, then I would agree with Arriba's judgement. Then again, you saved this in a folder named rnafusion_valid_batch, which sounds like the fusion has been validated. Is this true? If yes, how was it validated?

bioPG commented 9 months ago

Thank you very much for your detailed and comprehensive response.

I currently work at a clinical medical testing company in China, where I am taking over the redevelopment of an RNA-seq-based fusion gene detection project. After comparing the results of different detection software, we believe that Arriba is the most suitable option for clinical testing in terms of sensitivity, specificity, and runtime. I would like to express my sincere gratitude to your team for developing such an excellent open-source fusion gene detection software, which brings great benefits to the precise diagnosis and treatment of cancer patients in China.

In the rnafusion_valid_batch, there are a total of 44 AML-positive samples and 46 fusion genes, all of which have been validated through RT-PCR. Among these 44 samples, only RNA026 and RNA027 truly contain CBFB-MYH11. Because these two samples have sufficient supporting reads, Arriba detects the fusion in them easily. CBFB-MYH11 was also detected in three other samples (RNA029, RNA035, RNA074), but with very few supporting reads, so all three events were filtered out by the blacklist. These results further validate the accuracy of Arriba.

Previously, my understanding of the known gene list and blacklist was that these two filters were mutually exclusive. However, through your explanation, I now understand that their relationship is not that simple. Once again, thank you for clarifying this.

suhrig commented 9 months ago

Thank you for the positive feedback! It's always good to hear the effort I put into the tool was worth it.

The blacklist and known fusions list are not mutually exclusive. The former has priority over the latter.
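As a minimal sketch of that priority, assuming each candidate has already been checked against both lists: the function name and the returned labels are hypothetical and only illustrate the ordering, not Arriba's internals.

```python
def final_verdict(hits_blacklist, in_known_fusions):
    """Apply the blacklist first; the known fusions list only matters afterwards."""
    if hits_blacklist:
        return "discarded"  # the blacklist wins even for a known fusion
    if in_known_fusions:
        return "known fusion"
    return "regular candidate"


# A known fusion that also hits a blacklist rule is still discarded.
verdict = final_verdict(hits_blacklist=True, in_known_fusions=True)
```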