suhrig / arriba

Fast and accurate gene fusion detection from RNA-Seq data

Issue with the "relative_support" filter #214

Closed bioPG closed 9 months ago

bioPG commented 10 months ago

Can the sentence inside the red box be understood as follows? A fusion such as BCR-ABL1 may have multiple breakpoints, each corresponding to a separate event, and the number of events with different supporting reads is polynomially related to the event itself.

image
suhrig commented 9 months ago

Forgot to reply to this, sorry.

It means this: Arriba counts the number of fusion candidates (events) involving a given gene. These could be true fusions or artifacts. They need not even involve the same pair of genes (e.g., BCR-ABL1): Arriba counts all events affecting a given gene. This count serves as an estimate of the level of background noise, since most events in a gene with many of them will be artifacts anyway. Highly expressed genes and hard-to-align regions are two examples of artifact-attracting regions that give rise to many events.

When a gene has many events, Arriba applies more stringent filtering to compensate for the increased level of background noise. By "more stringent filtering" I mean that Arriba requires events to have more supporting reads. This is the purpose of the relative_support filter: it passes only those events that have a sizable number of supporting reads relative to the level of background noise, i.e., the total number of events. The relationship between the number of events and the minimum required number of supporting reads is modeled as a polynomial function.
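
To make the idea concrete, here is a minimal Python sketch of the logic described above. It is an illustration only and not Arriba's actual implementation: the function names, the event layout, the polynomial coefficients in `COEFFS`, and the choice to use the noisier of the two genes are all assumptions made up for this example.

```python
from collections import Counter

# Hypothetical polynomial coefficients (constant, linear, quadratic).
# These are NOT Arriba's real values; they only illustrate the shape of the idea.
COEFFS = (1.0, 0.05, 0.001)


def min_supporting_reads(n_events, coeffs=COEFFS):
    """Minimum supporting reads required for a gene that has n_events events."""
    return sum(c * n_events ** power for power, c in enumerate(coeffs))


def relative_support_filter(events, coeffs=COEFFS):
    """Keep events whose read support is sizable relative to background noise.

    events: list of dicts with keys "gene1", "gene2", "supporting_reads"
    (a hypothetical layout chosen for this sketch).
    """
    # Count all events affecting each gene, regardless of the partner gene.
    events_per_gene = Counter()
    for event in events:
        events_per_gene[event["gene1"]] += 1
        if event["gene2"] != event["gene1"]:
            events_per_gene[event["gene2"]] += 1

    kept = []
    for event in events:
        # Assumption: the noisier of the two genes sets the noise level.
        noise = max(events_per_gene[event["gene1"]],
                    events_per_gene[event["gene2"]])
        if event["supporting_reads"] >= min_supporting_reads(noise, coeffs):
            kept.append(event)
    return kept
```

With these made-up coefficients, an event with two supporting reads passes when its genes have only one event each, but the same two reads are not enough in a gene with dozens of events.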

I hope this explanation is clearer.