suhrig / arriba

Fast and accurate gene fusion detection from RNA-Seq data

Gene2 entry (in `fusions.tsv`) not following the format described in the manual #143

Closed by PubuduSaneth 2 years ago

PubuduSaneth commented 2 years ago

Hi

Thank you very much for developing and maintaining the Arriba tool.

Recently, I noticed a record in the `fusions.tsv` output file that is not described in the Arriba manual: a gene2 entry listing a single gene name followed by a number enclosed in parentheses. For example, the gene2 entry is AC113430.1(19952).

According to the manual,

> if the breakpoint is in an intergenic region, Arriba lists the closest genes upstream and downstream from the breakpoint, separated by a comma. The numbers in parentheses after the closest genes state the distance to the genes.

In the case I'm highlighting here, I only observed one gene followed by a number enclosed in parentheses (as mentioned above). In other words, I don't see both the closest upstream and downstream genes. So I can't interpret this entry, and I wonder whether you have any suggestions for what I should check further.

Thanks a lot in advance and let me know if you need more info.

suhrig commented 2 years ago

Hi Pubudu,

Thanks for reading the manual before asking a question! What you observed can happen when the breakpoint is close to the chromosome start or end, so that there is only an upstream gene or only a downstream gene, but not both. BTW, the column can even be completely empty (.) if the breakpoint is on a chromosome without any annotation, and more than two genes can be listed, namely when more than one gene has the exact same distance in one direction. Admittedly, the manual glosses over some edge cases in order not to be too verbose, but generally it is still valid.
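For anyone parsing this column in a script, the cases above (no gene, one gene, two genes, or more) could be handled roughly as follows. This is only a minimal sketch: `parse_gene_field` is a hypothetical helper and not part of Arriba, the gene names other than AC113430.1 are made up, and it assumes that entries are comma-separated, that gene names contain no commas, and that a distance only ever appears as digits in trailing parentheses.

```python
import re

# Hypothetical helper, not part of Arriba.
# Assumed entry shape: NAME or NAME(DISTANCE), entries separated by commas.
_ENTRY = re.compile(r"^(?P<name>.+?)(?:\((?P<dist>\d+)\))?$")

def parse_gene_field(field):
    """Split a gene1/gene2 value into a list of (gene_name, distance) pairs.

    distance is None when the entry carries no parenthesized number.
    An empty column (".") yields an empty list.
    """
    field = field.strip()
    if field == ".":
        return []
    genes = []
    for entry in field.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        match = _ENTRY.match(entry)
        dist = match.group("dist")
        genes.append((match.group("name"), int(dist) if dist is not None else None))
    return genes

# The entry from this issue: a single flanking gene with a distance.
print(parse_gene_field("AC113430.1(19952)"))
```

The function returns a list rather than a fixed upstream/downstream pair, so callers do not have to assume that exactly two genes are present.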

Can you check whether AC113430.1 is close to the chromosome start/end?

Regards, Sebastian

PubuduSaneth commented 2 years ago

Thank you very much for the feedback, Sebastian. Yes, AC113430.1 was at the chromosome start.

Best, Pubudu