suhrig / arriba

Fast and accurate gene fusion detection from RNA-Seq data

Using a genome not supported #221

Closed: gabi-ryan closed this issue 4 months ago

gabi-ryan commented 8 months ago

Hello,

I'm looking into detecting fusions with the T2T genome. I tried running Arriba with the T2T reference files, without any blacklist or known fusions file. It ran, but I didn't get any results. Is it possible to run Arriba with the T2T genome, and if so, what configuration would I need? (And if not, do you have any plans to add T2T to your supported genomes?)

suhrig commented 8 months ago

Using an unsupported genome should not prevent you from calling fusions. If anything, there should be more fusion calls, because the blacklist does not work properly with it and fails to remove common false positives.

Can you paste the status updates that Arriba prints for one of the samples? They should reveal which step is responsible for discarding the bulk of the candidates.

gabi-ryan commented 8 months ago

Thanks for the reply, suhrig. This is the part where it drops the candidates.

```
[2023-10-25T15:32:20] Filtering mates which do not map to interesting contigs (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y AC_* NC_*) (remaining=1417417)
[2023-10-25T15:32:20] Filtering mates which only map to viral contigs (AC_* NC_*) (remaining=0)
```

So could this be caused by the way the chromosomes are named in T2T?
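The two log lines can be read as a two-stage filter. The following is a simplified model, not Arriba's actual implementation, and rests on several assumptions: each mate is represented by the list of contig names it aligns to; `*` in a pattern is a glob-style wildcard (modelled here with Python's `fnmatch.fnmatchcase`); a mate passes the first step if any alignment lies on an interesting contig; `chr` prefixes are ignored; and the chromosomes carry RefSeq-style accessions such as `NC_060925.1` (illustrative only; check the names in your own FASTA).

```python
from fnmatch import fnmatchcase

# Patterns as printed in the log (Arriba's defaults for -i and -v).
INTERESTING = [str(i) for i in range(1, 23)] + ["X", "Y", "AC_*", "NC_*"]
VIRAL = ["AC_*", "NC_*"]


def matches_any(contig, patterns):
    """True if the contig name matches any pattern; '*' acts as a wildcard."""
    return any(fnmatchcase(contig, p) for p in patterns)


def filter_mates(mates, interesting, viral):
    """Simplified two-step filter; each mate is a list of contigs it aligns to."""
    # Step 1: keep mates with at least one alignment on an interesting contig.
    kept = [m for m in mates if any(matches_any(c, interesting) for c in m)]
    # Step 2: drop mates whose alignments all lie on viral contigs.
    return [m for m in kept if not all(matches_any(c, viral) for c in m)]


if __name__ == "__main__":
    # Illustrative RefSeq-style chromosome names (assumed, not read from a file).
    mates = [["NC_060925.1"], ["NC_060925.1", "NC_060930.1"], ["NC_060947.1"]]
    print(len(filter_mates(mates, INTERESTING, VIRAL)))  # 0: every mate looks viral
    print(len(filter_mates(mates, INTERESTING, [])))     # 3: empty viral list
```

In this model every mate survives the first step (its contigs match `NC_*`) and is then removed by the second step for the same reason, which mirrors the drop from 1417417 to 0 in the log.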

suhrig commented 8 months ago

That explains it. Arriba treats all of the chromosomes as viral contigs, because their names match the pattern AC_* or NC_*. Try adding the parameter -v "" (an empty list of viral contigs). You will also need to pass a comma-separated list of the main chromosome names to the parameter -i.
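A hedged sketch of assembling the suggested options: it reads contig names from the lines of a samtools `.fai` index (name in the first tab-separated column), keeps the main chromosomes with a glob pattern, and builds the `-i` and `-v ""` arguments. The helper names, the pattern `NC_0609*`, and the index lines (including the placeholder numeric fields and the `unplaced_1` contig) are hypothetical; adapt them to your reference. Arriba's other required options (alignments, reference, annotation, output files) are omitted.

```python
import shlex
from fnmatch import fnmatchcase


def parse_fai(lines):
    """Contig names (first tab-separated column) from the lines of a .fai index."""
    return [line.split("\t", 1)[0] for line in lines if line.strip()]


def select_contigs(names, pattern):
    """Hypothetical helper: keep names matching a glob pattern, in .fai order."""
    return [n for n in names if fnmatchcase(n, pattern)]


def contig_args(main_contigs):
    """Build the -i / -v options suggested in this thread."""
    if not main_contigs:
        raise ValueError("no main chromosome names given")
    # -i: explicit comma-separated list of main chromosomes.
    # -v "": empty viral-contig list, so NC_* chromosomes are not treated as viral.
    return ["-i", ",".join(main_contigs), "-v", ""]


if __name__ == "__main__":
    # Placeholder .fai lines; numeric fields are made up. In practice:
    # with open("reference.fa.fai") as fh: names = parse_fai(fh)
    fai_lines = [
        "NC_060925.1\t100\t13\t80\t81\n",
        "NC_060926.1\t100\t120\t80\t81\n",
        "unplaced_1\t50\t240\t80\t81\n",
    ]
    names = parse_fai(fai_lines)
    args = contig_args(select_contigs(names, "NC_0609*"))
    print(shlex.join(["arriba", *args]))
    # prints: arriba -i NC_060925.1,NC_060926.1 -v ''
```

`shlex.join` quotes the empty string as `''`, which a POSIX shell passes to Arriba as an empty argument, equivalent to `-v ""`.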