suhrig / arriba

Fast and accurate gene fusion detection from RNA-Seq data

Reference Genome #220

Closed bshim181 closed 3 months ago

bshim181 commented 8 months ago

Hello,

I am running Arriba and STAR-Fusion for fusion transcript detection and have a question about reference genome usage. I previously analyzed other experiments (ribosome sequencing and non-canonical ORF detection) using the GENCODE GRCh37 V43 GTF annotation. I have run STAR-Fusion and Arriba on the same genome build (GRCh37), but I am aware that the genome versions differ between the two experiments. I would like to roughly compare transcript-level non-canonical ORF predictions with fusion transcript predictions (would this be feasible if the genome versions are different?). Also, on a side note, how much of an effect would the genome version have on fusion transcript prediction, and would it be possible to build the Arriba library on a different genome build version, for example GRCh37 V43 (a more recent one)?

suhrig commented 8 months ago

Hi, can you send me links to the various reference files that you mention so I can better understand the differences between them? Thanks!

suhrig commented 3 months ago

Hi, I'm closing this issue due to lack of feedback.

In general, it is possible to use any GTF annotation you wish - as long as its coordinates are compatible with the assemblies supported by Arriba (GRCh37, GRCh38). In my experience, GENCODE annotation works best, since it has the most comprehensive annotation. And using a newer version (V43) should be perfectly fine. So if your Ribo-seq experiment used V43, feel free to pass the same GTF file to Arriba. There will be slight differences in the fusion calls, but they should not be substantial and should mostly affect low-confidence fusions.
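
As a rough way to sanity-check the "coordinates are compatible" condition above before passing a GTF to Arriba, one could compare the annotation against the FASTA index (`.fai`) of the assembly. The sketch below is not part of Arriba; the function names are hypothetical, and stripping a leading `chr` to match contig names is an assumption that will not reconcile every naming difference (for example `chrM` versus `MT`). Passing the check does not prove the GTF and assembly match, but unknown contigs or features that run past the end of a contig are a strong hint that they do not.

```python
import io


def strip_chr(name):
    """Drop a leading 'chr' so 'chr1' and '1' compare equal (an assumption)."""
    return name[3:] if name.startswith("chr") else name


def read_fai_lengths(fai_handle):
    """Read contig lengths from a FASTA index (.fai): name<TAB>length<TAB>..."""
    lengths = {}
    for line in fai_handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths[strip_chr(fields[0])] = int(fields[1])
    return lengths


def check_gtf_against_assembly(gtf_handle, lengths):
    """Return (contigs missing from the assembly, count of out-of-range features)."""
    unknown = set()
    out_of_range = 0
    for line in gtf_handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        contig = strip_chr(fields[0])
        end = int(fields[4])  # GTF column 5 is the 1-based inclusive end
        if contig not in lengths:
            unknown.add(contig)
        elif end > lengths[contig]:
            out_of_range += 1
    return unknown, out_of_range


if __name__ == "__main__":
    # Tiny in-memory stand-ins for a real .fai and GTF file.
    fai = io.StringIO("chr1\t1000\nchr2\t500\n")
    gtf = io.StringIO(
        "##description: toy annotation\n"
        '1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "A";\n'
        '2\tHAVANA\tgene\t400\t600\t.\t-\t.\tgene_id "B";\n'
        'GL000192.1\tHAVANA\tgene\t1\t50\t.\t+\t.\tgene_id "C";\n'
    )
    unknown, out_of_range = check_gtf_against_assembly(gtf, read_fai_lengths(fai))
    print("contigs not in assembly:", sorted(unknown))
    print("features past contig end:", out_of_range)
```

With real files, the two `io.StringIO` objects would be replaced by `open(...)` handles on the `.fai` produced by `samtools faidx` and on the uncompressed GTF.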