suhrig / arriba

Fast and accurate gene fusion detection from RNA-Seq data

Output fasta for TPM counting #242

Closed ericmalekos closed 2 weeks ago

ericmalekos commented 2 weeks ago

Thank you for the great tool.

I am wondering if there is any way to generate a FASTA file from the gene fusion output. I would like to quantify TPMs with Salmon/Kallisto, which require a FASTA version of the transcriptome. Context: my idea is to prepend the gene fusion FASTA to a GENCODE transcript FASTA and thereby quantify expression of the fusions. One challenge I anticipate is that, depending on the fusion genes, there may be many isoform combinations to consider.

suhrig commented 2 weeks ago

If you run Arriba with the parameter -I, it gives you the full fusion transcript in the fusion_transcript column (with some exceptions where this is not possible). After removing special characters (i.e., anything other than A C T G a c t g) you can convert this to a FastA file, which should be suitable for quantification.

Note that Arriba only reports the sequence for one transcript per gene. It can't give you all the possible combinations of transcripts. It picks the one which best matches the splice pattern of the supporting reads.
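
For illustration, the conversion described above could be sketched in Python roughly as follows. This is a minimal sketch and not part of Arriba. It assumes the fusions table is tab-separated, that its header line starts with `#`, and that it has columns named `gene1`, `gene2` and `fusion_transcript`. The function names and the record naming scheme (`fusionN_GENE1--GENE2`) are made up for this example. Rows whose transcript is empty after cleaning are skipped.

```python
import csv
import io
import re

# Anything other than A, C, G, T (either case) is treated as a special character.
NON_BASE = re.compile(r"[^ACGTacgt]")
# Characters outside this set are replaced in record names (hypothetical naming scheme).
UNSAFE_NAME = re.compile(r"[^A-Za-z0-9_.-]")


def clean_sequence(raw):
    """Drop every character that is not A, C, G or T (either case)."""
    return NON_BASE.sub("", raw)


def fusions_to_fasta(tsv_text, width=60):
    """Build FASTA text from the contents of a fusions table.

    Assumes a tab-separated table whose header contains the columns
    gene1, gene2 and fusion_transcript (header may start with '#').
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    records = []
    for index, row in enumerate(reader, start=1):
        # Normalize keys such as '#gene1' to 'gene1'; ignore overflow fields.
        row = {k.lstrip("#"): v for k, v in row.items() if k is not None}
        sequence = clean_sequence(row.get("fusion_transcript") or "")
        if not sequence:
            continue  # no usable transcript sequence for this fusion
        gene1 = UNSAFE_NAME.sub("_", row.get("gene1") or "NA")
        gene2 = UNSAFE_NAME.sub("_", row.get("gene2") or "NA")
        name = f"fusion{index}_{gene1}--{gene2}"
        # Wrap the sequence into fixed-width lines.
        lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
        records.append(">" + name + "\n" + "\n".join(lines) + "\n")
    return "".join(records)
```

Reading the fusions file and passing its contents to `fusions_to_fasta` gives text that can be written to a file and concatenated with the reference transcript FASTA before building the quantification index. Because only one transcript per gene is reported, each fusion contributes a single record.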

ericmalekos commented 2 weeks ago

That's great, thank you!