suhrig / arriba

Fast and accurate gene fusion detection from RNA-Seq data

Does this software only work on humans? #160

Closed wiltbb closed 2 years ago

wiltbb commented 2 years ago

I'm looking for fusion genes in wheat. Can I use this software for that?

suhrig commented 2 years ago

The officially supported organisms are human and mouse. That being said, I have heard about users applying the tool to other organisms. As a bare minimum, you need an assembly of the genome (FastA file) and a gene model (GTF file). All else is optional and can simply be omitted when running Arriba (such as the known fusions file, the protein domains file, etc.).

As there is no blacklist for non-supported organisms, you should expect to see quite a few more false positives than usual. Moreover, you need to disable the blacklist when running Arriba using the parameter -f blacklist or else Arriba will refuse to run.
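The minimal invocation described above could be sketched as follows. The function name `run_arriba_minimal` and the file names in the example call are placeholders made up for illustration; only the flags `-x`, `-a`, `-g`, `-o` and `-f blacklist` come from this thread.

```shell
# Minimal Arriba call for an organism without a blacklist (a sketch).
# Usage: run_arriba_minimal ALIGNMENTS ASSEMBLY_FASTA ANNOTATION_GTF OUTPUT_TSV
# The function name is hypothetical; it only wraps the flags discussed here.
run_arriba_minimal() {
    # -x: alignments from STAR, -a: genome assembly (FastA),
    # -g: gene model (GTF), -o: output file,
    # -f blacklist: disable the blacklist filter, since no blacklist is supplied
    arriba \
        -x "$1" \
        -a "$2" \
        -g "$3" \
        -o "$4" \
        -f blacklist
}

# Example call with placeholder file names:
# run_arriba_minimal Aligned.out.bam genome.fa genes.gtf fusions.tsv
```

Optional inputs such as the known fusions file or the protein domains file are simply left out here.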

wiltbb commented 2 years ago

Thank you, Suhrig. I tried to run Arriba, but now I have a new problem:

ERROR: no normal reads found

My script looks like this:

    arriba \
        -x star.out \
        -a /public/home/zhukun/wheat_transcription/iwgsc_refseqv2.1_assembly.fa \
        -g /public/home/zhukun/wheat_transcription/iwgsc_refseqv2.1_assembly.gtf \
        -o fusions_demo.tsv \
        -f blacklist \
        -p /public/home/zhukun/wheat_transcription/iwgsc_refseqv2.1_annotation_200916_HC.gff3

wiltbb commented 2 years ago

star.out is the output of STAR. Can you help me? Thank you very much.

wiltbb commented 2 years ago

Does wheat need the -i parameter?

suhrig commented 2 years ago

Does wheat need the -i parameter?

Yes, this is likely the underlying problem. If wheat chromosomes are not named as in human/mouse like chr1, chr2, etc., then you must list the chromosome names explicitly, for example: -i 'wheatchrom1 wheatchrom2 wheatchrom3'

Otherwise Arriba only looks for reads mapped to human/mouse chromosomes and does not find any, hence the error message no normal reads found.
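If typing the names by hand is tedious, the list for `-i` can be derived from the assembly itself. The following is only a sketch: `fasta_contigs` is a made-up helper name, and it assumes that the first word of each FastA header line is the sequence name that STAR and Arriba see.

```shell
# Print the sequence names of a FastA file as one space-separated line.
# Usage: fasta_contigs ASSEMBLY_FASTA
# Assumption: the name is the first word after ">" on each header line.
fasta_contigs() {
    grep '^>' "$1" |        # keep header lines only
        sed 's/^>//' |      # drop the leading ">"
        awk '{print $1}' |  # keep the first word (the sequence name)
        paste -s -d ' ' -   # join all names with single spaces
}

# Example with a placeholder file name:
# arriba ... -i "$(fasta_contigs genome.fa)"
```

For assemblies with many unplaced scaffolds, it may be worth trimming the list to the chromosomes of interest before passing it to Arriba.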

suhrig commented 2 years ago

BTW, you can remove this parameter: -p /public/home/zhukun/wheat_transcription/iwgsc_refseqv2.1_annotation_200916_HC.gff3 This file should contain protein domains. I am pretty sure the file you have doesn't, because for human/mouse I have generated the file personally, and I have not generated such a file for the wheat genome.

DarioS commented 2 years ago

What is the format of STAR.out? Note that it is necessary to change the default --chimOutType of STAR. Please read Input Files.
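As a rough illustration of what changing `--chimOutType` means: STAR writes chimeric reads to a separate junction file by default, whereas Arriba reads chimeric alignments from the BAM/SAM output. The sketch below is not the full command recommended in the Arriba manual. The function name, the assumption of gzipped paired-end reads, and the value `10` for `--chimSegmentMin` are illustrative choices.

```shell
# Run STAR so that chimeric alignments end up inside the main BAM file (a sketch).
# Usage: run_star_chimeric GENOME_DIR READS_1_FASTQ_GZ READS_2_FASTQ_GZ
# The function name is hypothetical; the Arriba manual lists further recommended options.
run_star_chimeric() {
    # --readFilesCommand zcat: assumes gzip-compressed FastQ files
    # --outSAMtype BAM Unsorted: WithinBAM requires BAM output
    # --chimSegmentMin 10: any value > 0 enables chimeric detection (10 is illustrative)
    # --chimOutType WithinBAM: report chimeric alignments in the main BAM file
    STAR \
        --genomeDir "$1" \
        --readFilesIn "$2" "$3" \
        --readFilesCommand zcat \
        --outSAMtype BAM Unsorted \
        --chimSegmentMin 10 \
        --chimOutType WithinBAM
}

# Example call with placeholder names:
# run_star_chimeric STAR_index reads_1.fastq.gz reads_2.fastq.gz
```

With these settings STAR writes `Aligned.out.bam` to the working directory, and that file is what gets passed to Arriba via `-x`.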

wiltbb commented 2 years ago

OK, I'll try again. Thank you, sir.

wiltbb commented 2 years ago

What is the format of STAR.out? Note that it is necessary to change the default --chimOutType of STAR. Please read Input Files.

STAR.out is a binary file.

wiltbb commented 2 years ago

Thank you very much, I solved this problem. Now I've got a bunch of fusion.tsv files and I don't know what to do next. Can you give me some advice? For example, what software is available for analyzing fusions?

suhrig commented 2 years ago

There is some information about interpretation of the output files in the manual, although this is tailored towards fusions in cancer. You may also find the description of the file format useful.

Interpretation depends on what you are looking for. I am not sure of how much help I can be here, because I have never worked with wheat and have only used Arriba in the context of cancer so far. What was your motivation to do this analysis? You probably had some ideas/hypotheses about expected findings.
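As one possible first step with the output files, the predictions can be narrowed down by the `confidence` column. The helper below is a sketch with a made-up name. It assumes a tab-separated file whose first line is the header and looks the column up by name rather than by position.

```shell
# Print the header plus all rows with a given confidence level (a sketch).
# Usage: fusions_by_confidence FUSIONS_TSV LEVEL
# Assumption: the first line is a tab-separated header containing "confidence".
fusions_by_confidence() {
    awk -F '\t' -v level="$2" '
        NR == 1 {
            # locate the confidence column by its header name
            for (i = 1; i <= NF; i++) if ($i == "confidence") col = i
            print
            next
        }
        # print rows whose confidence matches the requested level
        col && $col == level
    ' "$1"
}

# Example with a placeholder file name:
# fusions_by_confidence fusions.tsv high
```

Without a blacklist, more false positives are to be expected, so starting from the high-confidence rows and inspecting those manually seems a reasonable way in.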

wiltbb commented 2 years ago

OK, I solved this problem.