Arriba with a specific GENCODE annotation version

ConcettaDe4 commented 3 years ago

Hi! I want to run Arriba with GENCODE version 36. I was wondering how to run the analysis: should I run STAR first and then Arriba, or can I run Arriba directly on the FASTQ files?

Many thanks for the help.

Concetta

suhrig commented 3 years ago

Hi Concetta,

You cannot run Arriba directly on FASTQ files. You must first run STAR to generate alignments and then run Arriba on those alignments. The script run_arriba.sh does both: it runs STAR first and then Arriba, and it takes your preferred GENCODE annotation as an argument. Run the script without arguments to see its usage. If you are interested in the details, look inside the script to see how STAR and Arriba are called. It's a simple script.
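
As a rough illustration of the two steps, here is a minimal dry-run sketch that only prints a STAR-then-Arriba pipeline. All file and directory names are placeholders I made up, and the STAR parameters are a reduced subset; the run_arriba.sh shipped with your Arriba version has the full recommended set and should be treated as the reference.

```shell
#!/bin/sh
# Dry-run sketch: build and print a STAR | Arriba pipeline.
# Every path below is a hypothetical placeholder, not a file from this thread.
STAR_INDEX="STAR_index_GRCh38_GENCODE36"
ASSEMBLY="GRCh38.fa"
ANNOTATION="gencode.v36.annotation.gtf"
BLACKLIST="blacklist.tsv.gz"
READ1="read1.fastq.gz"
READ2="read2.fastq.gz"

# Step 1: STAR alignment with chimeric detection enabled, BAM written to stdout.
star_cmd() {
    echo "STAR --runThreadN 8 --genomeDir $STAR_INDEX" \
         "--readFilesIn $READ1 $READ2 --readFilesCommand zcat" \
         "--outStd BAM_Unsorted --outSAMtype BAM Unsorted --outSAMunmapped Within" \
         "--chimSegmentMin 10 --chimOutType WithinBAM"
}

# Step 2: Arriba reads the alignments from stdin and uses the chosen annotation.
arriba_cmd() {
    echo "arriba -x /dev/stdin -a $ASSEMBLY -g $ANNOTATION -b $BLACKLIST" \
         "-o fusions.tsv -O fusions.discarded.tsv"
}

pipeline="$(star_cmd) | $(arriba_cmd)"
echo "$pipeline"

# Nothing is executed unless RUN=1 is set in the environment.
if [ "${RUN:-0}" = "1" ]; then
    sh -c "$pipeline"
fi
```

Printing the command first makes it easy to compare against what run_arriba.sh actually calls before committing to a long alignment run.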

If you already have alignments, you can run Arriba directly on the alignments. This only takes a few minutes, and you can choose the GENCODE version of your liking by giving it as an argument to Arriba. Have a look inside the run_arriba.sh script to see how Arriba is called.
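
For the case of existing alignments, a call could look roughly like the sketch below. The helper name `arriba_for_gencode`, the BAM name, and the assembly and blacklist file names are assumptions for illustration; only the annotation file changes with the GENCODE version.

```shell
#!/bin/sh
# Hypothetical helper: print an Arriba call for an existing BAM file
# and a given GENCODE version. File names are placeholders.
arriba_for_gencode() {
    version="$1"
    bam="$2"
    echo "arriba -x $bam" \
         "-a GRCh38.fa -g gencode.v${version}.annotation.gtf" \
         "-b blacklist.tsv.gz" \
         "-o fusions.gencode${version}.tsv -O fusions.gencode${version}.discarded.tsv"
}

cmd=$(arriba_for_gencode 36 Aligned.out.bam)
echo "$cmd"

# Nothing is executed unless RUN=1 is set in the environment.
if [ "${RUN:-0}" = "1" ]; then
    sh -c "$cmd"
fi
```

Putting the version into the output file names keeps results from different annotation versions side by side if you want to compare them.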

Does this answer your question?

Regards, Sebastian

suhrig commented 3 years ago

> If you already have alignments, you can run Arriba directly on the alignments.

I should clarify: those alignments must have been generated with chimeric detection enabled. Most notably, the STAR parameter --chimSegmentMin must have been set. Again, see the run_arriba.sh script for the recommended alignment parameters.
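
If you are unsure how an existing BAM file was produced, one way to check is to look for the parameter in the file header. The sketch below assumes the aligner recorded its command line there (STAR normally does so in its @PG/@CO header lines, but parameters passed through a parameters file would not show up). The function names are hypothetical.

```shell
#!/bin/sh
# Print the value that follows --chimSegmentMin in header text read from stdin.
chim_segment_min() {
    tr '\t ' '\n\n' |
        awk 'NF == 0 { next }
             prev == "--chimSegmentMin" { print; exit }
             { prev = $0 }'
}

# Succeed only if --chimSegmentMin is present with a value greater than 0.
has_chimeric_detection() {
    value=$(chim_segment_min)
    [ -n "$value" ] && [ "$value" -gt 0 ] 2>/dev/null
}

# Demo on a made-up header line; with a real file (requires samtools) use:
#   samtools view -H Aligned.out.bam | has_chimeric_detection
if printf '@PG\tID:STAR\tPN:STAR\tCL:STAR --genomeDir idx --chimSegmentMin 10\n' |
        has_chimeric_detection; then
    echo "chimeric detection appears to be enabled"
else
    echo "no positive --chimSegmentMin found in header"
fi
```

A missing match does not prove the alignments are unusable, it only means the header does not show the parameter.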

ConcettaDe4 commented 3 years ago

Hi! Thank you for your reply. When I wrote "can I run arriba using the fastq files", I meant running the script run_arriba.sh. I checked the scripts run_arriba.sh and download_references.sh, then I modified the script download_references.sh adding my genome and annotation of interest, and re-ran it. I have one more question: do you suggest using a genome assembly that includes only the reference chromosomes, or can I use a genome that also includes the scaffold sequences? How will the unplaced scaffolds affect the analysis?

suhrig commented 3 years ago

> I modified the script download_references.sh adding my genome and annotation of interest

Good idea.

> do you suggest using a genome assembly that includes only the reference chromosomes, or can I use a genome that also includes the scaffold sequences?

I follow the recommendation stated in section 2.2.1 of the STAR manual, that is, include the unplaced scaffolds, but not patches or haplotypes. But if you want to reuse the alignments for other purposes than just fusion detection, feel free to choose an assembly with whatever scaffolds are of value to your use case. Honestly, I do not think it matters too much. I have never made a comparison, but I would think that only regions covered by the scaffolds would have different sensitivity for fusion detection. And even then, there is a good chance that the fusion would be detected in such a region, because Arriba can deal with multi-mapping reads. So if a fusion-supporting read mapped to a scaffold and to a reference chromosome alike, it would likely still be detected on the reference chromosome.

If you are particularly interested in rearrangements affecting certain scaffolds, make sure to add them to the list of interesting contigs using the parameter -i.
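
As a sketch of how such a list could be assembled: the snippet below assumes that -i accepts a comma-separated list of contig names and that the list you pass replaces the default, so the primary chromosomes are included as well. Please check `arriba -h` for the exact syntax in your version. The two scaffold names are only examples, and the names must match those in your assembly.

```shell
#!/bin/sh
# Join all arguments with commas.
join_contigs() {
    list=""
    for contig in "$@"; do
        list="${list:+$list,}$contig"
    done
    echo "$list"
}

# Primary chromosomes 1-22, X and Y.
contigs=""
i=1
while [ "$i" -le 22 ]; do
    contigs="$contigs $i"
    i=$((i + 1))
done
contigs="$contigs X Y"

# Example scaffolds of interest (placeholders, replace with your own).
scaffolds="KI270728.1 GL000195.1"

# Word splitting of the two variables is intentional here.
interesting=$(join_contigs $contigs $scaffolds)
echo "-i $interesting"
```

The printed option can then be appended to the Arriba call.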

suhrig / arriba

Arriba with a specific GENCODE annotation version #123