You can use the script download_references.sh to download the reference files and build the STAR index. I recommend GRCh38+GENCODE38. If you are also interested in detecting viral integration sites, you can use GRCh38viral+GENCODE38. Run the script without any options to see all available reference files.
Does this answer your question?
I should note that STAR 2.7.10a or later is recommended for use with Arriba. Version 2.7.8a works, but you will not be able to detect fusions from multi-mapping reads, so a small number of known cancer driver fusions will be missed.
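A small sketch for checking an installed STAR against this recommendation. It assumes that `STAR --version` prints a bare version string such as 2.7.10a and that `sort -V` (version sort, as in GNU coreutils) is available; the function name is hypothetical.

```shell
# Hypothetical helper: succeeds if the given STAR version is 2.7.10a or later,
# the minimum recommended for detecting fusions from multi-mapping reads.
star_version_ok_for_arriba() {
  local minimum="2.7.10a"
  local lowest
  # sort -V orders version strings numerically per component (8 < 10);
  # if the minimum sorts first (or ties), the given version is >= minimum.
  lowest="$(printf '%s\n' "$minimum" "$1" | sort -V | head -n 1)"
  [ "$lowest" = "$minimum" ]
}

# Usage (assumes STAR is on PATH):
# star_version_ok_for_arriba "$(STAR --version)" || echo "STAR is older than 2.7.10a" >&2
```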
In case you have not found it yet, here is the quickstart guide, which may contain some more useful bits of information: https://arriba.readthedocs.io/en/latest/quickstart/
Thank you, this is very useful information, and we'll read it carefully. We use hg19, but from what I briefly read, I believe there is also information about hg19.
Thanks,
Hi Suhrig,
We are planning to try Singularity for the first time. The installers have two questions:
How can we download the container image locally?
Does the container do the STAR alignment itself? (This seems to be the case, but we need to confirm.)
Thanks,
Sorry, I forgot to add: we only use hg19, so we need the hg19 references.
How can we download the container image locally?
You can convert the Docker image into a local Singularity image using the following command:
sudo singularity build arriba-2.4.0.sif docker://uhrigs/arriba:2.4.0
Does the container do the STAR alignment itself?
Yes, it does. It uses optimized alignment parameters for fusion detection, which improves detection a bit. Let me know if you prefer to do the alignment yourself. There is a way to bypass the built-in alignment.
We only use hg19 so we need hg19.
Take a look at the Singularity-based installation (https://arriba.readthedocs.io/en/latest/quickstart/#installation-using-singularity) and use hg19+GENCODE19 as an argument to download_references.sh.
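A minimal sketch of that step, following the `singularity exec -B <host dir>:/references <image> download_references.sh hg19+GENCODE19` form used later in this thread. The function name, the host directory argument, and the `SINGULARITY` variable (used only to allow a dry run with `echo`) are hypothetical.

```shell
# Hypothetical wrapper: download hg19 + GENCODE 19 references and build the
# STAR index inside the Arriba container, writing into a host directory that
# is bind-mounted at /references.
download_arriba_references() {
  local reference_dir="$1" image="$2" assembly="${3:-hg19+GENCODE19}"
  # The bind source must exist on the host before singularity can mount it.
  mkdir -p "$reference_dir"
  # SINGULARITY can be set to "echo" to print the command instead of running it.
  "${SINGULARITY:-singularity}" exec -B "$reference_dir:/references" "$image" \
    download_references.sh "$assembly"
}

# Usage (image built with the command above):
# download_arriba_references /path/to/references arriba-2.4.0.sif
```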
Thank you, I'll forward this to the installers.
Hi Suhrig,
From the developers:
[10:41 AM] Huang, Shengbing, M.S.
Sosa, Carlos, Ph.D., there is a problem downloading the reference files using either the local container or the online one:
singularity exec -B /dlmp/dev/scripts/sources/teamwork/shengbing/temp/MAPRSEQ/arriba_singularity:/references /biotools8/biotools/arriba/2.4.0/arriba-2.4.0.sif download_references.sh hg19+GENCODE19
INFO: Converting SIF file to temporary sandbox...
Downloading assembly: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/bigZips/chromFa.tar.gz
/arriba_v2.4.0/download_references.sh: line 77: Usage:: command not found
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
INFO: Cleaning up image...
++++++++++++++++++++++++++++++++++++++++++++
singularity exec -B /dlmp/dev/scripts/sources/teamwork/shengbing/temp/MAPRSEQ/arriba_singularity_references:/references docker://uhrigs/arriba:2.4.0 download_references.sh hg19+GENCODE19
INFO: Converting OCI blobs to SIF format
INFO: Starting build...
Getting image source signatures
Copying blob b549f31133a9 done
Copying blob 02b22cd24c11 done
Copying blob 258204f57c76 done
Copying blob b37b459303b2 done
Copying blob 3c949f191469 done
Copying blob 32f3485c9ca8 done
Copying blob f9e23ccb88c1 done
Copying config 54fa66b42d done
Writing manifest to image destination
Storing signatures
2023/08/23 10:38:30 info unpack layer: sha256:b549f31133a955f68f9fa0d93f18436c4a180e12184b999a8ecf14f7eaa83309
2023/08/23 10:38:30 info unpack layer: sha256:02b22cd24c113535f75002d320960beef072dc965aeca3e0d1f34036964cb760
2023/08/23 10:38:36 info unpack layer: sha256:258204f57c76904353e1ff58a90b67700c22c79532be1fcfb59c6879c4d528e5
2023/08/23 10:38:36 info unpack layer: sha256:b37b459303b2cb0e127fe979716133718eb916884a4841930694f4d5ed7e20f0
2023/08/23 10:38:42 info unpack layer: sha256:3c949f1914698e82a3fb261a6aaf56c14715b262d57740d2a1e9a9be65318009
2023/08/23 10:38:42 info unpack layer: sha256:32f3485c9ca8e5a3cd72b5fdef66fe528e511d9cef3e4cbacbce9f91af90c051
2023/08/23 10:38:42 info unpack layer: sha256:f9e23ccb88c16acb1475165750304b620a9d0232ced548b5ebbe3369803a2c85
INFO: Creating SIF file...
INFO: Converting SIF file to temporary sandbox...
Downloading assembly: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/bigZips/chromFa.tar.gz
/arriba_v2.4.0/download_references.sh: line 77: Usage:: command not found
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
INFO: Cleaning up image...
[10:41 AM] Huang, Shengbing, M.S.
It is the same problem. You may need to contact the developer. Another message:
Downloading the reference files using our standard installation of Arriba seems to leave out certain files, such as blacklist_*.tsv.gz:
cd /dlmp/dev/scripts/sources/teamwork/shengbing/temp/MAPRSEQ/arriba_singularity
/biotools8/biotools/arriba/2.4.0/download_references.sh hg19+GENCODE19
Downloading assembly: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/bigZips/chromFa.tar.gz
wget: /home/huans/.netrc:1: unknown token "or"
wget: /home/huans/.netrc:1: unknown token "xxx.xxx.xxx.xxx>"
wget: /home/huans/.netrc:2: unknown token "id>"
wget: /home/huans/.netrc:3: unknown token "password>"
Indexing assembly
Downloading annotation: http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz
wget: /home/huans/.netrc:1: unknown token "or"
wget: /home/huans/.netrc:1: unknown token "xxx.xxx.xxx.xxx>"
wget: /home/huans/.netrc:2: unknown token "id>"
wget: /home/huans/.netrc:3: unknown token "password>"
Aug 23 09:11:22 ..... started STAR run
Aug 23 09:11:22 ... starting to generate Genome files
Aug 23 09:12:07 ..... processing annotations GTF
Aug 23 09:12:30 ... starting to sort Suffix Array. This may take a long time...
Aug 23 09:12:43 ... sorting Suffix Array chunks and saving them to disk...
Aug 23 09:32:48 ... loading chunks from disk, packing SA...
Aug 23 09:35:02 ... finished generating suffix array
Aug 23 09:35:02 ... generating Suffix Array index
Aug 23 09:38:41 ... completed Suffix Array index
Aug 23 09:38:41 ..... inserting junctions into the genome indices
Aug 23 09:45:39 ... writing Genome to disk ...
Aug 23 09:45:45 ... writing Suffix Array to disk ...
Aug 23 09:46:27 ... writing SAindex to disk
Aug 23 09:46:31 ..... finished successfully
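Before bind-mounting a reference directory at /references, it may help to confirm that it contains every file that arriba.sh globs for; the run that follows fails with "file not found/readable: /references/blacklist_*.tsv.gz". This is a hypothetical pre-flight check, not part of Arriba. The STAR index, FASTA, GTF and blacklist patterns come from the trace in this thread; the known_fusions_* and protein_domains_* names are assumed from Arriba's usual database file naming.

```shell
# Hypothetical pre-flight check (not part of Arriba): report every expected
# reference file pattern that has no match in the given host directory.
check_arriba_references() {
  local dir="$1" status=0 pattern
  for pattern in 'STAR_index_*' '*.fa' '*.gtf' \
                 'blacklist_*.tsv.gz' 'known_fusions_*.tsv.gz' 'protein_domains_*.gff3'; do
    # Expand the glob; if nothing matches, the pattern stays literal and -e fails.
    set -- "$dir"/$pattern
    if [ ! -e "$1" ]; then
      echo "missing: $dir/$pattern" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage:
# check_arriba_references /path/to/references || echo "fix references first" >&2
```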
singularity exec -B /dlmp/dev/scripts/sources/teamwork/shengbing/temp/MAPRSEQ/arriba_singularity_tests/output1:/output -B /dlmp/misc-data/pipelinedata/deployments/maprseq/references/snapshot_021821/metafusion_reference_files/Arriba_Singularity:/references:ro -B /dlmp/dev/temp/cromwell/MAPRSEQSingleSampleMasterWF/149d6aa3-e538-4814-8d5c-3d6942d37688/call-trimseq/FASTP.RunFastpTask/4b215d7f-52d0-4b90-92ec-83e9326d0918/call-FASTP_paired/shard-0/execution/glob-1ec9b4c61275ac86cb9c089e4c139c17/out.R1.fq.gz:/out.R1.fq.gz:ro -B /dlmp/dev/temp/cromwell/MAPRSEQSingleSampleMasterWF/149d6aa3-e538-4814-8d5c-3d6942d37688/call-trimseq/FASTP.RunFastpTask/4b215d7f-52d0-4b90-92ec-83e9326d0918/call-FASTP_paired/shard-0/execution/glob-1ec9b4c61275ac86cb9c089e4c139c17/out.R2.fq.gz:/out.R2.fq.gz:ro /biotools8/biotools/arriba/2.4.0/arriba-2.4.0.sif arriba.sh
INFO: Converting SIF file to temporary sandbox...
STAR_INDEX_DIR=/references/STAR_index_hg19_GENCODE19
ANNOTATION_GTF=/references/GENCODE19.gtf
ASSEMBLY_FA=/references/hg19.fa
BLACKLIST_TSV='/references/blacklist_*.tsv.gz'
KNOWN_FUSIONS_TSV='/references/known_fusions_*.tsv.gz'
TAGS_TSV='/references/known_fusions_*.tsv.gz'
PROTEIN_DOMAINS_GFF3='/references/protein_domains_*.gff3'
THREADS=8
READ1=/read1.fastq.gz
READ2=
++ dirname /arriba_v2.4.0/run_arriba.sh
BASE_DIR=/arriba_v2.4.0
STAR --runThreadN 8 --genomeDir /references/STAR_index_hg19_GENCODE19 --genomeLoad NoSharedMemory --readFilesIn /read1.fastq.gz '' --readFilesCommand zcat --outStd BAM_Unsorted --outSAMtype BAM Unsorted --outSAMunmapped Within --outBAMcompression 0 --outFilterMultimapNmax 50 --peOverlapNbasesMin 10 --alignSplicedMateMapLminOverLmate 0.5 --alignSJstitchMismatchNmax 5 -1 5 5 --chimSegmentMin 10 --chimOutType WithinBAM HardClip --chimJunctionOverhangMin 10 --chimScoreDropMax 30 --chimScoreJunctionNonGTAG 0 --chimScoreSeparation 1 --chimSegmentReadGapMax 3 --chimMultimapNmax 50
tee Aligned.out.bam
/arriba_v2.4.0/arriba -x /dev/stdin -o fusions.tsv -O fusions.discarded.tsv -a /references/hg19.fa -g /references/GENCODE19.gtf -b '/references/blacklist_*.tsv.gz' -k '/references/known_fusions_*.tsv.gz' -t '/references/known_fusions_*.tsv.gz' -p '/references/protein_domains_*.gff3'
[2023-08-23T10:23:09] Launching Arriba 2.4.0
ERROR: file not found/readable: /references/blacklist_*.tsv.gz
!!!!! WARNING: Could not ls /read1.fastq.gz
EXITING: because of fatal INPUT file error: could not open read file: /read1.fastq.gz
SOLUTION: check that this file exists and has read permision.
Aug 23 10:23:09 ...... FATAL ERROR, exiting
INFO: Cleaning up image...
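In the trace above, arriba.sh looks for the reads at /read1.fastq.gz (with READ2 left empty), whereas the command mounted them at /out.R1.fq.gz and /out.R2.fq.gz, which appears to be why STAR cannot open the read file. A minimal sketch that mounts the reads at the paths arriba.sh reads, assuming /read2.fastq.gz is the corresponding path for mate 2; the function name and the `SINGULARITY` dry-run variable are hypothetical.

```shell
# Hypothetical wrapper: run arriba.sh with reads bind-mounted at the fixed
# paths the container expects, plus /output and a read-only /references.
run_arriba_container() {
  local reference_dir="$1" output_dir="$2" read1="$3" read2="$4" image="$5"
  # Mount mate 2 only for paired-end data; otherwise pass no extra bind.
  if [ -n "$read2" ]; then
    set -- -B "$read2:/read2.fastq.gz:ro"
  else
    set --
  fi
  # SINGULARITY can be set to "echo" to print the command instead of running it.
  "${SINGULARITY:-singularity}" exec \
    -B "$output_dir:/output" \
    -B "$reference_dir:/references:ro" \
    -B "$read1:/read1.fastq.gz:ro" \
    "$@" \
    "$image" arriba.sh
}

# Usage:
# run_arriba_container /path/to/references /path/to/output out.R1.fq.gz out.R2.fq.gz arriba-2.4.0.sif
```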
Which version of Singularity do you use? This should work...
Are you still looking for a solution? This should work. Maybe you're using an incompatible version of Singularity. I could do some tests if you tell me your version.
Hi Suhrig,
It works, thanks for checking.
Hi Suhrig,
We would like to use Arriba v2.4.0 with STAR 2.7.8a. Which version of the STAR reference index do you recommend? Do you have a suggested command and options for building the index?
Thanks,