suhrig / arriba

Fast and accurate gene fusion detection from RNA-Seq data

Recommended STAR index version #209

Closed cpsosa closed 9 months ago

cpsosa commented 10 months ago

Hi Suhrig,

We would like to use arriba v2.4.0 with STAR 2.7.8a. Which version of the STAR reference index do you recommend? Is there a suggested command, with options, for building the index?

Thanks,

suhrig commented 10 months ago

You can use the script download_references.sh to download the reference files and build the STAR index. I recommend GRCh38+GENCODE38. If you are also interested in detecting viral integration sites, you can use GRCh38viral+GENCODE38. Run the script without any options to see all available reference files.

Does this answer your question?

I should note that STAR >=2.7.10a is recommended for use with Arriba. Version 2.7.8a works, but you will not be able to detect fusions from multi-mapping reads. There is a small number of known cancer driver fusions that will be missed.
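
To check whether an installed STAR meets the recommended minimum before running Arriba, a comparison along these lines may help. This is a minimal sketch and not part of Arriba: `version_at_least` is a hypothetical helper, and it assumes GNU `sort -V` orders STAR version strings as intended (2.7.8a before 2.7.10a).

```shell
# Hypothetical helper: succeeds if version $1 is at least version $2.
# Relies on GNU sort's version ordering (-V).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

RECOMMENDED="2.7.10a"   # minimum recommended in this thread

# Only query STAR if it is actually installed.
if command -v STAR >/dev/null 2>&1; then
    installed="$(STAR --version)"
    if version_at_least "$installed" "$RECOMMENDED"; then
        echo "STAR $installed meets the recommendation (>= $RECOMMENDED)"
    else
        echo "STAR $installed is older than $RECOMMENDED: fusions from multi-mapping reads will be missed"
    fi
fi
```

With 2.7.8a the check takes the second branch, which matches the limitation described above.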

In case you have not found it yet, here is the quickstart guide, which may contain some more useful bits of information: https://arriba.readthedocs.io/en/latest/quickstart/

cpsosa commented 10 months ago

Thank you. This is very useful information that we'll read carefully. We use hg19, and from what I briefly read, I believe the documentation covers hg19 as well.

Thanks,

cpsosa commented 10 months ago

Hi Suhrig,

We are planning to try Singularity for the first time. Two questions from the installers:

How can we download the container image to local?

Does the container do the star alignment itself? (This seems to be the case, but I need to confirm.)

Thanks,

cpsosa commented 10 months ago

Sorry, I forgot to add. We only use hg19 so we need hg19.

suhrig commented 10 months ago

How can we download the container image to local?

You can convert the docker image to a local singularity image using the following command:

sudo singularity build arriba-2.4.0.sif docker://uhrigs/arriba:2.4.0

Does the container do the star alignment itself?

Yes, it does. It uses optimized alignment parameters for fusion detection, which improves detection a bit. Let me know if you prefer to do the alignment yourself. There is a way to bypass the built-in alignment.

We only use hg19 so we need hg19.

Take a look at the Singularity-based installation (https://arriba.readthedocs.io/en/latest/quickstart/#installation-using-singularity) and use hg19+GENCODE19 as an argument to download_references.sh.
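
Put together, the two steps (building the local image, then downloading the hg19 references into a bound host directory) could look roughly like the sketch below. It only prints the commands so they can be reviewed first. `arriba_setup_commands` and the host directory are hypothetical; the image name, the Docker source, the `/references` mount point and the `hg19+GENCODE19` argument are the ones used in this thread.

```shell
# Hypothetical helper: print (do not run) the two setup commands.
#   $1  host directory that will hold the reference files
#   $2  reference name for download_references.sh (defaults to hg19+GENCODE19)
arriba_setup_commands() {
    ref_dir="$1"
    genome="${2:-hg19+GENCODE19}"
    image="arriba-2.4.0.sif"

    # Step 1: convert the Docker image into a local Singularity image.
    printf 'sudo singularity build %s docker://uhrigs/arriba:2.4.0\n' "$image"

    # Step 2: bind the host directory to /references and build the references there.
    printf 'singularity exec -B %s:/references %s download_references.sh %s\n' \
        "$ref_dir" "$image" "$genome"
}

# Example: show the commands for a references directory under the current directory.
arriba_setup_commands "$PWD/references"
```

Printing rather than executing keeps the sketch safe to run anywhere; the printed lines can be pasted into a shell once the paths look right.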

cpsosa commented 10 months ago

Thank you, I'll forward this to the installers.

cpsosa commented 10 months ago

Hi Suhrig,

From the developers:

[10:41 AM] Huang, Shengbing, M.S.

Sosa, Carlos, Ph.D. there is a problem downloading the reference files using either the local container or the online one:

singularity exec -B /dlmp/dev/scripts/sources/teamwork/shengbing/temp/MAPRSEQ/arriba_singularity:/references /biotools8/biotools/arriba/2.4.0/arriba-2.4.0.sif download_references.sh hg19+GENCODE19

INFO: Converting SIF file to temporary sandbox...

Downloading assembly: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/bigZips/chromFa.tar.gz

/arriba_v2.4.0/download_references.sh: line 77: Usage:: command not found

gzip: stdin: unexpected end of file

tar: Child returned status 1

tar: Error is not recoverable: exiting now

INFO: Cleaning up image...

++++++++++++++++++++++++++++++++++++++++++++

singularity exec -B /dlmp/dev/scripts/sources/teamwork/shengbing/temp/MAPRSEQ/arriba_singularity_references:/references docker://uhrigs/arriba:2.4.0 download_references.sh hg19+GENCODE19

INFO: Converting OCI blobs to SIF format

INFO: Starting build...

Getting image source signatures

Copying blob b549f31133a9 done

Copying blob 02b22cd24c11 done

Copying blob 258204f57c76 done

Copying blob b37b459303b2 done

Copying blob 3c949f191469 done

Copying blob 32f3485c9ca8 done

Copying blob f9e23ccb88c1 done

Copying config 54fa66b42d done

Writing manifest to image destination

Storing signatures

2023/08/23 10:38:30 info unpack layer: sha256:b549f31133a955f68f9fa0d93f18436c4a180e12184b999a8ecf14f7eaa83309

2023/08/23 10:38:30 info unpack layer: sha256:02b22cd24c113535f75002d320960beef072dc965aeca3e0d1f34036964cb760

2023/08/23 10:38:36 info unpack layer: sha256:258204f57c76904353e1ff58a90b67700c22c79532be1fcfb59c6879c4d528e5

2023/08/23 10:38:36 info unpack layer: sha256:b37b459303b2cb0e127fe979716133718eb916884a4841930694f4d5ed7e20f0

2023/08/23 10:38:42 info unpack layer: sha256:3c949f1914698e82a3fb261a6aaf56c14715b262d57740d2a1e9a9be65318009

2023/08/23 10:38:42 info unpack layer: sha256:32f3485c9ca8e5a3cd72b5fdef66fe528e511d9cef3e4cbacbce9f91af90c051

2023/08/23 10:38:42 info unpack layer: sha256:f9e23ccb88c16acb1475165750304b620a9d0232ced548b5ebbe3369803a2c85

INFO: Creating SIF file...

INFO: Converting SIF file to temporary sandbox...

Downloading assembly: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/bigZips/chromFa.tar.gz

/arriba_v2.4.0/download_references.sh: line 77: Usage:: command not found

gzip: stdin: unexpected end of file

tar: Child returned status 1

tar: Error is not recoverable: exiting now

INFO: Cleaning up image...

[10:41 AM] Huang, Shengbing, M.S.

It is the same problem. You may need to contact the developer. Another message:

Downloading the reference files with our standard installation of arriba seems to leave out certain files, such as blacklist_*.tsv.gz:

cd /dlmp/dev/scripts/sources/teamwork/shengbing/temp/MAPRSEQ/arriba_singularity

/biotools8/biotools/arriba/2.4.0/download_references.sh hg19+GENCODE19

Downloading assembly: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/bigZips/chromFa.tar.gz

wget: /home/huans/.netrc:1: unknown token "or"

wget: /home/huans/.netrc:1: unknown token "xxx.xxx.xxx.xxx>"

wget: /home/huans/.netrc:2: unknown token "id>"

wget: /home/huans/.netrc:3: unknown token "password>"

Indexing assembly

Downloading annotation: http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz

wget: /home/huans/.netrc:1: unknown token "or"

wget: /home/huans/.netrc:1: unknown token "xxx.xxx.xxx.xxx>"

wget: /home/huans/.netrc:2: unknown token "id>"

wget: /home/huans/.netrc:3: unknown token "password>"

Aug 23 09:11:22 ..... started STAR run

Aug 23 09:11:22 ... starting to generate Genome files

Aug 23 09:12:07 ..... processing annotations GTF

Aug 23 09:12:30 ... starting to sort Suffix Array. This may take a long time...

Aug 23 09:12:43 ... sorting Suffix Array chunks and saving them to disk...

Aug 23 09:32:48 ... loading chunks from disk, packing SA...

Aug 23 09:35:02 ... finished generating suffix array

Aug 23 09:35:02 ... generating Suffix Array index

Aug 23 09:38:41 ... completed Suffix Array index

Aug 23 09:38:41 ..... inserting junctions into the genome indices

Aug 23 09:45:39 ... writing Genome to disk ...

Aug 23 09:45:45 ... writing Suffix Array to disk ...

Aug 23 09:46:27 ... writing SAindex to disk

Aug 23 09:46:31 ..... finished successfully

singularity exec -B /dlmp/dev/scripts/sources/teamwork/shengbing/temp/MAPRSEQ/arriba_singularity_tests/output1:/output -B /dlmp/misc-data/pipelinedata/deployments/maprseq/references/snapshot_021821/metafusion_reference_files/Arriba_Singularity:/references:ro -B /dlmp/dev/temp/cromwell/MAPRSEQSingleSampleMasterWF/149d6aa3-e538-4814-8d5c-3d6942d37688/call-trimseq/FASTP.RunFastpTask/4b215d7f-52d0-4b90-92ec-83e9326d0918/call-FASTP_paired/shard-0/execution/glob-1ec9b4c61275ac86cb9c089e4c139c17/out.R1.fq.gz:/out.R1.fq.gz:ro -B /dlmp/dev/temp/cromwell/MAPRSEQSingleSampleMasterWF/149d6aa3-e538-4814-8d5c-3d6942d37688/call-trimseq/FASTP.RunFastpTask/4b215d7f-52d0-4b90-92ec-83e9326d0918/call-FASTP_paired/shard-0/execution/glob-1ec9b4c61275ac86cb9c089e4c139c17/out.R2.fq.gz:/out.R2.fq.gz:ro /biotools8/biotools/arriba/2.4.0/arriba-2.4.0.sif arriba.sh

INFO: Converting SIF file to temporary sandbox...

++ dirname /arriba_v2.4.0/run_arriba.sh

[2023-08-23T10:23:09] Launching Arriba 2.4.0

ERROR: file not found/readable: /references/blacklist_*.tsv.gz

!!!!! WARNING: Could not ls /read1.fastq.gz

EXITING: because of fatal INPUT file error: could not open read file: /read1.fastq.gz

SOLUTION: check that this file exists and has read permision.

Aug 23 10:23:09 ...... FATAL ERROR, exiting

INFO: Cleaning up image...
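
The last run fails on two paths inside the container: `/references/blacklist_*.tsv.gz` and `/read1.fastq.gz`. A pre-flight check on the host side, before launching the container, could catch both. This is only a sketch: `check_arriba_inputs` is a hypothetical helper, and the idea that the read file should be bound to `/read1.fastq.gz` (rather than `/out.R1.fq.gz`) is inferred from the error message above, not confirmed in this thread.

```shell
# Hypothetical helper: verify the host-side inputs before starting the container.
#   $1  host directory that will be bound to /references
#   $2  host path of the first read file
# Returns 0 if everything is present, 1 otherwise.
check_arriba_inputs() {
    ref_dir="$1"
    read1="$2"
    status=0

    # The container looks for /references/blacklist_*.tsv.gz, so at least
    # one matching file must exist in the bound directory.
    set -- "$ref_dir"/blacklist_*.tsv.gz
    if [ ! -e "$1" ]; then
        echo "missing: $ref_dir/blacklist_*.tsv.gz" >&2
        status=1
    fi

    # The read file must exist and be readable on the host.
    if [ ! -r "$read1" ]; then
        echo "missing or unreadable: $read1" >&2
        status=1
    fi

    return "$status"
}

# Usage (paths are placeholders):
#   check_arriba_inputs /path/to/references /path/to/out.R1.fq.gz &&
#       echo "inputs look complete; bind the read file to /read1.fastq.gz"
```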

suhrig commented 10 months ago

Which version of singularity do you use? This should work ...

suhrig commented 9 months ago

Are you still looking for a solution? This should work. Maybe you're using an incompatible version of Singularity. I could do some tests if you tell me your version.
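
For a report like the one requested here, the relevant versions can be collected in one go. A small sketch: `report_version` is a hypothetical helper, and it assumes each tool accepts `--version`, as `singularity` and `STAR` do.

```shell
# Hypothetical helper: print the first line of "<tool> --version",
# or a note if the tool is not installed.
report_version() {
    tool="$1"
    if command -v "$tool" >/dev/null 2>&1; then
        printf '%s: %s\n' "$tool" "$("$tool" --version 2>&1 | head -n 1)"
    else
        printf '%s: not found on PATH\n' "$tool"
    fi
}

report_version singularity
report_version STAR
```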

cpsosa commented 9 months ago

Hi Suhrig,

It works, thanks for checking.
