nf-core / sarek

Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing
https://nf-co.re/sarek
MIT License

Question regarding sarek dockers : #1233

Open ChitrArpita opened 1 year ago

ChitrArpita commented 1 year ago

Is there any way to download all the docker images used by sarek locally and somehow change the paths from which the images are read? Basically, I want to run sarek on a server where I do not have access to the internet.

FriederikeHanssen commented 1 year ago

Do you also have access to singularity/apptainer on your offline cluster? In that case, we have a subcommand in nf-core tools, nf-core download, that takes care of everything. It doesn't work with docker though.

ewels commented 1 year ago

You might be able to do something with the brand new nextflow inspect command that was released in Nextflow 23.09.1-edge just a few days ago:

nextflow inspect nf-core/sarek -profile test,docker --outdir tmp

This gives you all of the docker container images:

Output

```json
{
  "processes": [
    { "name": "NFCORE_SAREK:SAREK:CUSTOM_DUMPSOFTWAREVERSIONS", "container": "quay.io/biocontainers/multiqc:1.15--pyhdfd78af_0" },
    { "name": "NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_VARIANT_CALLING_SINGLE_STRELKA:STRELKA_SINGLE", "container": "quay.io/biocontainers/strelka:2.9.10--h9ee0642_1" },
    { "name": "NFCORE_SAREK:SAREK:VCF_QC_BCFTOOLS_VCFTOOLS:VCFTOOLS_SUMMARY", "container": "quay.io/biocontainers/vcftools:0.1.16--he513fc3_4" },
    { "name": "NFCORE_SAREK:SAREK:VCF_QC_BCFTOOLS_VCFTOOLS:VCFTOOLS_TSTV_QUAL", "container": "quay.io/biocontainers/vcftools:0.1.16--he513fc3_4" },
    { "name": "NFCORE_SAREK:SAREK:FASTQ_ALIGN_BWAMEM_MEM2_DRAGMAP_SENTIEON:BWAMEM2_MEM", "container": "quay.io/biocontainers/mulled-v2-e5d375990341c5aef3c9aff74f96f66f65375ef6:2cdf6bf1e92acbeb9b2834b1c58754167173a410-0" },
    { "name": "NFCORE_SAREK:SAREK:PREPARE_INTERVALS:GATK4_INTERVALLISTTOBED", "container": "quay.io/biocontainers/gatk4:4.4.0.0--py36hdfd78af_0" },
    { "name": "NFCORE_SAREK:SAREK:BAM_BASERECALIBRATOR:GATK4_GATHERBQSRREPORTS", "container": "quay.io/biocontainers/gatk4:4.4.0.0--py36hdfd78af_0" },
    { "name": "NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:CRAM_QC_MOSDEPTH_SAMTOOLS:SAMTOOLS_STATS", "container": "quay.io/biocontainers/samtools:1.17--h00cdaf9_0" },
    { "name": "NFCORE_SAREK:SAREK:CRAM_TO_BAM", "container": "quay.io/biocontainers/samtools:1.17--h00cdaf9_0" },
    { "name": "NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_TUMOR_ONLY_ALL:BAM_VARIANT_CALLING_SINGLE_STRELKA:MERGE_STRELKA_GENOME", "container": "quay.io/biocontainers/gatk4:4.4.0.0--py36hdfd78af_0" },
    { "name": "NFCORE_SAREK:SAREK:PREPARE_GENOME:TABIX_GERMLINE_RESOURCE", "container": "quay.io/biocontainers/tabix:1.11--hdfd78af_0" },
    { "name": "NFCORE_SAREK:SAREK:CONVERT_FASTQ_INPUT:COLLATE_FASTQ_UNMAP", "container": "quay.io/biocontainers/samtools:1.17--h00cdaf9_0" },
    { "name": "NFCORE_SAREK:SAREK:CRAM_QC_RECAL:MOSDEPTH", "container": "quay.io/biocontainers/mosdepth:0.3.3--hdfd78af_1" },
    { "name": "NFCORE_SAREK:SAREK:VCF_QC_BCFTOOLS_VCFTOOLS:VCFTOOLS_TSTV_COUNT", "container": "quay.io/biocontainers/vcftools:0.1.16--he513fc3_4" },
    { "name": "NFCORE_SAREK:SAREK:BAM_APPLYBQSR:CRAM_MERGE_INDEX_SAMTOOLS:INDEX_CRAM", "container": "quay.io/biocontainers/samtools:1.17--h00cdaf9_0" },
    { "name": "NFCORE_SAREK:SAREK:PREPARE_GENOME:BWAMEM2_INDEX", "container": "quay.io/biocontainers/bwa-mem2:2.2.1--he513fc3_0" },
    { "name": "NFCORE_SAREK:SAREK:PREPARE_GENOME:TABIX_KNOWN_SNPS", "container": "quay.io/biocontainers/tabix:1.11--hdfd78af_0" },
    { "name": "NFCORE_SAREK:SAREK:BAM_BASERECALIBRATOR:GATK4_BASERECALIBRATOR", "container": "quay.io/biocontainers/gatk4:4.4.0.0--py36hdfd78af_0" },
    { "name": "NFCORE_SAREK:SAREK:PREPARE_GENOME:SAMTOOLS_FAIDX", "container": "quay.io/biocontainers/samtools:1.17--h00cdaf9_0" },
    { "name": "NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_TUMOR_ONLY_ALL:BAM_VARIANT_CALLING_SINGLE_STRELKA:STRELKA_SINGLE", "container": "quay.io/biocontainers/strelka:2.9.10--h9ee0642_1" },
    { "name": "NFCORE_SAREK:SAREK:PREPARE_GENOME:MSISENSORPRO_SCAN", "container": "quay.io/biocontainers/msisensor-pro:1.2.0--hfc31af2_0" },
    { "name": "NFCORE_SAREK:SAREK:PREPARE_GENOME:TABIX_DBSNP", "container": "quay.io/biocontainers/tabix:1.11--hdfd78af_0" },
    { "name": "NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_STRELKA:MERGE_STRELKA_SNVS", "container": "quay.io/biocontainers/gatk4:4.4.0.0--py36hdfd78af_0" },
    { "name": "NFCORE_SAREK:SAREK:MULTIQC", "container": "quay.io/biocontainers/multiqc:1.15--pyhdfd78af_0" },
    { "name": "NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_STRELKA:STRELKA_SOMATIC", "container": "quay.io/biocontainers/strelka:2.9.10--h9ee0642_1" },
    { "name": "NFCORE_SAREK:SAREK:PREPARE_INTERVALS:CREATE_INTERVALS_BED", "container": "quay.io/biocontainers/gawk:5.1.0" },
    { "name": "NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_VARIANT_CALLING_SINGLE_STRELKA:MERGE_STRELKA_GENOME", "container": "quay.io/biocontainers/gatk4:4.4.0.0--py36hdfd78af_0" },
    { "name": "NFCORE_SAREK:SAREK:PREPARE_INTERVALS:TABIX_BGZIPTABIX_INTERVAL_COMBINED", "container": "quay.io/biocontainers/tabix:1.11--hdfd78af_0" },
    { "name": "NFCORE_SAREK:SAREK:PREPARE_GENOME:GATK4_CREATESEQUENCEDICTIONARY", "container": "quay.io/biocontainers/gatk4:4.4.0.0--py36hdfd78af_0" },
    { "name": "NFCORE_SAREK:SAREK:FASTQ_ALIGN_BWAMEM_MEM2_DRAGMAP_SENTIEON:DRAGMAP_ALIGN", "container": "quay.io/biocontainers/mulled-v2-580d344d9d4a496cd403932da8765f9e0187774d:5ebebbc128cd624282eaa37d2c7fe01505a91a69-0" },
    { "name": "NFCORE_SAREK:SAREK:PREPARE_GENOME:BWAMEM1_INDEX", "container": "quay.io/biocontainers/bwa:0.7.17--hed695b0_7" },
    { "name": "NFCORE_SAREK:SAREK:CRAM_QC_RECAL:SAMTOOLS_STATS", "container": "quay.io/biocontainers/samtools:1.17--h00cdaf9_0" },
    { "name": "NFCORE_SAREK:SAREK:CONVERT_FASTQ_INPUT:COLLATE_FASTQ_MAP", "container": "quay.io/biocontainers/samtools:1.17--h00cdaf9_0" },
    { "name": "NFCORE_SAREK:SAREK:PREPARE_INTERVALS:TABIX_BGZIPTABIX_INTERVAL_SPLIT", "container": "quay.io/biocontainers/tabix:1.11--hdfd78af_0" },
    { "name": "NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:CRAM_QC_MOSDEPTH_SAMTOOLS:MOSDEPTH", "container": "quay.io/biocontainers/mosdepth:0.3.3--hdfd78af_1" },
    { "name": "NFCORE_SAREK:SAREK:FASTQC", "container": "quay.io/biocontainers/fastqc:0.11.9--0" },
    { "name": "NFCORE_SAREK:SAREK:CONVERT_FASTQ_INPUT:SAMTOOLS_VIEW_UNMAP_UNMAP", "container": "quay.io/biocontainers/samtools:1.17--h00cdaf9_0" },
    { "name": "NFCORE_SAREK:SAREK:CONVERT_FASTQ_INPUT:SAMTOOLS_MERGE_UNMAP", "container": "quay.io/biocontainers/samtools:1.17--h00cdaf9_0" },
    { "name": "NFCORE_SAREK:SAREK:PREPARE_GENOME:TABIX_PON", "container": "quay.io/biocontainers/tabix:1.11--hdfd78af_0" },
    { "name": "NFCORE_SAREK:SAREK:CONVERT_FASTQ_INPUT:SAMTOOLS_VIEW_MAP_UNMAP", "container": "quay.io/biocontainers/samtools:1.17--h00cdaf9_0" },
    { "name": "NFCORE_SAREK:SAREK:CRAM_TO_BAM_RECAL", "container": "quay.io/biocontainers/samtools:1.17--h00cdaf9_0" },
    { "name": "NFCORE_SAREK:SAREK:CONVERT_FASTQ_INPUT:CAT_FASTQ", "container": "quay.io/nf-core/ubuntu:20.04" },
    { "name": "NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:GATK4_MARKDUPLICATES", "container": "quay.io/biocontainers/mulled-v2-d9e7bad0f7fbc8f4458d5c3ab7ffaaf0235b59fb:f857e2d6cc88d35580d01cf39e0959a68b83c1d9-0" },
    { "name": "NFCORE_SAREK:SAREK:VCF_QC_BCFTOOLS_VCFTOOLS:BCFTOOLS_STATS", "container": "quay.io/biocontainers/bcftools:1.17--haef29d1_0" },
    { "name": "NFCORE_SAREK:SAREK:PREPARE_GENOME:TABIX_KNOWN_INDELS", "container": "quay.io/biocontainers/tabix:1.11--hdfd78af_0" },
    { "name": "NFCORE_SAREK:SAREK:CONVERT_FASTQ_INPUT:SAMTOOLS_VIEW_MAP_MAP", "container": "quay.io/biocontainers/samtools:1.17--h00cdaf9_0" },
    { "name": "NFCORE_SAREK:SAREK:CONVERT_FASTQ_INPUT:SAMTOOLS_VIEW_UNMAP_MAP", "container": "quay.io/biocontainers/samtools:1.17--h00cdaf9_0" },
    { "name": "NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_STRELKA:MERGE_STRELKA_INDELS", "container": "quay.io/biocontainers/gatk4:4.4.0.0--py36hdfd78af_0" },
    { "name": "NFCORE_SAREK:SAREK:BAM_APPLYBQSR:GATK4_APPLYBQSR", "container": "quay.io/biocontainers/gatk4:4.4.0.0--py36hdfd78af_0" },
    { "name": "NFCORE_SAREK:SAREK:FASTQ_ALIGN_BWAMEM_MEM2_DRAGMAP_SENTIEON:BWAMEM1_MEM", "container": "quay.io/biocontainers/mulled-v2-fe8faa35dbf6dc65a0f7f5d4ea12e31a79f73e40:219b6c272b25e7e642ae3ff0bf0c5c81a5135ab4-0" },
    { "name": "NFCORE_SAREK:SAREK:FASTQ_ALIGN_BWAMEM_MEM2_DRAGMAP_SENTIEON:SENTIEON_BWAMEM", "container": "quay.io/nf-core/sentieon:202112.06" },
    { "name": "NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_TUMOR_ONLY_ALL:BAM_VARIANT_CALLING_SINGLE_STRELKA:MERGE_STRELKA", "container": "quay.io/biocontainers/gatk4:4.4.0.0--py36hdfd78af_0" },
    { "name": "NFCORE_SAREK:SAREK:PREPARE_GENOME:DRAGMAP_HASHTABLE", "container": "quay.io/biocontainers/dragmap:1.2.1--h72d16da_1" },
    { "name": "NFCORE_SAREK:SAREK:BAM_APPLYBQSR:CRAM_MERGE_INDEX_SAMTOOLS:MERGE_CRAM", "container": "quay.io/biocontainers/samtools:1.17--h00cdaf9_0" },
    { "name": "NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_VARIANT_CALLING_SINGLE_STRELKA:MERGE_STRELKA", "container": "quay.io/biocontainers/gatk4:4.4.0.0--py36hdfd78af_0" }
  ]
}
```

With a bit of massaging using jq, this can be deduplicated:

$ cat sarek_containers.json | jq '[.processes[].container]|unique'
[
  "quay.io/biocontainers/bcftools:1.17--haef29d1_0",
  "quay.io/biocontainers/bwa-mem2:2.2.1--he513fc3_0",
  "quay.io/biocontainers/bwa:0.7.17--hed695b0_7",
  "quay.io/biocontainers/dragmap:1.2.1--h72d16da_1",
  "quay.io/biocontainers/fastqc:0.11.9--0",
  "quay.io/biocontainers/gatk4:4.4.0.0--py36hdfd78af_0",
  "quay.io/biocontainers/gawk:5.1.0",
  "quay.io/biocontainers/mosdepth:0.3.3--hdfd78af_1",
  "quay.io/biocontainers/msisensor-pro:1.2.0--hfc31af2_0",
  "quay.io/biocontainers/mulled-v2-580d344d9d4a496cd403932da8765f9e0187774d:5ebebbc128cd624282eaa37d2c7fe01505a91a69-0",
  "quay.io/biocontainers/mulled-v2-d9e7bad0f7fbc8f4458d5c3ab7ffaaf0235b59fb:f857e2d6cc88d35580d01cf39e0959a68b83c1d9-0",
  "quay.io/biocontainers/mulled-v2-e5d375990341c5aef3c9aff74f96f66f65375ef6:2cdf6bf1e92acbeb9b2834b1c58754167173a410-0",
  "quay.io/biocontainers/mulled-v2-fe8faa35dbf6dc65a0f7f5d4ea12e31a79f73e40:219b6c272b25e7e642ae3ff0bf0c5c81a5135ab4-0",
  "quay.io/biocontainers/multiqc:1.15--pyhdfd78af_0",
  "quay.io/biocontainers/samtools:1.17--h00cdaf9_0",
  "quay.io/biocontainers/strelka:2.9.10--h9ee0642_1",
  "quay.io/biocontainers/tabix:1.11--hdfd78af_0",
  "quay.io/biocontainers/vcftools:0.1.16--he513fc3_4",
  "quay.io/nf-core/sentieon:202112.06",
  "quay.io/nf-core/ubuntu:20.04"
]

..and with a little more, turned into a space separated string..

$ cat sarek_containers.json | jq -r '[.processes[].container]|unique|join(" ")'
quay.io/biocontainers/bcftools:1.17--haef29d1_0 quay.io/biocontainers/bwa-mem2:2.2.1--he513fc3_0 quay.io/biocontainers/bwa:0.7.17--hed695b0_7 quay.io/biocontainers/dragmap:1.2.1--h72d16da_1 quay.io/biocontainers/fastqc:0.11.9--0 quay.io/biocontainers/gatk4:4.4.0.0--py36hdfd78af_0 quay.io/biocontainers/gawk:5.1.0 quay.io/biocontainers/mosdepth:0.3.3--hdfd78af_1 quay.io/biocontainers/msisensor-pro:1.2.0--hfc31af2_0 quay.io/biocontainers/mulled-v2-580d344d9d4a496cd403932da8765f9e0187774d:5ebebbc128cd624282eaa37d2c7fe01505a91a69-0 quay.io/biocontainers/mulled-v2-d9e7bad0f7fbc8f4458d5c3ab7ffaaf0235b59fb:f857e2d6cc88d35580d01cf39e0959a68b83c1d9-0 quay.io/biocontainers/mulled-v2-e5d375990341c5aef3c9aff74f96f66f65375ef6:2cdf6bf1e92acbeb9b2834b1c58754167173a410-0 quay.io/biocontainers/mulled-v2-fe8faa35dbf6dc65a0f7f5d4ea12e31a79f73e40:219b6c272b25e7e642ae3ff0bf0c5c81a5135ab4-0 quay.io/biocontainers/multiqc:1.15--pyhdfd78af_0 quay.io/biocontainers/samtools:1.17--h00cdaf9_0 quay.io/biocontainers/strelka:2.9.10--h9ee0642_1 quay.io/biocontainers/tabix:1.11--hdfd78af_0 quay.io/biocontainers/vcftools:0.1.16--he513fc3_4 quay.io/nf-core/sentieon:202112.06 quay.io/nf-core/ubuntu:20.04

..that you can now use as an input for the docker save command:

I got this far and hit errors from docker, first "reference does not exist" (missing the docker:// prefix?) and then "invalid reference format". So this needs a bit more investigation after this point, but I need to jump into a meeting now 😅 😬

ewels commented 1 year ago

If we can figure this out, we should implement it into nf-core download functionality.

pontus commented 1 year ago

I think docker save only considers local references, so the flow would be a loop that does docker pull and then either a final docker save (if one wants them in one somewhat unwieldy larger file) or an individual docker save as one goes along (for separate files).

There's no rewriting needed on the destination system; images retain their identity in the copy that exists locally.
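The single-archive variant of that flow could look roughly like the sketch below. The function name `pull_and_bundle` and both file names in the usage note are assumptions made for illustration; it relies only on `docker pull` and on `docker save` accepting several image references at once.

```shell
# Sketch: pull every image named in a list file, then write them all
# into one archive. Nothing here is part of sarek or nf-core tooling.
pull_and_bundle() {
    list_file=$1   # space- or newline-separated image names
    archive=$2     # path of the tar file to create

    images=$(cat "$list_file") || return 1
    [ -n "$images" ] || return 1

    # Word splitting is intentional: one word per image reference.
    for image in $images; do
        docker pull "$image" || return 1
    done

    # docker save takes one or more images and writes a single tar.
    # shellcheck disable=SC2086
    docker save -o "$archive" $images
}
```

For example, `pull_and_bundle containers.txt sarek_images.tar` would produce one file that a single `docker load -i sarek_images.tar` restores on the offline machine.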

ChitrArpita commented 1 year ago

> Do you also have access to singularity/apptainer on your offline cluster? In that case, we have a subcommand in nf-core tools, nf-core download, that takes care of everything. It doesn't work with docker though.

No, I have docker on the server. That is why I wanted to download all the images locally and pass some argument pointing to the folder where I have kept the docker images.

ChitrArpita commented 1 year ago

I have already downloaded the reference files locally and am passing them through --igenomes_base. I was wondering if a similar thing can be done for the docker images.

pontus commented 1 year ago

There's nothing in nextflow (or currently in nf-core tooling) that helps with that, but it's fairly simple scripting.

Phil has already extracted the needed image names for the latest release/dev. An alternative if you want to use an older release (and have no wish to update nextflow to edge just now) is to run sarek with the test profile. You can then do something like

docker image ls --format='{{ .Repository }}:{{ .Tag }}' | while read -r img; do docker save "$img" -o "$(echo "$img" | tr -d ':/').tar"; done

move all those tar files to the destination, put them somewhere and do

for p in *.tar; do docker load -i "$p"; done

after which you should be good to go.
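A slightly more defensive version of that load loop, as a sketch: `load_images` is a hypothetical helper name, and it assumes the archives sit in one directory with a `.tar` suffix. It reports any archive that fails to load instead of silently moving on.

```shell
# Sketch: load every *.tar archive in a directory into the local Docker
# daemon and report the ones that fail.
load_images() {
    dir=$1
    failed=0
    for archive in "$dir"/*.tar; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$archive" ] || continue
        if docker load -i "$archive"; then
            echo "loaded $archive"
        else
            echo "FAILED $archive" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every archive loaded.
    [ "$failed" -eq 0 ]
}
```

Calling `load_images /path/to/tars` returns a non-zero status if any archive could not be loaded, which makes it easy to spot a truncated copy.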

mribeirodantas commented 1 year ago

nextflow inspect nf-core/sarek -profile test,docker --outdir tmp > sarek_containers.json
cat sarek_containers.json | jq -r '[.processes[].container]|unique|join(" ")' > containers.txt
for image in $(cat containers.txt); do $(which docker) pull $image; done
cat containers.txt | xargs -n1 echo | xargs -I{} zsh -c 'docker save {} >  $(echo {} | grep -oE "[^/]+$").tar'

ewels commented 1 year ago

Once more, with comments:

# Get list of containers from the sarek pipeline
nextflow inspect nf-core/sarek -profile test,docker --outdir tmp > sarek_containers.json

# Flatten JSON to a deduplicated list
cat sarek_containers.json | jq -r '[.processes[].container]|unique|join(" ")' > containers.txt

# Pull all images to local Docker
for image in $(cat containers.txt); do $(which docker) pull $image; done

# Save each image to a tar file, sanitising filename
cat containers.txt | xargs -n1 echo | xargs -I{} zsh -c 'docker save {} >  $(echo {} | grep -oE "[^/]+$").tar'
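
The last step keeps only the final path component of each image name and depends on zsh. A possible alternative, sketched below, avoids zsh and keeps the whole reference in the file name by replacing `/` and `:` with `_`. The helper names `image_to_filename` and `save_images` are made up for this example.

```shell
# Sketch: turn an image reference into a flat file name, then save each
# image from a list file into its own tar under an output directory.
image_to_filename() {
    # '/' and ':' both become '_', so registry, repository and tag stay visible.
    printf '%s.tar\n' "$(printf '%s' "$1" | tr '/:' '__')"
}

save_images() {
    list_file=$1   # space- or newline-separated image names
    out_dir=$2     # directory that receives one tar per image
    mkdir -p "$out_dir" || return 1
    for image in $(cat "$list_file"); do
        docker save -o "$out_dir/$(image_to_filename "$image")" "$image" || return 1
    done
}
```

With this naming, `quay.io/nf-core/ubuntu:20.04` is written to `quay.io_nf-core_ubuntu_20.04.tar`, so two images that share a last path component cannot overwrite each other.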

ChitrArpita commented 1 year ago

Hi all, thank you for your help. I was able to download all the sarek docker images as .tar files in a local folder. Can you please let me know how to point to these downloaded docker images when running the sarek pipeline?

Thank you in advance :)

pontus commented 1 year ago

You load them on the target node, see e.g. https://github.com/nf-core/sarek/issues/1233#issuecomment-1719439438

ChitrArpita commented 1 year ago

I followed #1233 as well, and the images are loaded. The test profile is running, but I believe it is pulling the images from online. How do I specify, when running sarek, that the images should be loaded from the local copies?

I am using the following command in run.sh:

nextflow run /TEST/sarek/main.nf -c my_config.config -profile docker --input /TEST/samplessheet.csv --genome GATK.GRCh38 --igenomes_base '/TEST/references' --outdir /TEST/results/ --max_cpus 8 --save_output_as_bam TRUE --save_mapped TRUE --split_fastq 0 --tools mutect2

pontus commented 1 year ago

When using docker, there's no such concept: the images are always used locally, and docker will only try to pull an image if it's not already available locally.
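One way to confirm that nothing will be pulled is to check, before the run, that every image the pipeline needs is already present. The sketch below assumes a list file such as the `containers.txt` produced earlier in this thread; `missing_images` is a hypothetical helper built on `docker image inspect`, which exits non-zero for an image that is not available locally.

```shell
# Sketch: print every image from a list file that the local Docker
# daemon does not have, and fail if any are missing.
missing_images() {
    list_file=$1   # space- or newline-separated image names
    missing=0
    for image in $(cat "$list_file"); do
        if ! docker image inspect "$image" >/dev/null 2>&1; then
            echo "$image"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

If `missing_images containers.txt` prints nothing, Docker has no reason to contact a registry for those images. Recent Docker versions also accept `--pull=never` on `docker run`; whether passing that through Nextflow's `docker.runOptions` setting suits a sarek run has not been tested here.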