google / deepvariant

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.

Running deepvariant within Nextflow DSL2 #883

Open dbhayal9 opened 3 weeks ago

dbhayal9 commented 3 weeks ago

#!/usr/bin/env nextflow

nextflow.enable.dsl=2

params.data_dir = 'output/4.markDuplicate'
params.reference = 'reference/Homo_sapiens_assembly38.fasta'
params.bed_file = 'reference/hg38_exome.bed'
params.outdir = 'output/5.snvS'
params.cpus = 16 // Number of CPUs to use

workflow {

// Define channels for input data
Channel
    .fromPath("${PWD}${params.data_dir}/*_sorted_md.bam")
    .map { file ->
        def sample_id = file.baseName.replace('_sorted_md', '')
        return [sample_id, file]
    }
    .set { read_pairs }

// Process steps
/// Germline variant calling
deepvar(read_pairs, params.reference, params.bed_file)

}

process deepvar {
tag "Germline Variant on ${sample_id}"
publishDir "${params.outdir}/6.variantM", mode: 'copy'
cpus 16

input:
tuple val(sample_id), path(read_files)
val(params.reference)
val(params.bed_file)

output:
tuple val(sample_id), path("${sample_id}_raw.vcf.gz"), path("${sample_id}_raw.gvcf.gz"), emit: raw_vcfs

script:
"""
sudo docker run \
    -v "${PWD}":"${PWD}" \
    google/deepvariant:1.6.1 \
    /opt/deepvariant/bin/run_deepvariant \
    --model_type WES \
    --ref ${PWD}/${params.reference} \
    --reads ${PWD}/output/4.markDuplicate/${sample_id}_sorted_md.bam \
    --regions ${PWD}/${params.bed_file} \
    --output_vcf ${PWD}/${params.outdir}/${sample_id}_raw.vcf.gz \
    --output_gvcf ${PWD}/${sample_id}_raw.gvcf.gz \
    --num_shards ${task.cpus}
    --intermediate_results_dir ${PWD}/tmp > deepvariant_log.txt 2>&1

"""

}

############# Error ###################

N E X T F L O W ~ version 24.04.4

Launching dip.nf [deadly_pike] DSL2 - revision: e075b1fba0

executor > local (2)
[a6/9c6b79] process > deepvar (Germline Variant on SRR26512959) [ 0%] 0 of 2

executor > local (2)
[a6/9c6b79] process > deepvar (Germline Variant on SRR26512959) [100%] 1 of 1, failed: 1
ERROR ~ Error executing process > 'deepvar (Germline Variant on SRR26512958)'

Caused by: Process deepvar (Germline Variant on SRR26512958) terminated with an error exit status (127)

Command executed:

sudo docker run -v "/home/ubuntu/dd/nextflow2":"/home/ubuntu/dd/nextflow2" google/deepvariant:1.6.1 /opt/deepvariant/bin/run_deepvariant --model_type WES --ref /home/ubuntu/dd/nextflow2/reference/Homo_sapiens_assembly38.fasta --reads /home/ubuntu/dd/nextflow2/output/4.markDuplicate/SRR26512958_sorted_md.bam --regions /home/ubuntu/dd/nextflow2/reference/hg38_exome.bed --output_vcf /home/ubuntu/dd/nextflow2/output/5.snvS/SRR26512958_raw.vcf.gz --output_gvcf /home/ubuntu/dd/nextflow2/SRR26512958_raw.gvcf.gz --num_shards 16 --intermediate_results_dir /home/ubuntu/dd/nextflow2/tmp > deepvariant_log.txt 2>&1

Command exit status: 127

Command output: (empty)

Command error: docker: Error response from daemon: open /var/lib/docker/overlay2/fe3663cd03e849890d83be14603f217249f3f43f9585b554df599d0318909f21/.tmp-committed2046174062: no such file or directory. See 'docker run --help'.

Work dir: /home/ubuntu/dd/nextflow2/work/ea/9ecd306270fe3f00d9b73f8261fe89

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

-- Check '.nextflow.log' file for details
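One detail in the script as posted may be worth ruling out, separately from the Docker daemon error: the `--num_shards ${task.cpus}` line has no trailing backslash. If that matches the file that actually ran, the shell would end the `docker run` command there and try to run `--intermediate_results_dir` as a program of its own, which by itself produces exit status 127 (command not found). The sketch below reproduces that behaviour in plain bash; `count_args` is a hypothetical stand-in for the real `docker run ... run_deepvariant` call, and `/tmp/dv` is a placeholder path.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical stand-in for `docker run ... run_deepvariant`.
count_args() { echo "received $# arguments"; }

# Every line ends in a backslash, so all six words reach one command.
count_args --model_type WES \
    --num_shards 16 \
    --intermediate_results_dir /tmp/dv

# No backslash after `--num_shards 16`: the command ends there with four
# arguments, and the next line is looked up as a separate command.
status=0
bash -c 'count_args() { echo "received $# arguments"; }
count_args --model_type WES \
    --num_shards 16
--intermediate_results_dir /tmp/dv' 2>/dev/null || status=$?
echo "exit status of the broken script: $status"
```

The first call reports six arguments, the second reports four and then fails with status 127. This does not explain the `overlay2` message from the Docker daemon, which would still need to be looked at on its own.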

akolesnikov commented 2 weeks ago

Could you please paste the content of your deepvariant_log.txt?

dbhayal9 commented 2 weeks ago

Hi @akolesnikov, thank you for your response. I have attached the code, terminal output, and log file. The code keeps running, but it neither generates any output nor throws an error. Please see the code and log file below.

code

#!/usr/bin/env nextflow

nextflow.enable.dsl=2

params.outdir = '/home/deepak/integration/resu1'
params.data_dir = '/home/deepak/integration/resu1/4.markDupliM'
params.refhg38 = '/home/deepak/integration/hg381_22XYM'
params.bed = '/home/deepak/integration'

workflow {

// Define channels for input data
Channel
    .fromPath("${params.data_dir}/*_sorted_md.bam")
    .map { file ->
        def sample_id = file.baseName.replace('_sorted_md', '')
        return [sample_id, file]
    }
    .set { read_pairs }

/// Step 1. DeepVariant
DeepVariant(read_pairs, params.refhg38, params.bed)

}

process DeepVariant {
tag "deepavar on ${sample_id}"
publishDir "${params.outdir}/5.finaleepvar", mode: 'copy'
cpus 4
//BIN_VERSION 1.6.1

input:
tuple val(sample_id), path(read_files)
val(params.refhg38)
val(params.bed)

output:
//tuple val(sample_id), path("${sample_id}_rawd.vcf.gz"), path("${sample_id}_rawd.gvcf.gz"), emit: raw_vcfs
tuple val(sample_id), path("${sample_id}_rawd.vcf.gz"), emit: raw_vcfs

script:
"""
docker run \
    -v "${params.data_dir}":/opt/bam -v "${params.refhg38}":/opt/refhg38 -v "${params.bed}":/opt/bed \
    google/deepvariant:latest \
    /opt/deepvariant/bin/run_deepvariant \
    --model_type WES \
    --ref /opt/refhg38/Homo_sapiens_assembly38cleaned.fasta \
    --reads /opt/bam/${read_files} \
    --regions /opt/bed/hg38_exomeY.bed \
    --output_vcf /opt/bam/${sample_id}_rawd.vcf.gz \
    --num_shards ${task.cpus}
"""

}

######## code ################

terminal: (base) deepak@ubuntu22:~/integration$ nextflow run final_deepvarian.nf

N E X T F L O W ~ version 24.04.4

Launching final_deepvarian.nf [hungry_stonebraker] DSL2 - revision: 4dab17f4f2

executor > local (1)
[dd/64034b] DeepVariant (deepavar on SRR26512958) [ 0%] 0 of 2

log file attached nextflow.log

akolesnikov commented 2 weeks ago

In order to debug the issue we need to see the error from DeepVariant. I see that you redirected stdout to deepvariant_log.txt. What is the content of this file?
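For anyone following along: Nextflow runs each task inside its own work directory and leaves hidden files there (`.command.sh`, `.command.err`, `.command.log`, `.exitcode`). Because the first script redirects to the relative path `deepvariant_log.txt`, that file should also end up in the task's work directory rather than the launch directory. A minimal sketch for dumping these files follows; `show_task_logs` is a hypothetical helper, not part of Nextflow or DeepVariant, and the default path is the work dir copied from the error report above.

```shell
#!/usr/bin/env bash
# show_task_logs is a hypothetical helper: it prints the files Nextflow
# leaves in a task work directory, plus the redirected DeepVariant log.
show_task_logs() {
    local workdir="$1"
    local name
    for name in .exitcode .command.sh .command.err .command.log deepvariant_log.txt; do
        if [ -f "$workdir/$name" ]; then
            echo "== $name =="
            cat "$workdir/$name"
            echo
        else
            echo "== $name (not found) =="
        fi
    done
}

# Default is the work dir from the error report; pass your own as the first argument.
show_task_logs "${1:-/home/ubuntu/dd/nextflow2/work/ea/9ecd306270fe3f00d9b73f8261fe89}"
```

Files that are missing are reported rather than treated as an error, so the helper can be pointed at any task directory listed in the Nextflow output.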

dbhayal9 commented 2 weeks ago

Hi @akolesnikov, please find the log files attached: nextflow.log, stderr.log

pichuan commented 1 hour ago

Hi @dbhayal9, from your log, it seems that DeepVariant finished running and generated an output VCF here: /home/ubuntu/rgenx/nextflow2/output/5.snvS/SRR26512958_raw.vcf.gz

Can you try looking for the file? Something like

ls /home/ubuntu/rgenx/nextflow2/output/5.snvS/SRR26512958_raw.vcf.gz

If it exists, you can zcat it to check whether it has the expected content.
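The check suggested above could be scripted roughly as follows. This is only a sketch: `check_vcf` is a hypothetical helper name, and it uses `gzip -dc` in place of `zcat` purely for portability. It confirms that the file exists and is non-empty, that it decompresses cleanly, and then prints the first header line and the number of non-header records.

```shell
#!/usr/bin/env bash
# check_vcf is a hypothetical helper for sanity-checking a compressed VCF.
check_vcf() {
    local vcf="$1"
    if [ ! -s "$vcf" ]; then
        echo "missing or empty: $vcf"
        return 1
    fi
    if ! gzip -t "$vcf" 2>/dev/null; then
        echo "not a valid gzip stream: $vcf"
        return 1
    fi
    # First header line, normally the ##fileformat declaration.
    gzip -dc "$vcf" | head -n 1
    # Count lines that do not start with '#', i.e. the variant records.
    local records
    records="$(gzip -dc "$vcf" | grep -vc '^#')"
    echo "variant records: $records"
}

# Path taken from the comment above; replace it with your own output file.
check_vcf "/home/ubuntu/rgenx/nextflow2/output/5.snvS/SRR26512958_raw.vcf.gz" || true
```

A record count of zero with an intact header would suggest that DeepVariant ran but called nothing in the requested regions, which is a different problem from a missing file.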