nf-core / sarek

Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing
https://nf-co.re/sarek
MIT License

This tool requires AVX instruction set support by default #1464

Closed: radaniba closed this 7 months ago

radaniba commented 7 months ago

Description of the bug

Hi there,

I am trying to run Sarek on a cluster using Slurm, but I keep getting the error below from a couple of tools, such as DeepVariant and some others.

I am not sure which TensorFlow it is referring to, the one within the container or the one on the host machine, or whether this error message is related to TensorFlow at all. How can we set disable-avx-check within a config file to bypass this?

I have the most recent version of TensorFlow installed on all my compute nodes in the cluster.

Best,

Rad

Command used and terminal output

Command error:
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
16:00:43.967 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
16:00:44.063 INFO CNNScoreVariants - ------------------------------------------------------------
16:00:44.069 INFO CNNScoreVariants - The Genome Analysis Toolkit (GATK) v4.4.0.0
16:00:44.069 INFO CNNScoreVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
16:00:44.069 INFO CNNScoreVariants - Executing as ranbiolinks@31d54fce75d5 on Linux v5.4.0-176-generic amd64
16:00:44.069 INFO CNNScoreVariants - Java runtime: OpenJDK 64-Bit Server VM v17.0.6+10-Ubuntu-0ubuntu118.04.1
16:00:44.070 INFO CNNScoreVariants - Start Date/Time: April 13, 2024 at 4:00:43 PM GMT
16:00:44.070 INFO CNNScoreVariants - ------------------------------------------------------------
16:00:44.070 INFO CNNScoreVariants - ------------------------------------------------------------
16:00:44.071 INFO CNNScoreVariants - HTSJDK Version: 3.0.5
16:00:44.072 INFO CNNScoreVariants - Picard Version: 3.0.0
16:00:44.072 INFO CNNScoreVariants - Built for Spark Version: 3.3.1
16:00:44.073 INFO CNNScoreVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:00:44.073 INFO CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:00:44.073 INFO CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:00:44.074 INFO CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:00:44.074 INFO CNNScoreVariants - Deflater: IntelDeflater
16:00:44.074 INFO CNNScoreVariants - Inflater: IntelInflater
16:00:44.075 INFO CNNScoreVariants - GCS max retries/reopens: 20
16:00:44.075 INFO CNNScoreVariants - Requester pays: disabled
16:00:44.076 INFO CNNScoreVariants - Initializing engine
16:00:44.443 INFO FeatureManager - Using codec VCFCodec to read file file://074-24.haplotypecaller.vcf.gz
16:00:44.724 INFO FeatureManager - Using codec BEDCodec to read file file://Twist_Exome_RefSeq_targets_hg38_200-pad_wGenes.bed
16:00:45.527 INFO IntervalArgumentCollection - Processing 113689316 bp from intervals
16:00:45.605 INFO CNNScoreVariants - Done initializing engine
16:00:45.607 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/gatk/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
16:00:45.629 INFO CNNScoreVariants - Done scoring variants with CNN.
16:00:45.629 INFO CNNScoreVariants - Shutting down engine
[April 13, 2024 at 4:00:45 PM GMT] org.broadinstitute.hellbender.tools.walkers.vqsr.CNNScoreVariants done. Elapsed time: 0.03 minutes. Runtime.totalMemory()=260046848


A USER ERROR has occurred: This tool requires AVX instruction set support by default due to its dependency on recent versions of the TensorFlow library. If you have an older (pre-1.6) version of TensorFlow installed that does not require AVX you may attempt to re-run the tool with the disable-avx-check argument to bypass this check. Note that such configurations are not officially supported.


Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
Using GATK jar /gatk/gatk-package-4.4.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx9830M -XX:-UsePerfData -jar /gatk/gatk-package-4.4.0.0-local.jar CNNScoreVariants --variant 074-24.haplotypecaller.vcf.gz --output 074-24.cnn.vcf.gz --reference Homo_sapiens_assembly38.fasta --intervals Twist_Exome_RefSeq_targets_hg38_200-pad_wGenes.bed --tmp-dir .

Relevant files

No response

System information

sarek 3.4.0

maxulysse commented 7 months ago

Hi @radaniba, can you give more details about how you ran Sarek on your system?

radaniba commented 7 months ago

hi @maxulysse

I have a Slurm cluster with Nextflow configured and installed on all nodes. I run an sbatch script with the following specs:

#!/bin/bash
#SBATCH --job-name=EN00004726
#SBATCH --output=/mnt/workspace/projects/EN00004726/sarek_slurm.out
#SBATCH --error=/mnt/workspace/projects/EN00004726/sarek_slurm.err
#SBATCH --export=ALL # export all environment variables to the batch job.
#SBATCH -p homelab # submit to the serial queue 
#SBATCH --nodes=1 # specify number of nodes. 
#SBATCH --ntasks-per-node=6 # specify number of processors per node 

#Update the cached version
nextflow pull nf-core/sarek

#Run the pipeline
nextflow run nf-core/sarek -r 3.4.0 \
        -profile docker \
        -resume \
        --max_cpus 5 \
        --max_memory 15.GB \
        --wes \
        --trim_fastq \
        --genome GATK.GRCh38 \
        --igenomes_base /mnt/workspace/references \
        --aligner bwa-mem \
        --input /mnt/workspace/projects/EN00004726/test.csv \
        --intervals /mnt/workspace/references/Homo_sapiens/Twist_Exome_RefSeq_targets_hg38_200-pad_wGenes.bed \
        --outdir /mnt/workspace/projects/EN00004726/output \
        --snpeff_cache /mnt/workspace/references/cache/snpeff_cache/ \
        --vep_cache /mnt/workspace/references/cache/vep_cache/ \
        --download_cache \
        --tools freebayes,haplotypecaller,strelka,manta,tiddit,snpeff,vep,bcfann

my config file is :

aws {
    client {
        anonymous = true
    }
}
process.executor = 'slurm'

Everything works fine except for some tools that are apparently built against specific TensorFlow requirements, like the one I mentioned above; DeepVariant throws the same issue.

I hope this gives more clarity

radaniba commented 7 months ago

any thoughts @maxulysse ?

matthdsm commented 7 months ago

Hi @radaniba ,

Do you have any more specifics about your hardware?

You can create a custom config using a withName selector for CNNScoreVariants and add the option to disable the AVX warning. https://gatk.broadinstitute.org/hc/en-us/articles/360042914311-CNNScoreVariants#--disable-avx-check

matthdsm commented 7 months ago

process {
    withName: 'CNNSCOREVARIANTS' {
        ext.args   = { "--disable-avx-check" }
        publishDir = [
            // Otherwise it gets published
            enabled: false
        ]
    }
}
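Putting this together with the Slurm settings posted earlier, the combined custom config might look like the sketch below (the file name custom.config is just an example; it would be passed to the pipeline with -c custom.config, or picked up automatically if named nextflow.config in the launch directory):

```groovy
// Sketch of a combined custom config -- file name (e.g. custom.config)
// is arbitrary; supply it via `nextflow run ... -c custom.config`.
aws {
    client {
        anonymous = true
    }
}

process {
    executor = 'slurm'

    // Bypass the AVX check in GATK CNNScoreVariants, as suggested above.
    withName: 'CNNSCOREVARIANTS' {
        ext.args   = { "--disable-avx-check" }
        publishDir = [
            // Otherwise it gets published
            enabled: false
        ]
    }
}
```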
radaniba commented 7 months ago

Thank you guys, I will try this indeed. My hardware specs are the following:

Processor: 2.4 GHz Intel Xeon
RAM: 64 GB DDR4
Hard drive: 1 TB SSD
Chipset brand: Intel
Card description: Integrated
Brand: HP
Series: Z440 Tower Server
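(Editor's note: whether a given Xeon actually exposes AVX depends on its generation, and on Linux this can be checked directly from /proc/cpuinfo; a minimal sketch, assuming a Linux host as shown in the GATK log:)

```shell
# Check whether this machine's CPU advertises AVX support (Linux only).
# Docker containers see the host CPU, so the same result applies inside them.
if grep -q -w avx /proc/cpuinfo; then
    echo "AVX supported"
else
    echo "AVX not supported"
fi
```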

I will try the config suggestion and let you guys know

Thanks again

Rad

radaniba commented 7 months ago

Thank you, this fixed my issue (the config file, I mean). Many thanks for your help @matthdsm