Closed brantfaircloth closed 6 days ago
That's a new one to me. Right idea opening an issue there too. Curious if you're able to run any workflow using the plugin? Here is a simple one to test:
Snakefile:

```
rule all:
    input:
        expand("test_output/hi_{i}.txt", i=range(4))


rule a:
    output:
        "test_output/hi_{i}.txt"
    shell:
        """
        echo {wildcards.i} > {output}
        """
```
Other info that would be helpful: how did you run/submit the actual Snakemake run itself?
It's a bit of a long story, but we're working with HPC staff to find the way they prefer jobs to be submitted. At the moment [on their advice, as we test] I'm submitting on the head node and monitoring. The call to run is/was:
```
snakemake --verbose -s snpArcher/workflow/Snakefile -d projects/anna-test --workflow-profile snpArcher/profiles/slurm
```
As I posted in the issue for the plugin, the offending `sbatch` call is:
sbatch --job-name 8cf30205-818c-4a01-8c15-ecf5ebe02650 --output /ddnA/work/brant/snpArcher-test/projects/anna-test/.snakemake/slurm_logs/rule_download_reference/GCA_019023105.1_LSU_DiBr_2.0_genomic.fna/%j.log --export=ALL --comment rule_download_reference_wildcards_GCA_019023105.1_LSU_DiBr_2.0_genomic.fna -A 'hpc_deepbayou' -p single -t 720 --mem 4000 --ntasks=1 --cpus-per-task=1 -D /ddnA/work/brant/snpArcher-test/projects/anna-test --wrap="/project/brant/db-home/miniconda/envs/snparcher/bin/python3.11 -m snakemake --snakefile /ddnA/work/brant/snpArcher-test/snpArcher/workflow/Snakefile --target-jobs 'download_reference:refGenome=GCA_019023105.1_LSU_DiBr_2.0_genomic.fna' --allowed-rules 'download_reference' --cores all --attempt 1 --force-use-threads --resources 'mem_mb=4000' 'mem_mib=3815' 'disk_mb=1000' 'disk_mib=954' 'mem_mb_reduced=3600' --wait-for-files '/ddnA/work/brant/snpArcher-test/projects/anna-test/.snakemake/tmp.x2lj0io7' '/home/brant/work/snpArcher-test/projects/anna-test/reference' '/ddnA/work/brant/snpArcher-test/projects/anna-test/.snakemake/conda/8ecf006a88f493174cca4b84629295d3_' --force --target-files-omit-workdir-adjustment --keep-storage-local-copies --max-inventory-time 0 --nocolor --notemp --no-hooks --nolock --ignore-incomplete --verbose --rerun-triggers input params mtime code software-env --deployment-method conda --conda-frontend mamba --conda-base-path /project/brant/db-home/miniconda --apptainer-prefix /work/brant/.singularity/ --shared-fs-usage persistence software-deployment input-output sources source-cache storage-local-copies --wrapper-prefix https://github.com/snakemake/snakemake-wrappers/raw/ --latency-wait 100 --scheduler ilp --local-storage-prefix .snakemake/storage --scheduler-solver-path /project/brant/db-home/miniconda/envs/snparcher/bin --set-threads base64//ZG93bmxvYWRfcmVmZXJlbmNlPTE= base64//aW5kZXhfcmVmZXJlbmNlPTE= base64//Zm9ybWF0X2ludGVydmFsX2xpc3Q9MQ== base64//Y3JlYXRlX2d2Y2ZfaW50ZXJ2YWxzPTE= 
base64//Y3JlYXRlX2RiX2ludGVydmFscz0x base64//cGljYXJkX2ludGVydmFscz0x base64//Z2VubWFwPTEy base64//bWFwcGFiaWxpdHlfYmVkPTE= base64//Z2V0X2Zhc3RxX3BlPTEy base64//ZmFzdHA9MTI= base64//YndhX21hcD0xMg== base64//ZGVkdXA9MTI= base64//bWVyZ2VfYmFtcz0x base64//YmFtMmd2Y2Y9MQ== base64//Y29uY2F0X2d2Y2ZzPTE= base64//YmNmdG9vbHNfbm9ybT0x base64//Y3JlYXRlX2RiX21hcGZpbGU9MQ== base64//Z3ZjZjJEQj0x base64//REIydmNmPTE= base64//ZmlsdGVyVmNmcz0x base64//c29ydF9nYXRoZXJWY2ZzPTE= base64//Y29tcHV0ZV9kND0x base64//Y3JlYXRlX2Nvdl9iZWQ9MQ== base64//bWVyZ2VfZDQ9MQ== base64//YmFtX3N1bXN0YXRzPTE= base64//Y29sbGVjdF9jb3ZzdGF0cz0x base64//Y29sbGVjdF9mYXN0cF9zdGF0cz0x base64//Y29sbGVjdF9zdW1zdGF0cz0x base64//cWNfYWRtaXh0dXJlPTE= base64//cWNfY2hlY2tfZmFpPTE= base64//cWNfZ2VuZXJhdGVfY29vcmRzX2ZpbGU9MQ== base64//cWNfcGxpbms9MQ== base64//cWNfcWNfcGxvdHM9MQ== base64//cWNfc2V0dXBfYWRtaXh0dXJlPTE= base64//cWNfc3Vic2FtcGxlX3NucHM9MQ== base64//cWNfdmNmdG9vbHNfaW5kaXZpZHVhbHM9MQ== base64//bWtfZGVnZW5vdGF0ZT0x base64//bWtfcHJlcF9nZW5vbWU9MQ== base64//bWtfc3BsaXRfc2FtcGxlcz0x base64//cG9zdHByb2Nlc3Nfc3RyaWN0X2ZpbHRlcj0x base64//cG9zdHByb2Nlc3NfYmFzaWNfZmlsdGVyPTE= base64//cG9zdHByb2Nlc3NfZmlsdGVyX2luZGl2aWR1YWxzPTE= base64//cG9zdHByb2Nlc3Nfc3Vic2V0X2luZGVscz0x base64//cG9zdHByb2Nlc3Nfc3Vic2V0X3NucHM9MQ== base64//cG9zdHByb2Nlc3NfdXBkYXRlX2JlZD0x base64//dHJhY2todWJfYmNmdG9vbHNfZGVwdGg9MQ== base64//dHJhY2todWJfYmVkZ3JhcGhfdG9fYmlnd2lnPTE= base64//dHJhY2todWJfY2FsY19waT0x base64//dHJhY2todWJfY2FsY19zbnBkZW49MQ== base64//dHJhY2todWJfY2FsY190YWppbWE9MQ== base64//dHJhY2todWJfY2hyb21fc2l6ZXM9MQ== base64//dHJhY2todWJfY29udmVydF90b19iZWRncmFwaD0x base64//dHJhY2todWJfc3RyaXBfdmNmPTE= base64//dHJhY2todWJfdmNmdG9vbHNfZnJlcT0x base64//dHJhY2todWJfd3JpdGVfaHViX2ZpbGVzPTE= base64//c2VudGllb25fbWFwPTE= base64//c2VudGllb25fZGVkdXA9MQ== base64//c2VudGllb25faGFwbG90eXBlcj0x base64//c2VudGllb25fY29tYmluZV9ndmNmPTE= base64//c2VudGllb25fYmFtX3N0YXRzPTE= --default-resources base64//bWVtX21iPWF0dGVtcHQgKiA0MDAw 
base64//ZGlza19tYj1tYXgoMippbnB1dC5zaXplX21iLCAxMDAwKQ== base64//dG1wZGlyPXN5c3RlbV90bXBkaXI= base64//bWVtX21iX3JlZHVjZWQ9KGF0dGVtcHQgKiA0MDAwKSAqIDAuOQ== base64//c2x1cm1fcGFydGl0aW9uPXNpbmdsZQ== base64//c2x1cm1fYWNjb3VudD1ocGNfZGVlcGJheW91 base64//cnVudGltZT03MjA= --executor slurm-jobstep --jobs 1 --mode remote"
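As an aside, the `base64//…` tokens in that `--wrap` command are just the profile's `set-threads` and `default-resources` entries, base64-encoded so they survive shell quoting. If you ever need to sanity-check what was actually submitted, they decode straightforwardly; for example, the first `--set-threads` token from the command above:

```python
import base64

# First --set-threads token from the sbatch --wrap command above
token = "ZG93bmxvYWRfcmVmZXJlbmNlPTE="
decoded = base64.b64decode(token).decode()
print(decoded)  # download_reference=1
```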
and the slurm profile I'm applying is pretty close to stock - see below.
```yaml
executor: slurm
use-conda: True
jobs: 15 # Have up to N jobs submitted at any given time
latency-wait: 100 # Wait N seconds for output files due to latency
retries: 0 # Retry jobs N times.

# These resources will be applied to all rules. Can be overridden on a per-rule basis below.
default-resources:
  mem_mb: attempt * 4000
  mem_mb_reduced: (attempt * 4000) * 0.9 # Mem allocated to java for GATK rules (tries to prevent OOM errors)
  slurm_partition: "single"
  slurm_account: "hpc_deepbayou" # Same as sbatch -A. Not all clusters use this.
  runtime: 720 # In minutes

# Control number of threads each rule will use.
set-threads:
  # Reference Genome Processing. Does NOT use more than 1 thread.
  download_reference: 1
  index_reference: 1
  # Interval Generation. Does NOT use more than 1 thread.
  format_interval_list: 1
  create_gvcf_intervals: 1
  create_db_intervals: 1
  picard_intervals: 1
  # Mappability
  genmap: 12 # Can use more than 1 thread
  mappability_bed: 1 # Does NOT use more than 1 thread
  # Fastq Processing. Can use more than 1 thread.
  get_fastq_pe: 12
  fastp: 12
  # Alignment. Can use more than 1 thread, except merge_bams.
  bwa_map: 12
  dedup: 12
  merge_bams: 1 # Does NOT use more than 1 thread.
  # GVCF
  bam2gvcf: 1 # Should be run with no more than 2 threads.
  concat_gvcfs: 1 # Does NOT use more than 1 thread.
  bcftools_norm: 1 # Does NOT use more than 1 thread.
  create_db_mapfile: 1 # Does NOT use more than 1 thread.
  gvcf2DB: 1 # Should be run with no more than 2 threads.
  # VCF
  DB2vcf: 1 # Should be run with no more than 2 threads.
  filterVcfs: 1 # Should be run with no more than 2 threads.
  sort_gatherVcfs: 1 # Should be run with no more than 2 threads.
  # Callable Bed
  compute_d4: 1 # Can use more than 1 thread
  create_cov_bed: 1 # Does NOT use more than 1 thread.
  merge_d4: 1 # Does NOT use more than 1 thread.
  # Summary Stats. Does NOT use more than 1 thread.
  bam_sumstats: 1
  collect_covstats: 1
  collect_fastp_stats: 1
  collect_sumstats: 1
  # QC Module. Does NOT use more than 1 thread.
  qc_admixture: 1
  qc_check_fai: 1
  qc_generate_coords_file: 1
  qc_plink: 1
  qc_qc_plots: 1
  qc_setup_admixture: 1
  qc_subsample_snps: 1
  qc_vcftools_individuals: 1
  # MK Module. Does NOT use more than 1 thread.
  mk_degenotate: 1
  mk_prep_genome: 1
  mk_split_samples: 1
  # Postprocess Module. Does NOT use more than 1 thread.
  postprocess_strict_filter: 1
  postprocess_basic_filter: 1
  postprocess_filter_individuals: 1
  postprocess_subset_indels: 1
  postprocess_subset_snps: 1
  postprocess_update_bed: 1
  # Trackhub Module. Does NOT use more than 1 thread.
  trackhub_bcftools_depth: 1
  trackhub_bedgraph_to_bigwig: 1
  trackhub_calc_pi: 1
  trackhub_calc_snpden: 1
  trackhub_calc_tajima: 1
  trackhub_chrom_sizes: 1
  trackhub_convert_to_bedgraph: 1
  trackhub_strip_vcf: 1
  trackhub_vcftools_freq: 1
  trackhub_write_hub_files: 1
  # Sentieon Tools. Can use more than 1 thread, except sentieon_bam_stats.
  sentieon_map: 1
  sentieon_dedup: 1
  sentieon_haplotyper: 1
  sentieon_combine_gvcf: 1
  sentieon_bam_stats: 1 # Does NOT use more than 1 thread.
```
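One thing worth noting about the `default-resources` entries: the `attempt * 4000` expressions are re-evaluated on each retry, so memory scales up if a job is resubmitted (with `retries: 0` as in this profile, `attempt` is always 1). A quick sketch of the values Snakemake would compute:

```python
# Sketch of how this profile's default-resources expressions scale with attempt.
# With retries: 0, only attempt 1 is ever used.
for attempt in (1, 2, 3):
    mem_mb = attempt * 4000
    mem_mb_reduced = (attempt * 4000) * 0.9  # JVM headroom for GATK rules
    print(f"attempt={attempt} mem_mb={mem_mb} mem_mb_reduced={mem_mb_reduced:.0f}")
```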
-b
So, we were able to figure it out. Basically, a site optimization for the `sbatch` command was stripping the quotes from the value passed to `sbatch --wrap`. This, in turn, caused the wrapped command to be interpreted as options/arguments to `sbatch` itself, and the first of these was the `-m` option used to invoke the snakemake module (`python3 -m snakemake`). In this instance, `-m` was interpreted as `sbatch -m`, which controls how processes are distributed across nodes... causing the error.
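The failure mode is easy to reproduce locally: once the quotes around `--wrap="..."` are gone, the shell re-tokenizes the wrapped command and `sbatch` sees `-m` as its own option. A small sketch using Python's `shlex` as a stand-in for the shell's word splitting (the command is abbreviated):

```python
import shlex

# Quotes intact: the wrapped command stays a single argument to sbatch.
quoted = 'sbatch --mem 4000 --wrap="python3 -m snakemake --cores all"'
print(shlex.split(quoted))
# ['sbatch', '--mem', '4000', '--wrap=python3 -m snakemake --cores all']

# Quotes stripped by the site wrapper: -m becomes a bare sbatch option.
stripped = 'sbatch --mem 4000 --wrap=python3 -m snakemake --cores all'
print(shlex.split(stripped))
# ['sbatch', '--mem', '4000', '--wrap=python3', '-m', 'snakemake', '--cores', 'all']
```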
Hi y'all,
Working to get snpArcher running on our HPC and have bumped up against a problem with batch job submission through `snakemake-executor-plugin-slurm`. The issue is that `sbatch` commands fail with a somewhat cryptic error that I can't track down. I've submitted an issue upstream to the `snakemake-executor-plugin-slurm` crew to see if they have any suggestions and will post back if I get it sorted.
Thanks, -brant