Closed by brantfaircloth 5 months ago
Hi,
edit: I noticed that you submitted on the head node, as intended. So, could you please run
$ sbatch test.sh
with test.sh being
#!/bin/bash
#SBATCH --job-name 8cf30205-818c-4a01-8c15-ecf5ebe02650
#SBATCH --output /ddnA/work/brant/snpArcher-test/projects/anna-test/.snakemake/slurm_logs/rule_download_reference/GCA_019023105.1_LSU_DiBr_2.0_genomic.fna/%j.log
#SBATCH --export=ALL
#SBATCH --comment rule_download_reference_wildcards_GCA_019023105.1_LSU_DiBr_2.0_genomic.fna
#SBATCH -A 'hpc_deepbayou' -p single -t 720
#SBATCH --mem 4000
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
srun "Hello from $(hostname)"
please? You will notice that this is the exact submission command you implicitly used by running Snakemake.
I am curious to find whether it throws the same error.
Hi Christian,
Thanks for your help. I forgot to add, above, that I'm running on the head node at the moment because that's how our HPC staff prefer that we test workflow engines like snakemake.
The command seems to run - here is the output written to the specified log file:
srun: lua: Submitted job 77901
slurmstepd: error: execve(): Hello from db002: No such file or directory
srun: error: db002: task 0: Exited with exit code 2
-b
After playing around with this a bit, it seems that, for whatever reason, the Python -m flag (the module-invocation option) enclosed in the --wrap="{stuff}" argument is being interpreted as the sbatch -m option, which is the short form of the sbatch --distribution option - and that is causing the error.
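To make that concrete (this is a sketch on my part, not anything taken from the plugin): sbatch's -m is the short form of --distribution and expects a task-distribution keyword, so a Python module name landing in that position gets rejected outright.
# sbatch -m / --distribution accepts values such as block, cyclic, plane, or arbitrary:
sbatch -p single -t 5 --distribution=block --wrap="hostname"
# If the "-m snakemake" inside --wrap were parsed as an sbatch option instead,
# "snakemake" would be an invalid distribution spec - which matches the error I'm seeing.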
arrgh, of course I meant srun echo "Hello ...", but we can ignore this error.
OK, it works. Just, what is srun: lua: Submitted job ...? You did start with sbatch, didn't you?
Anyway, --distribution is a SLURM-specific feature intended for MPI programs with a non-standard rank topology. I have no idea where this error of yours is being triggered. --wrap was chosen because otherwise we would need tricky, error-prone ad hoc job scripts.
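For comparison, this is roughly what would have to be generated for every single job without --wrap (a sketch only; the file name, resources, and the snakemake call are placeholders):
# With --wrap, the command line alone is enough:
sbatch -p single -t 720 --mem 4000 --wrap="python3 -m snakemake --version"
# Without it, an ad hoc job script must be written and cleaned up each time:
cat > rule_job_tmp.sh <<'EOF'
#!/bin/bash
python3 -m snakemake --version
EOF
sbatch -p single -t 720 --mem 4000 rule_job_tmp.sh
rm rule_job_tmp.sh   # safe to remove: sbatch copies the script at submission time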
My latest version of SLURM is 23.02.7 - I will ask some contacts to carry out tests. This error is highly disturbing ...
PS: How did you arrive at your conclusion?
haha - yeah, I thought about prettying it up, but the output indicated it worked either way. I did submit w/ sbatch - I'm not sure about the lua aspect of sbatch - that's something that has shown up recently with upgrades to our queuing system. I think it relates to the lua job submit plugin (https://slurm.schedmd.com/job_submit_plugins.html).
That said, I wonder if there is a bug in that submit plugin that is altering the way that "--wrap" should be functioning. I'll see if it's possible to turn off that plugin for a test.
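A couple of things I plan to check first (illustrative commands; what they report will depend on our site configuration):
# Is a job_submit plugin (e.g. the lua one) configured on the controller?
scontrol show config | grep -i JobSubmitPlugins
# Is the sbatch on my PATH the stock Slurm binary or a site-provided wrapper around it?
type -a sbatch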
As for the -m, a simple test like this:
sbatch --job-name c1bc406d-e80f-444e-bb1a-91364f7e84a3 --output /ddnA/work/brant/snpArcher-test/projects/anna-test/.snakemake/slurm_logs/rule_download_reference/GCA_019023105.1_LSU_DiBr_2.0_genomic.fna/%j.log --export=ALL --comment rule_download_reference_wildcards_GCA_019023105.1_LSU_DiBr_2.0_genomic.fna -A 'hpc_deepbayou' -p single -t 720 --mem 4000 --ntasks=1 --cpus-per-task=1 -D /ddnA/work/brant/snpArcher-test/projects/anna-test \
--wrap="/project/brant/db-home/miniconda/envs/snparcher/bin/python3.11"
submits and runs without error (although it doesn't do anything):
sbatch: Job estimates 12.00 SUs for -p single --nodes=1 --ntasks=1 --cpus-per-task=1
sbatch: lua: Submitted job 77907
Submitted batch job 77907
while:
sbatch --job-name c1bc406d-e80f-444e-bb1a-91364f7e84a3 --output /ddnA/work/brant/snpArcher-test/projects/anna-test/.snakemake/slurm_logs/rule_download_reference/GCA_019023105.1_LSU_DiBr_2.0_genomic.fna/%j.log --export=ALL --comment rule_download_reference_wildcards_GCA_019023105.1_LSU_DiBr_2.0_genomic.fna -A 'hpc_deepbayou' -p single -t 720 --mem 4000 --ntasks=1 --cpus-per-task=1 -D /ddnA/work/brant/snpArcher-test/projects/anna-test \
--wrap="/project/brant/db-home/miniconda/envs/snparcher/bin/python3.11 -m snakemake"
produces the error that I've been seeing:
sbatch: error: Invalid --distribution specification
Thanks for your detailed feedback: I will test a subtle change. (Doubt it will work.)
I'm also chatting w/ our sysadmin who works on slurm to see if he has any suggestions/fixes.
Hi Christian,
I think that we may have found the culprit - there was a site customization to the sbatch command that is/was stripping the quotes from around the string passed to --wrap. That caused sbatch to interpret the string as sbatch options rather than as a command to be wrapped in a shell script. Going to do some testing to confirm.
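To sketch the mechanism (the real submission is much longer; --version just stands in for the actual snakemake arguments):
# What the executor plugin issues - the quoting keeps the wrapped command as a single argument:
sbatch --wrap="python3 -m snakemake --version"
# What sbatch effectively received once the customization stripped the quotes:
#   sbatch --wrap=python3 -m snakemake --version
# so -m was read as sbatch's own --distribution shortcut and "snakemake" was rejected as an
# invalid distribution spec.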
Yep, that did the trick. Thanks for your help and apologies for the bother! At the very least, if someone else hits the same issue, this could be a fix.
-b
Oh? That was rather fast. Sorry - it was dinner time for me, and the kids had had little for lunch today, so no more work for today.
One more thing, though: There is no need to apologize, bugs do happen, and sometimes it's hard to find the real reason. I am, however, rather interested in learning the source and the remedy. Other than that, I am just glad it is working for you, now.
Running perfectly now. Thanks again and have a good evening,
-b
Good afternoon,
I'm using the snakemake-executor-plugin-slurm with a snakemake workflow meant for calling SNPs in genomic data (https://github.com/harvardinformatics/snpArcher). When snakemake attempts to submit jobs, those submissions are failing with the error message sbatch: error: Invalid --distribution specification. I've pored over the actual command being submitted (below) but cannot find why this particular error is being thrown - and a pointer to track this down would be super helpful. I've searched existing issues here, on the snakemake issues page, and also on the snparcher issues page, but I haven't tracked down anything similar or anything helpful (yet).
The version of slurm is 23.11.6. Happy to provide any additional information as well. I realize this is less likely a bug and more likely something to do with how our university HPC is set up. Thanks much, -brant