prs513rosewood opened this issue 4 months ago.
Thank you for this report. We definitely need to update the error message!
We had the fallback in the executor, but decided to drop it so that job states can be checked in asynchronous mode with a single command. A cluster without an accounting database is pretty unusual. Re-introducing the fallback might not be so easy.
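To illustrate the "single command" point: `sacct` accepts a comma-separated list of job IDs, so the states of all submitted jobs can be fetched in one call. The following is a minimal sketch under that assumption. `sacct_command` and `parse_sacct` are hypothetical helpers, not the plugin's actual code, and the field selection (`JobIdRaw,State`) is illustrative.

```python
from typing import Dict, Iterable, List


def sacct_command(job_ids: Iterable[str]) -> List[str]:
    """Build one sacct invocation covering every job ID (hypothetical helper)."""
    return [
        "sacct",
        "-X",                       # one line per allocation, skip job steps
        "--parsable2",              # '|'-separated fields, no trailing delimiter
        "--noheader",               # no column titles
        "--format=JobIdRaw,State",  # only the fields needed for status checks
        "--jobs", ",".join(job_ids),
    ]


def parse_sacct(stdout: str) -> Dict[str, str]:
    """Map raw job ID -> state keyword from `sacct --parsable2` output."""
    states: Dict[str, str] = {}
    for line in stdout.splitlines():
        if not line.strip():
            continue
        job_id, _, state = line.partition("|")
        # States such as "CANCELLED by 1234" carry extra text; keep the keyword.
        states[job_id] = state.split(" ", 1)[0]
    return states
```

Used together, the command would be run once (e.g. with `subprocess.run(sacct_command(ids), capture_output=True, text=True)`) and its `stdout` passed to `parse_sacct`, regardless of how many jobs are being tracked.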
Is your particular cluster in an experimental stage?
Thanks for looking at this, I know this is a weird edge case. The cluster in question is somewhat artisanal.
I think the slurm cluster profile may be a workable fallback for me. And it looks like 6a197ae fixes the issue of `status_of_jobs` being invalid.
> I think the slurm cluster profile may be a workable fallback for me.

Perhaps. Then again, you might want to use storage plugins and/or other plugins. That would be a mess. Is there any chance your admins set up the cluster ... eh, properly?
I get an error when running a job on a slurm instance whose accounting storage is disabled (i.e. the `sacct` command just replies `Slurm accounting storage is disabled`). Here's the stack trace:

Looks like there's some error handling here: https://github.com/snakemake/snakemake-executor-plugin-slurm/blob/7e3de33ab447cd3415e53464019cce8e7361bda8/snakemake_executor_plugin_slurm/__init__.py#L221
But after the loop over attempts to get the job status, the rest of the code assumes no error occurred and treats `status_of_jobs` as a valid set.

The slurm profile also uses `sacct` but falls back to `scontrol` if that fails; this might be a solution: https://github.com/Snakemake-Profiles/slurm/blob/c44315217d1ce36493dc7dccbd013528657747f9/%7B%7Bcookiecutter.profile_name%7D%7D/slurm-status.py#L40
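A minimal sketch of such a fallback, assuming that disabled accounting shows up either as a non-zero exit status from `sacct` or as the "accounting storage is disabled" message. It also covers the error-handling gap described above: a failed `sacct` query yields `None` instead of being treated as a valid result. `job_states`, `sacct_states`, `scontrol_states` and the injectable `run` parameter are hypothetical names, not the plugin's or the profile's API.

```python
import re
import subprocess
from typing import Callable, Dict, List, Optional

# Hypothetical runner type: takes argv, returns a CompletedProcess.
# Injecting it lets the logic be tested without a Slurm installation.
Runner = Callable[[List[str]], subprocess.CompletedProcess]

# Matches "JobId=<id> ... JobState=<state>" in `scontrol -o show job` lines;
# \b keeps it from matching inside fields such as "ArrayJobId=".
_JOB_RE = re.compile(r"\bJobId=(\S+).*?\bJobState=(\S+)")


def run_command(argv: List[str]) -> subprocess.CompletedProcess:
    """Run a command, capturing output and never raising on failure."""
    return subprocess.run(argv, capture_output=True, text=True, check=False)


def sacct_states(job_ids: List[str], run: Runner) -> Optional[Dict[str, str]]:
    """Return {job_id: state} from sacct, or None if accounting is unusable."""
    proc = run(["sacct", "-X", "--parsable2", "--noheader",
                "--format=JobIdRaw,State", "--jobs", ",".join(job_ids)])
    text = f"{proc.stdout or ''}{proc.stderr or ''}".lower()
    # Assumption: disabled accounting shows up as a non-zero exit status
    # and/or the "accounting storage is disabled" message.
    if proc.returncode != 0 or "accounting storage is disabled" in text:
        return None
    states: Dict[str, str] = {}
    for line in (proc.stdout or "").splitlines():
        job_id, sep, state = line.partition("|")
        if sep:
            states[job_id] = state.split(" ", 1)[0]
    return states


def scontrol_states(job_ids: List[str], run: Runner) -> Dict[str, str]:
    """Fallback: one `scontrol -o show job` call, filtered to job_ids."""
    proc = run(["scontrol", "-o", "show", "job"])
    wanted, states = set(job_ids), {}
    if proc.returncode == 0:
        for line in (proc.stdout or "").splitlines():
            match = _JOB_RE.search(line)
            if match and match.group(1) in wanted:
                states[match.group(1)] = match.group(2)
    return states


def job_states(job_ids: List[str], run: Runner = run_command) -> Dict[str, str]:
    """Prefer sacct; fall back to scontrol instead of trusting a failed query.

    Jobs that neither command reports are marked "UNKNOWN".
    """
    states = sacct_states(job_ids, run)
    if states is None:
        states = scontrol_states(job_ids, run)
    return {job_id: states.get(job_id, "UNKNOWN") for job_id in job_ids}
```

One caveat: `scontrol` only reports jobs that slurmctld still keeps in memory, so finished jobs drop out after `MinJobAge` (configured in `slurm.conf`). The sketch reports those as `UNKNOWN` rather than guessing whether they succeeded.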