cademirch closed this issue 6 months ago.
Thank you for your report and the minimal example.
However, I have bad or good news, depending on your view: the minimal example started working after a few tweaks. Specifically, the file "Need to download" is never produced, and the spaces in its name are not escaped. So touching that file and working with Need\ to\ download is one alternative; the other is simply using the attached Snakefile.
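To make the "touch" workaround concrete: it amounts to creating an empty file whose name is exactly the placeholder string, so that the input exists, and quoting that name wherever it reaches a shell command, because nothing escapes the spaces for you. This is a plain-Python sketch, not part of the attached Snakefile; the temporary directory and the use of shlex.quote (instead of backslash escapes like Need\ to\ download) are illustrative choices.

```python
import shlex
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as workdir:
    # Create an empty file named exactly like the placeholder string
    # (illustrative; a real workflow would create it in its own directory).
    placeholder = Path(workdir) / "Need to download"
    placeholder.touch()
    exists = placeholder.exists()
    # Quote the name before interpolating it into a shell command;
    # unquoted, the spaces would split it into three separate arguments.
    quoted = shlex.quote(placeholder.name)

print(exists, quoted)
```

As the rest of the thread shows, raising an error in the input function is the cleaner fix; this only illustrates what the workaround means.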
Thanks for the quick response!
I'm not sure I follow. In the Snakefile you uploaded, the get_genome function returns the same genome file for both wildcard values, which is not what I'm trying to do.
Can you explain what you mean by touching "need to download" and working with it?
When you return "Need to download", Snakemake interprets the string Need to download as the path of an input file to your rule. If you replace that return line with a raise, your get_genome function will error out and sim_download_genome will be used instead.
EDIT: For examples of this in an actual workflow, see https://github.com/nikostr/read-mapping/blob/7f761dfe85bc1e532bebd9efa2c61ff2246ccbdb/workflow/rules/common.smk#L30C1-L42C10 and https://github.com/nikostr/read-mapping/blob/main/workflow/rules/trimming.smk
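A minimal sketch of the suggested change, written as plain Python so it runs outside Snakemake. The dictionary LOCAL_GENOMES, the path data/local_ref.fa, and the use of types.SimpleNamespace as a stand-in for Snakemake's wildcards object are all hypothetical; only the idea of raising instead of returning a placeholder string comes from this thread.

```python
from types import SimpleNamespace

# Hypothetical mapping from genome wildcard values to locally provided files.
LOCAL_GENOMES = {"1": "data/local_ref.fa"}


def get_genome(wildcards):
    """Input function for a hypothetical copy rule.

    Returns the local file when one is configured. Otherwise it raises,
    instead of returning a placeholder such as "Need to download", which
    Snakemake would treat as the path of a real (missing) input file.
    """
    try:
        return LOCAL_GENOMES[wildcards.genome]
    except KeyError:
        raise ValueError(
            f"No local genome for {wildcards.genome!r}; it must be downloaded"
        ) from None


# Usage: SimpleNamespace mimics attribute access on Snakemake's wildcards.
print(get_genome(SimpleNamespace(genome="1")))
try:
    get_genome(SimpleNamespace(genome="0"))
except ValueError as err:
    print("raised:", err)
```

Per the explanation above, when the input function errors out for a given wildcard value, Snakemake can use the other rule that produces the same output (sim_download_genome in the reporter's workflow) rather than waiting for a file that no rule creates.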
@nikostr thanks for helping out!
@nikostr Thanks, I'll try that. It's interesting, because the "need to download" approach works when executed locally, so something must be different about executing on slurm/cluster. Anyway, it may not matter, since it makes far more sense to raise an error than to return a fake file.
One thing which comes to mind: Do your admins allow internet download on compute nodes?
They do, but I'm not sure it matters in this case as even the MRE (which doesn't do any network things) fails.
Looks like switching the return "Need to download" to a raise in the input function fixes the issue. Thanks @nikostr and @cmeesters!
I have a workflow that uses ruleorder to "decide" between two rules depending on the input. In my use case, I want to either download a reference genome or copy a locally provided one to the results directory. Downstream there is a checkpoint, and finally a rule that takes as input the reference genome and the checkpoint output.

There is no issue running this workflow locally. However, when executing on slurm, the rule downstream of the checkpoint fails. Inspecting the slurm log for that rule shows that the DAG is evaluated incorrectly. I've provided a minimal example that reproduces this behavior below.
With this example, the do_stuff rule fails only when the wildcard genome == 0. In the slurm log for this job, the DAG is built and Snakemake decides to run the copy_ref rule. This is despite 1) all input for do_stuff being present at the time of job submission, and 2) copy_ref not being the right rule to run given the ruleorder. I'm curious whether this could be caused by the "reason: Forced execution" in the slurm job, though I'm unsure.

Appreciate any help!
Snakefile:
Snakemake error:
Slurm log for that job: