snakemake / snakemake-executor-plugin-slurm

A Snakemake executor plugin for submitting jobs to a SLURM cluster
MIT License

SLURM log paths are unreasonably long and contain all wildcards #84

Closed xapple closed 6 months ago

xapple commented 6 months ago

Currently the rule to decide on the path to the log file of a SLURM job is to simply join all the wildcards of a given job.

https://github.com/snakemake/snakemake-executor-plugin-slurm/blob/912df8380b00742d256f42db8e7ab20d3836715b/snakemake_executor_plugin_slurm/__init__.py#L72

However, a wildcard is sometimes a simple configuration variable, such as the output path, whose value can be fairly long.

Let's imagine a case where snakemake is launched like this:

 $ snakemake --profile myproj/snakemake --snakefile myproj/snakemake/pipeline.smk --config accessions=['99928367'] limit=2 cache_dir=/home/xapple/dev/myproj/test_data/sample_output/

Now you will see something like this for the log file:

 Job 2 has been submitted with SLURM jobid 704 (log: /home/xapple/proj_data/.snakemake/slurm_logs/rule_download/home/xapple/dev/myproj/test_data/sample_output_99928367/704.log).
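To illustrate how a path of this shape can arise, here is a minimal sketch. It assumes the wildcard values are joined with underscores and placed under a per-rule directory. The function name `default_style_log_path` and its parameters are hypothetical and are not the plugin's actual code, which is at the link above.

```python
import posixpath


def default_style_log_path(rule_name, wildcards, base=".snakemake/slurm_logs"):
    """Hypothetical helper: build a log path by joining all wildcard values."""
    # Assumption: wildcard values are simply concatenated with "_".
    wildcard_str = "_".join(wildcards) if wildcards else ""
    # A wildcard that is itself an absolute path injects its whole
    # directory hierarchy into the log path; normpath only collapses "//".
    return posixpath.normpath(f"{base}/rule_{rule_name}/{wildcard_str}/%j.log")


path = default_style_log_path(
    "download",
    ["/home/xapple/dev/myproj/test_data/sample_output", "99928367"],
)
print(path)
```

The long `cache_dir` value ends up verbatim in the middle of the log path, which is the behaviour described in this issue.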

There should be a mechanism to control which variables get used, or to let the user customize the log path entirely.
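One way such a mechanism could look is a user-supplied template that names only the wildcards to include. This is a sketch under that assumption; `render_log_path` and the template syntax are hypothetical and do not correspond to any existing plugin option.

```python
def render_log_path(template, rule_name, wildcards):
    """Hypothetical helper: fill a user-chosen template with selected wildcards.

    Wildcards not named in the template are simply ignored, and "%j" is left
    untouched so that SLURM can substitute the job ID.
    """
    fields = {"rule": rule_name, **wildcards}
    return template.format(**fields)


log_path_example = render_log_path(
    "logs/{rule}/{dataset}/{sample}/%j.log",
    "download",
    {"dataset": "d1", "sample": "s1", "cache_dir": "/a/very/long/path"},
)
print(log_path_example)
```

A template that references a wildcard the job does not have raises a `KeyError`, which makes a misconfigured template fail early rather than produce a wrong path.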

Currently my awkward solution is to monkey-patch the executor and replace the method:

  from snakemake_executor_plugin_slurm import Executor
  from myproj.snakemake.slurm import run_job
  Executor.run_job = run_job

With something like this:

import os

def log_path(self, job):
    # Check #
    if not hasattr(job, 'wildcards') or not job.wildcards:
        raise Exception("No wildcards present.")
    # Get the rule name #
    rule_name = job.name
    # Get the sample and dataset names #
    sample = job.wildcards["sample"]
    dataset = job.wildcards["dataset"]
    # Determine the location of the log file (%j is replaced by the SLURM job ID) #
    path = f"logs/{rule_name}/{dataset}/{sample}/%j.log"
    # Return #
    return os.path.abspath(path)

def run_job(self, job):
    # Get the log path #
    slurm_logfile = log_path(self, job)
    # Get the directory #
    log_dir = os.path.dirname(slurm_logfile)
    # [ Rest of the method unchanged ... ]

cmeesters commented 6 months ago

As the French proverb goes, "you cannot spare both the goat and the cabbage": you can't please everyone.

In other words: we had a lengthy thread that led to the feature being implemented exactly this way. If there were a PR that keeps all the information while shortening the string, we would consider it.
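One possible reading of "keep all the information and shorten the string" is sketched below. It keeps only the last path component of each wildcard value for readability and appends a short hash of the full values so that distinct wildcard sets still map to distinct directories. The function `short_wildcard_dir`, the `max_len` parameter, and the wildcard names `cache_dir` and `accession` are all hypothetical. The hash does not allow the original values to be recovered, so a real implementation would presumably also have to record them somewhere, for example inside the log file itself.

```python
import hashlib
import posixpath


def short_wildcard_dir(wildcards, max_len=40):
    """Hypothetical helper: readable, bounded-length directory name for a wildcard set."""
    parts = []
    for name, value in wildcards.items():
        # Keep only the last path component of path-like values.
        leaf = posixpath.basename(str(value).rstrip("/")) or "root"
        parts.append(f"{name}={leaf}")
    readable = "_".join(parts)[:max_len]
    # Hash the full, untruncated values so different wildcard sets stay distinct.
    full = "\x00".join(f"{k}={v}" for k, v in wildcards.items())
    digest = hashlib.sha1(full.encode("utf-8")).hexdigest()[:8]
    return f"{readable}-{digest}"


short_a = short_wildcard_dir(
    {"cache_dir": "/home/xapple/dev/myproj/test_data/sample_output/", "accession": "99928367"}
)
short_b = short_wildcard_dir(
    {"cache_dir": "/tmp/other/sample_output/", "accession": "99928367"}
)
print(short_a)
print(short_b)
```

Both results share the same readable prefix but differ in the hash suffix, so jobs whose wildcards differ only in a long directory prefix do not collide.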