pepkit / looper

A job submitter for Portable Encapsulated Projects
http://looper.databio.org
BSD 2-Clause "Simplified" License

Unclear how to configure pipestat namespace name with looper for file backend #460

Closed: nsheff closed this issue 1 week ago

nsheff commented 4 months ago

How do you configure the pipestat pipeline name when passing it via looper?

Here's my looper config:

pep_config: demo_fasta.yaml
output_dir: results
pipeline_interfaces:
  sample: ../pipeline/pipeline_interface.yaml
pipestat:
  project_name: demo
  results_file_path: results/pipeline_stats.yaml

The problem is, before any pipelines are even run, looper is "helpfully" creating the results/pipeline_stats.yaml file... with this content:

default_pipeline_name:
  project: {}
  sample: {}

Then, when the first job runs, it tries to report to that file under the correct namespace, add_to_seqcol, and that gives an error:

  File "/home/nsheff/.local/lib/python3.11/site-packages/pipestat/backends/file_backend/filebackend.py", line 729, in _load_results_file
    raise PipestatError(
pipestat.exceptions.PipestatError: '/home/nsheff/code/seqcolapi/analysis/demo/results/pipeline_stats.yaml' is already in use for 1 namespaces: default_pipeline_name and multi_pipelines = False.

This happens because 'default_pipeline_name' is already recorded in that file, so when I then try to report under a different namespace, pipestat (understandably) refuses. So the configuration is wrong.
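To make the conflict concrete, here is a minimal Python sketch of the kind of single-namespace guard the error message implies. It is reconstructed only from the message above and is not pipestat's actual code: `check_namespace`, its arguments, and the local `PipestatError` class are hypothetical, and the parsed results file is modeled as a plain dict whose top-level keys are pipeline namespaces.

```python
class PipestatError(Exception):
    """Hypothetical local stand-in for pipestat.exceptions.PipestatError."""


def check_namespace(results: dict, pipeline_name: str, multi_pipelines: bool = False) -> None:
    """Hypothetical guard: refuse a new namespace if the file already holds another one.

    `results` models the parsed results file: each top-level key is a
    pipeline namespace with 'project' and 'sample' sections.
    """
    # Namespaces already present in the file, other than the one being reported.
    others = [ns for ns in results if ns != pipeline_name]
    if others and not multi_pipelines:
        raise PipestatError(
            f"results file is already in use for {len(others)} namespaces: "
            f"{', '.join(others)} and multi_pipelines = {multi_pipelines}."
        )


# The file looper pre-created before any job ran, as shown above.
seeded = {"default_pipeline_name": {"project": {}, "sample": {}}}

try:
    check_namespace(seeded, "add_to_seqcol")
except PipestatError as err:
    print(err)

# A file seeded under the name the pipeline actually reports with passes the guard.
check_namespace({"add_to_seqcol": {"project": {}, "sample": {}}}, "add_to_seqcol")
```

Under this model, the fix is simply to make the name looper seeds the file with match the name the pipeline reports under.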

I tried adding pipestat.pipeline_name: add_to_seqcol to the looper config, but looper does not pick it up.

donaldcampbelljr commented 4 months ago

It appears that looper is ignoring the pipeline_name given in the pipeline interface and using the one in the pipestat output schema instead.

nsheff commented 4 months ago

Yes, adding output_schema: output_schema.yaml to my pipeline interface was what it needed.

donaldcampbelljr commented 1 week ago

I believe this is now solved with the recent releases of Looper 1.8.1 and Pipestat 0.9.3. After the refactoring, Looper prioritizes the pipeline_name in the pipeline interface, which can be passed to pipestat either via the generated pipestat config file or as a parameter to pipestat. Both of these methods now take priority over the pipeline_name found in the output schema. If the two names do not match, Looper warns the user and defaults to the one provided in the pipeline interface. The docs have also been updated as part of these releases.
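The name-resolution rule described above can be sketched in a few lines of Python. This only illustrates the described behavior and is not Looper's implementation: `resolve_pipeline_name` and its parameters are hypothetical, and falling back to the output schema's name when the interface provides none is an assumption.

```python
import warnings
from typing import Optional


def resolve_pipeline_name(interface_name: Optional[str], schema_name: Optional[str]) -> Optional[str]:
    """Hypothetical sketch: the pipeline interface's pipeline_name wins.

    Warns when both names are set but disagree. Falling back to the output
    schema's name when the interface gives none is an assumption.
    """
    if interface_name and schema_name and interface_name != schema_name:
        warnings.warn(
            f"pipeline_name mismatch: interface has {interface_name!r}, "
            f"output schema has {schema_name!r}; using {interface_name!r}"
        )
    return interface_name or schema_name


# Mismatch: emits a UserWarning and keeps the interface's name.
name = resolve_pipeline_name("add_to_seqcol", "default_pipeline_name")
print(name)
```

Resolving the name once, before any results file is created, is what keeps the seeded namespace and the reported namespace in agreement.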