nf-core / tools

Python package with helper tools for the nf-core community.
https://nf-co.re
MIT License
238 stars 188 forks source link

schema build incorrectly reports config mismatch with $projectDir #2832

Open jashapiro opened 7 months ago

jashapiro commented 7 months ago

Description of the bug

When a config variable uses ${projectDir} in its definition, the nf-core schema build tool reports that the pipeline config does not match the schema and requests adding absolute paths into the schema file.

Validation with nf-core schema lint nextflow_schema.json proceeds as expected and passes.

Command used and terminal output

$ nf-core schema build

                                          ,--./,-.
          ___     __   __   __   ___     /,-._.--~\
    |\ | |__  __ /  ` /  \ |__) |__         }  {
    | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                          `._,._,'

    nf-core/tools version 2.13.1 - https://nf-co.re

INFO     [✓] Default parameters match schema validation
INFO     [✓] Pipeline schema looks valid (found 109 params)
✨ Default for 'params.ribo_database_manifest' in the pipeline config does not match schema.
(schema: '${projectDir}/assets/rrna-db-defaults.txt' | config:
'/Users/josh/repo/rnaseq/assets/rrna-db-defaults.txt'). Update pipeline schema? [y/n]:

System information

Nextflow version: 23.10.1 Hardware: Mac M1 desktop Executor: local OS: macOS Version of nf-core/tools: 2.13.1 Python version: 3.12.2

mirpedrol commented 7 months ago

Hello @jashapiro

Default values from a JSON schema containing variables are treated literally in some cases, for example with the Seqera Platform. The best is to omit the default from the schema, or avoid using variables. You can keep the default in the config file.

If you want to keep the default in the schema, you can avoid tests failing with nf-core lint by ignoring these params with the .nf-core.yml file:

lint:
  nextflow_config:
    - config_defaults:
      - params.<param_name>
jashapiro commented 7 months ago

Thank you for your response. I wanted to note that the example that I used above was from the nf-core/rnaseq pipeline:

https://github.com/nf-core/rnaseq/blob/b89fac32650aacc86fcda9ee77e00612a1d77066/nextflow.config#L57

A number of other nf-core pipelines also use this pattern, which is quite useful!

The behavior I am specifically concerned with is the inconsistency between nf-core schema lint and nf-core schema build, where one allows the variable without complaint and the other does not.

awgymer commented 7 months ago

As far as I was concerned when I implemented this schema build is explicitly not linting, it's a tool for refining and updating your config and I had seen several people get caught out by this, either through updating their pipeline config but not their schema, or by updating the schema and expecting it to be propagated into the config.

It might be occasionally inconvenient to have to go through the conflicting defaults when they are expected but the tooling isn't really smart enough to know when the divergence occurred. I suppose the lint-skipping config could be used to skip checking/warning about these params, but the downside then is, if you change a default which is being skipped and it diverges in some unintended way from the value in the schema, you will not be notified by linting or when you go to update the schema.