nf-core / tools

Python package with helper tools for the nf-core community.
https://nf-co.re
MIT License
232 stars 187 forks source link

Move igenomes config into pipeline code #3131

Open bentsherman opened 3 weeks ago

bentsherman commented 3 weeks ago

The params.genomes is a map that is conditionally included, which is a very clever hack that will not be supported by the new config parser 😅

Instead, I suggest refactoring the igenomes.config into a function that returns a giant map:

def getGenomesMap(base_url) {
  [
    'GRCh37': [
      fasta: "${base_url}/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa",
      // ...
    ],
    // ...
  ]
}

You can then include this function into your workflow and use it however you want:

include { getGenomesMap } from '...'

workflow {
  genomes = !params.igenomes_ignore ? getGenomesMap(params.igenomes_base) : [:]
}
bentsherman commented 3 weeks ago

@ewels mentioned that users like to modify this genomes config. In that case, it should be refactored into an index file (e.g. JSON, YAML). Then you can load it in the config, although it still might be easier to have a param for the file name (e.g. params.igenomes_index) and load it in the pipeline code:

def getGenomesIndex(filename) {
  new groovy.json.JsonSlurper().parseText(file(filename).text)
}
ewels commented 3 weeks ago

My summary is that we have two problems here:

  1. The includeConfig is within an if statement in the config. This won't work with the new config parser, and needs resolving ASAP.
  2. The config itself uses nested parameters, support for which may be dropped in the slightly longer term.

So the most urgent thing is a short term fix for (1). Instead of this:

// Load igenomes.config if required
if (!params.igenomes_ignore) {
    includeConfig 'conf/igenomes.config'
} else {
    params.genomes = [:]
}

Suggestion is to use a ternary expression, which should work:

// Load igenomes.config if required
includeConfig !params.igenomes_ignore ? 'conf/igenomes.config' : 'conf/igenomes_ignored.config'

Where a new igenomes_ignored.config file would simply contain:

params.genomes = [:]

Can maybe think of more elegant syntax, but that's the gist of it.


Regarding (2) in the longer term - it'd be nice to rewrite how all of the iGenomes configuration works. This will break how existing user's config files work, but I think it's ok if we're overhauling the entire references system anyway. So hopefully we can incorporate this syntax change along with the new references back end.

ewels commented 6 days ago

See https://github.com/nf-core/tools/issues/3132#issuecomment-2307857283 for another issue about includeConfig, essentially the same as above.