nextstrain / pathogen-repo-guide

4 stars 1 forks source link

profiles: rename defaults.yaml to profiles_config.yaml #24

Closed joverlee521 closed 9 months ago

joverlee521 commented 10 months ago

Description of proposed changes

Reduce confusion of the use of the profiles' config files. They are to be seen as additional profiles' config on top of the default configs. Stems from discussion with @j23414 for review of https://github.com/nextstrain/dengue/pull/15.

Note we not using config.yaml because this is used by the Snakemake profiles configuration.¹

¹ https://snakemake.readthedocs.io/en/stable/executing/cli.html#profiles

joverlee521 commented 10 months ago

Thanks for the feedback @huddlej! I fully agree the term "profiles" can be confusing. I'm hesitant to use the term "builds" because it's so overloaded, especially in the Nextstrain ecosystem.

Maybe just "customizations" is descriptive enough? Using the phylogenetic workflow as an example:

phylogenetic/
├── README.md
├── Snakefile
├── config
│   └── defaults.yaml
├── customizations
│   └── ci
│       ├── config.yaml
│       └── copy_example_data.smk
├── example_data
│   ├── metadata.tsv
│   └── sequences.fasta
└── rules
    ├── annotate_phylogeny.smk
    ├── construct_phylogeny.smk
    ├── export.smk
    └── prepare_sequences.smk
tsibley commented 10 months ago

@joverlee521 Why separate config/ and customizations/ in that suggestion? (or in the PR as-is, separate config/ and profiles/)

I'd think this (or something similar) would be just fine?

├── config
│   └── defaults.yaml
│   └── ci
│       ├── config.yaml
│       └── copy_example_data.smk
joverlee521 commented 10 months ago

Why separate config/ and customizations/ in that suggestion?

I proposed to keep them separate for two reasons:

  1. I'd like to keep config/ simple with just the default configs and not bloat it with customizations.
  2. Customizations can be a mix of configs and custom rules, so I think it may be confusing to keep them under config/.

(1) is based on experience with the seasonal flu repo, which has a pretty full config dir with sub-directories for the subtypes. It might be confusing to see the current profiles mixed in:

config
├── allflu
├── ci
├── distance_maps
├── europe
├── example
├── gisaid
├── h1n1pdm
├── h3n2
├── hutch
├── nextflu-private
├── nextflu-private-forecasts
├── nextflu-public
├── nextstrain
├── nextstrain-public
├── private.nextflu.org
├── scicore
├── upload
├── vic
└── yam

We could keep them under config/customizations/ for clearer separation. My concern for (2) still stands in this case, but maybe I'm overthinking it...

config/
├── customizations
│   ├── allflu
│   ├── ci
│   ├── europe
│   ├── example
│   ├── gisaid
│   ├── hutch
│   ├── nextflu-private
│   ├── nextflu-private-forecasts
│   ├── nextflu-public
│   ├── nextstrain
│   ├── nextstrain-public
│   ├── private.nextflu.org
│   ├── scicore
│   └── upload
├── distance_maps
├── h1n1pdm
├── h3n2
├── vic
└── yam
tsibley commented 10 months ago

Oh I see. Yeah, that makes sense.

jameshadfield commented 10 months ago

Do we have a working definition of "config" vs "customisation" vs (snakemake?) "profiles" to help me interpret this? (And maybe add "defaults" to this too.)

If we didn't use snakemake profiles for the seasonal flu would the files in profiles/ be additional directories within config/?

The linked seasonal flu repo is certainly complex in the sense of lots of directories, and perhaps necessarily so, but it'd be great to avoid introducing this complexity into repos which don't need it.

More generally, I wish we decoupled our profile configuration files ... from our build configuration files ... since they are such different concepts.

+1

joverlee521 commented 10 months ago

Do we have a working definition of "config" vs "customisation" vs (snakemake?) "profiles" to help me interpret this? (And maybe add "defaults" to this too.)

Definitely missing explicit definitions, my interpretations are:

In my definition, I am basically merging config and defaults, where anything under config/ should be considered the default configs used for a workflow. This makes it very easy for users to see what the default configs are for the workflow at first glance.

If we didn't use snakemake profiles for the seasonal flu would the files in profiles/ be additional directories within config/?

I think I would still keep it separate for the sake of keeping the default configs very simple and obvious.

joverlee521 commented 9 months ago

Closing in favor of https://github.com/nextstrain/pathogen-repo-template/pull/27