nf-core / tools

Python package with helper tools for the nf-core community.
https://nf-co.re
MIT License

Modules meta.yml ontology #3032

Closed mirpedrol closed 1 month ago

mirpedrol commented 4 months ago

Follow-up from https://github.com/nf-core/tools/pull/3028. This was initially split into a separate PR to make it easier to review once #3028 was merged, but I closed #3028, so this is the only PR to review now.

Continuation of https://github.com/nf-core/tools/pull/2789

This PR adds an automated way of generating the right format meta.yml for modules.

input:
  - - meta:
        type: map
        description: |
          Groovy Map containing sample information
          e.g. [ id:'test', single_end:false ]
    - scaffold:
        type: file
        description: Fasta file containing scaffold
        pattern: "*.{fasta,fa}"
  - - fasta:
        type: file
        description: FASTA reference file
        pattern: "*.{fasta,fa}"

Note that the structure proposed in https://github.com/nf-core/modules/issues/4983#issuecomment-1963572056 is not possible if we want to automate the creation of this file, as comments are ignored when reading a yaml file with Python.
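Because comments are lost on load, the grouping of elements into channels has to be carried by the nesting itself. As a minimal sketch, this is what the input block above looks like as Python data, written as a literal that mirrors what a YAML loader would be expected to return for it. The helper name `describe_inputs` is hypothetical and not part of nf-core/tools.

```python
# The `input:` section as nested Python data: a list of channels, where each
# channel is a list of single-key mappings (element name -> metadata).
# Assumption: this literal mirrors what a YAML loader returns for that block.
inputs = [
    [
        {
            "meta": {
                "type": "map",
                "description": "Groovy Map containing sample information\n"
                "e.g. [ id:'test', single_end:false ]\n",
            }
        },
        {
            "scaffold": {
                "type": "file",
                "description": "Fasta file containing scaffold",
                "pattern": "*.{fasta,fa}",
            }
        },
    ],
    [
        {
            "fasta": {
                "type": "file",
                "description": "FASTA reference file",
                "pattern": "*.{fasta,fa}",
            }
        },
    ],
]


def describe_inputs(channels):
    """Return one (channel_index, element_name, element_type) tuple per element."""
    rows = []
    for index, channel in enumerate(channels):
        for element in channel:
            # Each element is a mapping with exactly one key: the element name.
            name, info = next(iter(element.items()))
            rows.append((index, name, info["type"]))
    return rows


for row in describe_inputs(inputs):
    print(row)
```

Here `meta` and `scaffold` share channel index 0 while `fasta` sits alone in channel 1, so the channel membership survives a load/dump round trip without relying on comments.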

Example of outputs formatting:

output:
  - versions:
    - "versions.yml":
        type: file
        description: File containing software versions
        pattern: "versions.yml"
  - bam:
    - meta:
        type: map
        description: Groovy Map containing sample information
    - "*.bam":
        type: file
        description: Sorted BAM/CRAM/SAM file
        pattern: "*.{bam,cram,sam}"
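Unlike inputs, each output channel in the example above is keyed by a channel name (`versions`, `bam`) that maps to its list of elements. A minimal sketch of generating that skeleton follows. The helper `build_output_section`, its simplified input (channel name to element names), and the type-guessing rule are illustrative assumptions, not the logic this PR implements.

```python
import json


def build_output_section(channels):
    """Build the nested `output:` structure from {channel_name: [element_names]}."""
    section = []
    for channel_name, element_names in channels.items():
        elements = []
        for name in element_names:
            # Illustrative rule only: treat `meta*` as a map, the rest as files.
            if name.startswith("meta"):
                entry = {"type": "map", "description": ""}
            else:
                entry = {"type": "file", "description": "", "pattern": name}
            elements.append({name: entry})
        section.append({channel_name: elements})
    return {"output": section}


skeleton = build_output_section(
    {"versions": ["versions.yml"], "bam": ["meta", "*.bam"]}
)
# JSON is used here only to display the nesting with the standard library.
print(json.dumps(skeleton, indent=2))
```

Descriptions are left empty on purpose: they cannot be derived from the element names and would still need to be written by hand.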

In this PR we also add an option to fix existing files automatically: --fix (originally --update-meta-yml, renamed as suggested in https://github.com/nf-core/tools/pull/2789#issuecomment-2177863461).

Pytests were initially missing for this functionality; a test has now been added for the command nf-core modules lint --fix. ⚠️ This test will fail until the JSON schema is updated (https://github.com/nf-core/modules/pull/5837)

Together with this PR, there are other actions which must happen at the same time:

This PR also adds a tool identifier to the modules meta.yml. It queries bio.tools to obtain the bio.tools ID. It also adds the EDAM ontologies for file inputs and outputs to the meta.yml template.
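The lookup described above could be sketched roughly as follows. This is not the code from the PR: the endpoint URL, the response fields (`list`, `name`, `biotoolsCURIE`), and both function names are assumptions to be checked against the bio.tools API documentation. No request is sent; the sample response is a hand-written stand-in.

```python
from urllib.parse import urlencode

# Assumed search endpoint; verify against the bio.tools API documentation.
BIOTOOLS_API = "https://bio.tools/api/t/"


def biotools_query_url(tool_name):
    """Build the search URL for a tool name (no request is sent here)."""
    return f"{BIOTOOLS_API}?{urlencode({'q': tool_name, 'format': 'json'})}"


def find_biotools_id(response, tool_name):
    """Return the identifier of the entry whose name matches tool_name, or None."""
    for entry in response.get("list", []):
        # Case-insensitive exact match on the tool name.
        if entry.get("name", "").lower() == tool_name.lower():
            return entry.get("biotoolsCURIE")
    return None


# Hand-written stand-in for a parsed JSON response (assumed shape).
sample_response = {
    "list": [
        {"name": "SAMtools", "biotoolsCURIE": "biotools:samtools"},
        {"name": "BCFtools", "biotoolsCURIE": "biotools:bcftools"},
    ]
}

print(biotools_query_url("samtools"))
print(find_biotools_id(sample_response, "samtools"))
```

Returning `None` when nothing matches lets the caller leave the identifier field empty in meta.yml rather than writing a wrong ID from a fuzzy search hit.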

Note that ontologies are not filled in automatically, even though they can sometimes be obtained from bio.tools. Currently, inputs and outputs are not automatically obtained when first creating the module. We should consider whether automating this is required. One option is to update the ontologies when updating the meta.yml with --update-meta-yml.

POC in modules: https://github.com/nf-core/modules/pull/5867

Ontologies can be added manually, and we have linting for them, but we leave implementing the tooling for a later stage: https://github.com/nf-core/tools/issues/3027

maxulysse commented 1 month ago

some conflicts :fearful:

mirpedrol commented 1 month ago

Confirmed that we are using the path to the cloned remote to obtain the JSON schema, not the local repo (see https://github.com/nf-core/tools/blob/dev/nf_core/modules/lint/meta_yml.py#L70)

mirpedrol commented 1 month ago

Merging this to continue with the bulk modules update. The test fails because we use the JSON schema from the master branch of modules, while we are updating everything in the batch_update_staging branch. We should be ready to merge batch_update_staging into master once subworkflows are updated too.