sanger-tol / blobtoolkit

Nextflow pipeline for BlobToolKit for Sanger ToL production suite
https://pipelines.tol.sanger.ac.uk/blobtoolkit
MIT License

Support draft assemblies #97

Closed muffato closed 1 month ago

muffato commented 4 months ago

On this branch, there is no input Yaml file. The only mandatory parameters are:

--accession is optional and is used to pull assembly information from ENA into the blobDir's meta.json.

I haven't restructured the pipeline much. All the blobtools commands at the end still require a YAML file. My solution is to add a script at the beginning of the pipeline that generates the minimal YAML file required (as per https://github.com/sanger-tol/blobtoolkit/issues/77#issuecomment-1936286274). This still allows getting some parameters cleanly in the input-check sub-workflow, and makes the busco sub-workflow more focused on running busco + blastp.

Busco lineages are inferred from the taxonomy directly here. Like in the genome-note pipeline, I've moved away from using GoaT as GoaT is just a proxy to the NCBI taxonomy. This way, I can keep control of both the version of Busco and the list of lineages in the same place. I've also introduced the --busco_lineages parameter to allow precisely selecting the lineages that are used, rather than the taxonomy-based defaults.
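As a rough illustration of the lineage selection described above (not the pipeline's actual code), the sketch below picks BUSCO lineages from a species' ancestor taxon_ids and lets an explicit list override them. The mapping is a tiny illustrative subset, the function name is hypothetical, and the comma-separated format for `--busco_lineages` is an assumption.

```python
from typing import Iterable, List, Optional

# Illustrative subset only: NCBI taxon_id -> BUSCO lineage dataset name.
# The real pipeline keeps its own, much longer list.
LINEAGE_BY_TAXID = {
    2: "bacteria_odb10",
    2157: "archaea_odb10",
    2759: "eukaryota_odb10",
    33208: "metazoa_odb10",
}


def select_busco_lineages(
    ancestor_taxids: Iterable[int],
    busco_lineages: Optional[str] = None,
) -> List[str]:
    """Return the lineages to run, most specific first.

    ancestor_taxids: taxon_ids from the species up towards the root.
    busco_lineages: optional explicit override (assumed comma-separated).
    """
    if busco_lineages:
        # An explicit user choice replaces the taxonomy-based defaults.
        return [name.strip() for name in busco_lineages.split(",") if name.strip()]
    return [LINEAGE_BY_TAXID[t] for t in ancestor_taxids if t in LINEAGE_BY_TAXID]


if __name__ == "__main__":
    print(select_busco_lineages([33208, 2759]))
    print(select_busco_lineages([33208, 2759], "bacteria_odb10"))
```

Keeping the mapping next to the selection logic mirrors the point made above: the lineage list and the Busco version can then be updated in one place.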

Still a draft for now as I want to review /nfs/team135/yy5/btk_config/taxonomiser_v2.py and maybe incorporate some elements of it.

github-actions[bot] commented 4 months ago

nf-core lint overall result: Passed :white_check_mark:

Posted for pipeline commit 8c70c77

+| ✅ 134 tests passed       |+
#| ❔  24 tests were ignored |#
### :grey_question: Tests ignored:

* Lint checks with ignored entries: `files_exist`, `nextflow_config`, `files_unchanged`, `actions_awstest`, `template_strings`, `merge_markers`

### :white_check_mark: Tests passed:

* Lint checks with passing entries: `files_exist`, `nextflow_config`, `files_unchanged`, `actions_ci`, `readme`, `pipeline_todos`, `pipeline_name_conventions`, `schema_lint`, `schema_params`, `system_exit`, `actions_schema_validation`, `modules_json`, `multiqc_config`, `modules_structure`

### Run details

* nf-core/tools version 2.11
* Run at `2024-08-24 10:19:22`

github-actions[bot] commented 4 months ago

Python linting (black) is failing

To keep the code consistent with lots of contributors, we run automated code consistency checks. To fix this CI test, please run:

Once you push these changes the test should pass, and you can hide this comment :+1:

We highly recommend setting up Black in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help!

Thanks again for your contribution!

muffato commented 4 months ago

I've added some code to achieve the goal of taxonomiser_v2.py, which is to find a taxon_id that is recognised by the NT database and is the closest to the species of interest. It's implemented very differently from the script. I leverage the taxonomy4blast.sqlite3 database that is shipped with NT and essentially lists the taxon_ids it knows about. If the species' taxon_id is not recognised, then it looks for the parent, and so on up the lineage.
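A minimal sketch of that lookup, for illustration only: walk the lineage from the species upwards and return the first taxon_id the database knows. The table and column names (`TaxidInfo`, `taxid`) are assumptions about the layout of taxonomy4blast.sqlite3, the function name is hypothetical, and the IDs in the demo are toy values, so this is not the code in the branch.

```python
import sqlite3
from typing import Iterable, Optional


def closest_known_taxid(conn: sqlite3.Connection, lineage: Iterable[int]) -> Optional[int]:
    """Return the first taxon_id in `lineage` that the database lists.

    lineage: taxon_ids ordered from the species of interest up to its
    ancestors (species first). Returns None if none is recognised.
    """
    for taxid in lineage:
        # Assumed schema: one row per known taxon_id in a table named TaxidInfo.
        row = conn.execute(
            "SELECT 1 FROM TaxidInfo WHERE taxid = ? LIMIT 1", (taxid,)
        ).fetchone()
        if row is not None:
            return taxid
    return None


def make_toy_db(known_taxids: Iterable[int]) -> sqlite3.Connection:
    """Build an in-memory stand-in for taxonomy4blast.sqlite3 (toy data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TaxidInfo (taxid INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO TaxidInfo VALUES (?)", [(t,) for t in known_taxids])
    return conn


if __name__ == "__main__":
    toy = make_toy_db([10, 20])
    # 30 is the "species", 20 its parent, 10 its grandparent (toy IDs).
    print(closest_known_taxid(toy, [30, 20, 10]))
```

Against the real file, one would open it with `sqlite3.connect(path)` and check the actual table names first.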

As far as I understand the requirements, this is the last bit that was missing to complete support for draft assemblies. I'll mark this pull-request as ready.

muffato commented 2 months ago

@eeaunin, I've rebased this branch. It now includes the fixes I've made for blast.

eeaunin commented 1 month ago

I had a closer look at how -negative_taxids has been implemented in the Snakemake pipeline and it appears quite confusing. The BlobToolKit paper (https://academic.oup.com/g3journal/article/10/4/1361/6026202) says:

An optional filter excludes a configurable list of NCBI taxIDs (default: excludes query genus).

So the exclusion of taxids is supposed to be optional and configurable by the user. BlobToolKit pipeline v1 has the mask_ids setting for excluding taxids:

https://github.com/blobtoolkit/pipeline/blob/master/v1/example.yaml

However, I couldn't find a setting for the same thing in the Snakemake pipeline v2 code. Maybe the authors just forgot to include it?

In my runs with the Snakemake pipeline negative taxids were not used but there are suppressed error messages buried in the run logs relating to that. In a run with a Plasmodium yoelii yoelii assembly there is this error in the logs (/lustre/scratch123/tol/teams/tola/users/ea10/pipeline_testing/20230215_pyoelii_asg_cobiont_check_run/btk_busco/blastn/logs/pyoelii/run_blastn.log):

BLAST Database error: Taxonomy ID(s) not found.Taxonomy ID(s) not found. This could be because the ID(s) provided are not at or below the species level. Please use get_species_taxids.sh to get taxids for nodes higher than species (see https://www.ncbi.nlm.nih.gov/books/NBK546209/).
Restarting blastn without taxid filter

So it ran into the error but then just quietly continued running. It is unclear to me what caused this error, as the taxid used there (352914) is at strain level.

In another run it has skipped using the taxid filter due to another error: /lustre/scratch123/tol/teams/grit/contamination_screen/icMagCera1/20240712_icMagCera1.20240711.hap1.fa_asg_cobiont_check_run/btk_busco/blastn/logs/icMagCera1.20240711.hap1.fa/run_blastn.log

BLAST Database error: Taxonomy filtering is not supported in v4 BLAST dbs
Restarting blastn without taxid filter

So the filtering doesn't work if the supplied database is V4 instead of V5 but this also doesn't crash the Snakemake pipeline and just produces an error message in the logs.
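To make the behaviour seen in those logs concrete, here is a hedged sketch of a "retry without the taxid filter" wrapper. It is not the Snakemake pipeline's code: the function names and the injected `runner` callable are hypothetical, and matching on the `BLAST Database error` text is an assumption based only on the log lines quoted above.

```python
from typing import Callable, List, Tuple

# A runner takes a command line and returns (exit_code, stderr_text).
Runner = Callable[[List[str]], Tuple[int, str]]


def strip_taxid_filter(cmd: List[str]) -> List[str]:
    """Return a copy of cmd without -negative_taxids and its value."""
    out: List[str] = []
    skip_next = False
    for arg in cmd:
        if skip_next:
            skip_next = False
            continue
        if arg == "-negative_taxids":
            skip_next = True  # also drop the taxid list that follows
            continue
        out.append(arg)
    return out


def run_with_fallback(cmd: List[str], runner: Runner) -> Tuple[int, List[str]]:
    """Run cmd; on a BLAST database error, retry once without the filter.

    Returns the final exit code and the command that produced it.
    """
    code, stderr = runner(cmd)
    if code != 0 and "BLAST Database error" in stderr and "-negative_taxids" in cmd:
        print("Restarting blastn without taxid filter")
        retry = strip_taxid_filter(cmd)
        code, _ = runner(retry)
        return code, retry
    return code, cmd


def fake_runner(cmd: List[str]) -> Tuple[int, str]:
    """Stand-in for a real blastn call: fails whenever the filter is present."""
    if "-negative_taxids" in cmd:
        return 1, "BLAST Database error: Taxonomy ID(s) not found."
    return 0, ""
```

The drawback is the one described here: the run completes, but the filter is silently dropped unless someone reads the log.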

I guess it would be okay if the sanger-tol/blobtoolkit pipeline used -negative_taxids in all runs with draft assemblies as long as this doesn't produce frequent crashes. But I think it would be better if the use of -negative_taxids was optional for draft assemblies.

muffato commented 1 month ago

@eeaunin, I've added a --skip_taxon_filtering flag for you. It removes the taxon filtering from all Blast searches.
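As an illustration of what such a switch amounts to, the sketch below builds a blastn argument list and only adds `-negative_taxids` when filtering is enabled. The function and parameter names are hypothetical and the pipeline's real command lines carry more options, so treat this as a sketch rather than the actual module.

```python
from typing import List, Optional


def build_blastn_args(
    query: str,
    db: str,
    taxid: Optional[int] = None,
    skip_taxon_filtering: bool = False,
) -> List[str]:
    """Assemble a blastn command line as a list of arguments.

    The taxon filter is added only when a taxid is given and filtering
    has not been switched off.
    """
    args = ["blastn", "-query", query, "-db", db]
    if taxid is not None and not skip_taxon_filtering:
        # -negative_taxids excludes hits assigned to the listed taxon_ids.
        args += ["-negative_taxids", str(taxid)]
    return args


if __name__ == "__main__":
    print(build_blastn_args("contigs.fa", "nt", taxid=352914))
    print(build_blastn_args("contigs.fa", "nt", taxid=352914, skip_taxon_filtering=True))
```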

I've rebased the branch onto the latest stable release, 0.5.1.

eeaunin commented 1 month ago

That's good then! I think it's fine to merge the draft_assemblies branch into dev now.