nf-core / ampliseq

Amplicon sequencing analysis workflow using DADA2 and QIIME2
https://nf-co.re/ampliseq
MIT License
182 stars 115 forks source link

Add taxonomic classification with kraken2 #637

Closed d4straub closed 1 year ago

d4straub commented 1 year ago

Kraken2 was reported to perform well on amplicon data, e.g. https://doi.org/10.1186/s40168-020-00900-2 & https://www.nature.com/articles/s41598-023-40799-x. I decided here to go with the idea of the pipeline to re-create ASVs and therefore used Kraken2 on ASVs instead of on raw reads. Kraken2 supports silva, rdp, and greengenes, but also whole genome databases. I added of the latter also the "standard" database which is huge but includes the typical microbes. Potentially the "standard" database would allow analyzing any amplicon, e.g. rarely used genes that have no proper gene specific database. Additionally, a custom (i.e. any) Kraken2 database can be used, even if here not added in the ref_databases.config. Added four parameter: --kraken2_ref_taxonomy, --kraken2_ref_tax_custom, --kraken2_assign_taxlevels, --kraken2_confidence

I looked into using confidence thresholds for taxonomic assignments, which are essentially k-mer match ratios for the LCA calculation, see https://github.com/DerrickWood/kraken2/blob/v2.1.3/docs/MANUAL.markdown#confidence-scoring. There is also an interesting discussion in https://github.com/DerrickWood/kraken2/issues/265 about confidence scores. However, I have the impression that there isnt a consensus and all of this is referring to shotgun metagenomics. When looking at the amplicon sequencing paper (https://doi.org/10.1186/s40168-020-00900-2), I couldnt find any mention of the confidence score but bracken was used after kraken2. So for now I added a parameter for the confidence threshold but kept it by default at 0, which is also kraken2's default. I am fine with when 1 k-mer causes e.g. a species instead of a genus annotation (i.e. a low confidence score for species), because I expect that only very rarely a k-mer is really specific. However, conflicting annotations (lets say 2 k-mer on genus a and 1 k-mer on genus b) would be good to capture (and report family in the example instead of genus a). Not sure if thats done right now as expected. There is a tool that calculates confidence scores for kraken2, see https://github.com/Ivarz/Conifer, which is also in bioconda. Potentially interesting.

PR checklist

github-actions[bot] commented 1 year ago

nf-core lint overall result: Failed :x:

Posted for pipeline commit b1500e1

+| ✅ 148 tests passed       |+
#| ❔   3 tests were ignored |#
!| ❗   2 tests had warnings |!
-| ❌   5 tests failed       |-
### :x: Test failures: * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `CODE_OF_CONDUCT.md` does not match the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/CONTRIBUTING.md` does not match the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/workflows/linting.yml` does not match the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `lib/NfcoreTemplate.groovy` does not match the template * [multiqc_config](https://nf-co.re/tools-docs/lint_tests/multiqc_config.html) - 'assets/multiqc_config.yml' does not contain a matching 'report_comment'. ### :heavy_exclamation_mark: Test warnings: * [readme](https://nf-co.re/tools-docs/lint_tests/readme.html) - README did not have a Nextflow minimum version badge. * [schema_lint](https://nf-co.re/tools-docs/lint_tests/schema_lint.html) - Parameter `input` is not defined in the correct subschema (input_output_options) ### :grey_question: Tests ignored: * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `conf/igenomes.config` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `.gitattributes` * [actions_ci](https://nf-co.re/tools-docs/lint_tests/actions_ci.html) - actions_ci ### :white_check_mark: Tests passed: * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.gitattributes` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.gitignore` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.nf-core.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.editorconfig` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.prettierignore` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.prettierrc.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `CHANGELOG.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `CITATIONS.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `CODE_OF_CONDUCT.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `CODE_OF_CONDUCT.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `LICENSE` or `LICENSE.md` or `LICENCE` or `LICENCE.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `nextflow_schema.json` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `nextflow.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `README.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/.dockstore.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/CONTRIBUTING.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/ISSUE_TEMPLATE/bug_report.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/ISSUE_TEMPLATE/config.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/ISSUE_TEMPLATE/feature_request.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/PULL_REQUEST_TEMPLATE.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/workflows/branch.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/workflows/ci.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/workflows/linting_comment.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/workflows/linting.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `assets/email_template.html` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `assets/email_template.txt` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `assets/sendmail_template.txt` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `assets/nf-core-ampliseq_logo_light.png` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `conf/modules.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `conf/test.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `conf/test_full.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `docs/images/nf-core-ampliseq_logo_light.png` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `docs/images/nf-core-ampliseq_logo_dark.png` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `docs/output.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `docs/README.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `docs/README.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `docs/usage.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/nfcore_external_java_deps.jar` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/NfcoreTemplate.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/Utils.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/WorkflowMain.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `main.nf` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `assets/multiqc_config.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `conf/base.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/workflows/awstest.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/workflows/awsfulltest.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/WorkflowAmpliseq.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `modules.json` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `pyproject.toml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `Singularity` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `parameters.settings.json` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `pipeline_template.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `.nf-core.yaml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `bin/markdown_to_html.r` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `conf/aws.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `.github/workflows/push_dockerhub.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `.github/ISSUE_TEMPLATE/bug_report.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `.github/ISSUE_TEMPLATE/feature_request.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `docs/images/nf-core-ampliseq_logo.png` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `.markdownlint.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `.yamllint.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `lib/Checks.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `lib/Completion.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `lib/Workflow.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `.travis.yml` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `manifest.name` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `manifest.nextflowVersion` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `manifest.description` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `manifest.version` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `manifest.homePage` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `timeline.enabled` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `trace.enabled` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `report.enabled` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `dag.enabled` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `process.cpus` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `process.memory` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `process.time` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `params.outdir` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `params.input` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `params.validationShowHiddenParams` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `params.validationSchemaIgnoreParams` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `manifest.mainScript` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `timeline.file` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `trace.file` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `report.file` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `dag.file` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable (correctly) not found: `params.nf_required_version` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable (correctly) not found: `params.container` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable (correctly) not found: `params.singleEnd` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable (correctly) not found: `params.igenomesIgnore` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable (correctly) not found: `params.name` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable (correctly) not found: `params.enable_conda` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config ``timeline.enabled`` had correct value: ``true`` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config ``report.enabled`` had correct value: ``true`` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config ``trace.enabled`` had correct value: ``true`` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config ``dag.enabled`` had correct value: ``true`` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config ``manifest.name`` began with ``nf-core/`` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable ``manifest.homePage`` began with https://github.com/nf-core/ * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config ``dag.file`` ended with ``.html`` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable ``manifest.nextflowVersion`` started with >= or !>= * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config ``manifest.version`` ends in ``dev``: ``2.7.0dev`` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config `params.custom_config_version` is set to `master` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config `params.custom_config_base` is set to `https://raw.githubusercontent.com/nf-core/configs/master` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Lines for loading custom profiles found * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.prettierrc.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `LICENSE` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/.dockstore.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/ISSUE_TEMPLATE/bug_report.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/ISSUE_TEMPLATE/config.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/ISSUE_TEMPLATE/feature_request.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/PULL_REQUEST_TEMPLATE.md` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/workflows/branch.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/workflows/linting_comment.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `assets/email_template.html` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `assets/email_template.txt` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `assets/sendmail_template.txt` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `assets/nf-core-ampliseq_logo_light.png` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `docs/images/nf-core-ampliseq_logo_light.png` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `docs/images/nf-core-ampliseq_logo_dark.png` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `docs/README.md` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `lib/nfcore_external_java_deps.jar` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.gitignore` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.prettierignore` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `pyproject.toml` matches the template * [actions_awstest](https://nf-co.re/tools-docs/lint_tests/actions_awstest.html) - '.github/workflows/awstest.yml' is triggered correctly * [actions_awsfulltest](https://nf-co.re/tools-docs/lint_tests/actions_awsfulltest.html) - `.github/workflows/awsfulltest.yml` is triggered correctly * [actions_awsfulltest](https://nf-co.re/tools-docs/lint_tests/actions_awsfulltest.html) - `.github/workflows/awsfulltest.yml` does not use `-profile test` * [readme](https://nf-co.re/tools-docs/lint_tests/readme.html) - README Zenodo placeholder was replaced with DOI. * [pipeline_todos](https://nf-co.re/tools-docs/lint_tests/pipeline_todos.html) - No TODO strings found * [pipeline_name_conventions](https://nf-co.re/tools-docs/lint_tests/pipeline_name_conventions.html) - Name adheres to nf-core convention * [template_strings](https://nf-co.re/tools-docs/lint_tests/template_strings.html) - Did not find any Jinja template strings (253 files) * [schema_lint](https://nf-co.re/tools-docs/lint_tests/schema_lint.html) - Schema lint passed * [schema_lint](https://nf-co.re/tools-docs/lint_tests/schema_lint.html) - Schema title + description lint passed * [schema_params](https://nf-co.re/tools-docs/lint_tests/schema_params.html) - Schema matched params returned from nextflow config * [system_exit](https://nf-co.re/tools-docs/lint_tests/system_exit.html) - No `System.exit` calls found * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: fix-linting.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: ci.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: awstest.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: linting_comment.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: awsfulltest.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: clean-up.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: linting.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: branch.yml * [merge_markers](https://nf-co.re/tools-docs/lint_tests/merge_markers.html) - No merge markers found in pipeline files * [modules_json](https://nf-co.re/tools-docs/lint_tests/modules_json.html) - Only installed modules found in `modules.json` * [multiqc_config](https://nf-co.re/tools-docs/lint_tests/multiqc_config.html) - 'assets/multiqc_config.yml' follows the ordering scheme of the minimally required plugins. * [multiqc_config](https://nf-co.re/tools-docs/lint_tests/multiqc_config.html) - 'assets/multiqc_config.yml' contains 'export_plots: true'. * [modules_structure](https://nf-co.re/tools-docs/lint_tests/modules_structure.html) - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL' ### Run details * nf-core/tools version 2.10 * Run at `2023-09-26 08:45:02`
d4straub commented 1 year ago

Thanks for your review!