nf-core / taxprofiler

Highly parallelised multi-taxonomic profiling of shotgun short- and long-read metagenomic data
https://nf-co.re/taxprofiler
MIT License

Adding shortread deduplication feature with fastp #439

Closed maxibor closed 8 months ago

maxibor commented 8 months ago

This PR adds opt-in deduplication of short reads with fastp.


github-actions[bot] commented 8 months ago

nf-core lint overall result: Failed :x:

Posted for pipeline commit 7e3f119

+| ✅ 183 tests passed       |+
#| ❔   1 tests were ignored |#
-| ❌  11 tests failed       |-

### :x: Test failures:

* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File must be removed: `lib/nfcore_external_java_deps.jar`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value incorrect: `params.igenomes_base` is set as `s3://ngi-igenomes/igenomes` in `nextflow_schema.json` but is `s3://ngi-igenomes/igenomes/` in `nextflow.config`.
* [files_unchanged](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_unchanged.html) - `.github/workflows/branch.yml` does not match the template
* [files_unchanged](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_unchanged.html) - `.github/workflows/linting_comment.yml` does not match the template
* [files_unchanged](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_unchanged.html) - `.github/workflows/linting.yml` does not match the template
* [files_unchanged](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_unchanged.html) - `assets/email_template.html` does not match the template
* [files_unchanged](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_unchanged.html) - `assets/email_template.txt` does not match the template
* [files_unchanged](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_unchanged.html) - `assets/nf-core-taxprofiler_logo_light.png` does not match the template
* [files_unchanged](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_unchanged.html) - `docs/images/nf-core-taxprofiler_logo_light.png` does not match the template
* [files_unchanged](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_unchanged.html) - `docs/images/nf-core-taxprofiler_logo_dark.png` does not match the template
* [files_unchanged](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_unchanged.html) - `pyproject.toml` does not match the template

### :grey_question: Tests ignored:

* [files_unchanged](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_unchanged.html) - File does not exist: `lib/nfcore_external_java_deps.jar`

### :white_check_mark: Tests passed:

* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `.gitattributes`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `.gitignore`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `.nf-core.yml`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `.editorconfig`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `.prettierignore`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `.prettierrc.yml`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `CHANGELOG.md`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `CITATIONS.md`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `CODE_OF_CONDUCT.md`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `LICENSE` or `LICENSE.md` or `LICENCE` or `LICENCE.md`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `nextflow_schema.json`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `nextflow.config`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `README.md`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `.github/.dockstore.yml`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `.github/CONTRIBUTING.md`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `.github/ISSUE_TEMPLATE/bug_report.yml`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `.github/ISSUE_TEMPLATE/config.yml`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `.github/ISSUE_TEMPLATE/feature_request.yml`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `.github/PULL_REQUEST_TEMPLATE.md`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `.github/workflows/branch.yml`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `.github/workflows/ci.yml`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `.github/workflows/linting_comment.yml`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `.github/workflows/linting.yml`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `assets/email_template.html`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `assets/email_template.txt`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `assets/sendmail_template.txt`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `assets/nf-core-taxprofiler_logo_light.png`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `conf/modules.config`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `conf/test.config`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `conf/test_full.config`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `docs/images/nf-core-taxprofiler_logo_light.png`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `docs/images/nf-core-taxprofiler_logo_dark.png`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `docs/output.md`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `docs/README.md`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `docs/README.md`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `docs/usage.md`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `lib/NfcoreTemplate.groovy`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `lib/Utils.groovy`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `lib/WorkflowMain.groovy`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `main.nf`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `assets/multiqc_config.yml`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `conf/base.config`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `conf/igenomes.config`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `.github/workflows/awstest.yml`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `.github/workflows/awsfulltest.yml`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `lib/WorkflowTaxprofiler.groovy`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `modules.json`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File found: `pyproject.toml`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File not found check: `Singularity`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File not found check: `parameters.settings.json`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File not found check: `pipeline_template.yml`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File not found check: `.nf-core.yaml`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File not found check: `bin/markdown_to_html.r`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File not found check: `conf/aws.config`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File not found check: `.github/workflows/push_dockerhub.yml`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File not found check: `.github/ISSUE_TEMPLATE/bug_report.md`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File not found check: `.github/ISSUE_TEMPLATE/feature_request.md`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File not found check: `docs/images/nf-core-taxprofiler_logo.png`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File not found check: `.markdownlint.yml`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File not found check: `.yamllint.yml`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File not found check: `lib/Checks.groovy`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File not found check: `lib/Completion.groovy`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File not found check: `lib/Workflow.groovy`
* [files_exist](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_exist.html) - File not found check: `.travis.yml`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config variable found: `manifest.name`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config variable found: `manifest.nextflowVersion`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config variable found: `manifest.description`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config variable found: `manifest.version`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config variable found: `manifest.homePage`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config variable found: `timeline.enabled`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config variable found: `trace.enabled`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config variable found: `report.enabled`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config variable found: `dag.enabled`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config variable found: `process.cpus`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config variable found: `process.memory`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config variable found: `process.time`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config variable found: `params.outdir`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config variable found: `params.input`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config variable found: `params.validationShowHiddenParams`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config variable found: `params.validationSchemaIgnoreParams`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config variable found: `manifest.mainScript`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config variable found: `timeline.file`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config variable found: `trace.file`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config variable found: `report.file`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config variable found: `dag.file`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config variable (correctly) not found: `params.nf_required_version`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config variable (correctly) not found: `params.container`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config variable (correctly) not found: `params.singleEnd`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config variable (correctly) not found: `params.igenomesIgnore`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config variable (correctly) not found: `params.name`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config variable (correctly) not found: `params.enable_conda`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config ``timeline.enabled`` had correct value: ``true``
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config ``report.enabled`` had correct value: ``true``
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config ``trace.enabled`` had correct value: ``true``
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config ``dag.enabled`` had correct value: ``true``
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config ``manifest.name`` began with ``nf-core/``
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config variable ``manifest.homePage`` began with https://github.com/nf-core/
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config ``dag.file`` ended with ``.html``
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config variable ``manifest.nextflowVersion`` started with >= or !>=
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config ``manifest.version`` ends in ``dev``: ``1.1.5dev``
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config `params.custom_config_version` is set to `master`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config `params.custom_config_base` is set to `https://raw.githubusercontent.com/nf-core/configs/master`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Lines for loading custom profiles found
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - nextflow.config contains configuration profile `test`
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.preprocessing_qc_tool
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.shortread_qc_tool
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.shortread_qc_minlength
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.shortread_complexityfilter_tool
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.shortread_complexityfilter_entropy
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.shortread_complexityfilter_bbduk_windowsize
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.shortread_complexityfilter_fastp_threshold
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.shortread_complexityfilter_prinseqplusplus_mode
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.shortread_complexityfilter_prinseqplusplus_dustscore
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.longread_qc_qualityfilter_minlength
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.longread_qc_qualityfilter_keeppercent
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.longread_qc_qualityfilter_targetbases
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.diamond_output_format
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.kaiju_taxon_rank
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.krakenuniq_ram_chunk_size
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.krakenuniq_batch_size
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.malt_mode
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.kmcp_mode
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.ganon_report_type
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.ganon_report_toppercentile
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.ganon_report_mincount
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.ganon_report_maxcount
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.standardisation_taxpasta_format
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.custom_config_version
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.custom_config_base
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.max_cpus
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.max_memory
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.max_time
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.publish_dir_mode
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.max_multiqc_email_size
* [nextflow_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/nextflow_config.html) - Config default value correct: params.validate_params
* [files_unchanged](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_unchanged.html) - `.gitattributes` matches the template
* [files_unchanged](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_unchanged.html) - `.prettierrc.yml` matches the template
* [files_unchanged](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_unchanged.html) - `CODE_OF_CONDUCT.md` matches the template
* [files_unchanged](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_unchanged.html) - `LICENSE` matches the template
* [files_unchanged](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_unchanged.html) - `.github/.dockstore.yml` matches the template
* [files_unchanged](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_unchanged.html) - `.github/CONTRIBUTING.md` matches the template
* [files_unchanged](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_unchanged.html) - `.github/ISSUE_TEMPLATE/bug_report.yml` matches the template
* [files_unchanged](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_unchanged.html) - `.github/ISSUE_TEMPLATE/config.yml` matches the template
* [files_unchanged](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_unchanged.html) - `.github/ISSUE_TEMPLATE/feature_request.yml` matches the template
* [files_unchanged](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_unchanged.html) - `.github/PULL_REQUEST_TEMPLATE.md` matches the template
* [files_unchanged](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_unchanged.html) - `assets/sendmail_template.txt` matches the template
* [files_unchanged](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_unchanged.html) - `docs/README.md` matches the template
* [files_unchanged](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_unchanged.html) - `lib/NfcoreTemplate.groovy` matches the template
* [files_unchanged](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_unchanged.html) - `.gitignore` matches the template
* [files_unchanged](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/files_unchanged.html) - `.prettierignore` matches the template
* [actions_ci](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/actions_ci.html) - '.github/workflows/ci.yml' is triggered on expected events
* [actions_ci](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/actions_ci.html) - '.github/workflows/ci.yml' checks minimum NF version
* [actions_awstest](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/actions_awstest.html) - '.github/workflows/awstest.yml' is triggered correctly
* [actions_awsfulltest](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/actions_awsfulltest.html) - `.github/workflows/awsfulltest.yml` is triggered correctly
* [actions_awsfulltest](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/actions_awsfulltest.html) - `.github/workflows/awsfulltest.yml` does not use `-profile test`
* [readme](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/readme.html) - README Nextflow minimum version badge matched config. Badge: `23.04.0`, Config: `23.04.0`
* [readme](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/readme.html) - README Zenodo placeholder was replaced with DOI.
* [pipeline_todos](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/pipeline_todos.html) - No TODO strings found
* [pipeline_name_conventions](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/pipeline_name_conventions.html) - Name adheres to nf-core convention
* [template_strings](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/template_strings.html) - Did not find any Jinja template strings (242 files)
* [schema_lint](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/schema_lint.html) - Schema lint passed
* [schema_lint](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/schema_lint.html) - Schema title + description lint passed
* [schema_lint](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/schema_lint.html) - Input mimetype lint passed: 'text/csv'
* [schema_params](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/schema_params.html) - Schema matched params returned from nextflow config
* [system_exit](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/system_exit.html) - No `System.exit` calls found
* [actions_schema_validation](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/actions_schema_validation.html) - Workflow validation passed: linting.yml
* [actions_schema_validation](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/actions_schema_validation.html) - Workflow validation passed: release-announcements.yml
* [actions_schema_validation](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/actions_schema_validation.html) - Workflow validation passed: branch.yml
* [actions_schema_validation](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/actions_schema_validation.html) - Workflow validation passed: fix-linting.yml
* [actions_schema_validation](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/actions_schema_validation.html) - Workflow validation passed: awsfulltest.yml
* [actions_schema_validation](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/actions_schema_validation.html) - Workflow validation passed: ci.yml
* [actions_schema_validation](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/actions_schema_validation.html) - Workflow validation passed: clean-up.yml
* [actions_schema_validation](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/actions_schema_validation.html) - Workflow validation passed: linting_comment.yml
* [actions_schema_validation](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/actions_schema_validation.html) - Workflow validation passed: awstest.yml
* [merge_markers](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/merge_markers.html) - No merge markers found in pipeline files
* [modules_json](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/modules_json.html) - Only installed modules found in `modules.json`
* [multiqc_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/multiqc_config.html) - 'assets/multiqc_config.yml' contains `report_section_order`
* [multiqc_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/multiqc_config.html) - 'assets/multiqc_config.yml' contains `export_plots`
* [multiqc_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/multiqc_config.html) - 'assets/multiqc_config.yml' contains `report_comment`
* [multiqc_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/multiqc_config.html) - 'assets/multiqc_config.yml' follows the ordering scheme of the minimally required plugins.
* [multiqc_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/multiqc_config.html) - 'assets/multiqc_config.yml' contains a matching 'report_comment'.
* [multiqc_config](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/multiqc_config.html) - 'assets/multiqc_config.yml' contains 'export_plots: true'.
* [modules_structure](https://nf-co.re/tools/docs/2.12/pipeline_lint_tests/modules_structure.html) - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'

### Run details

* nf-core/tools version 2.12
* Run at `2024-02-01 10:33:30`

maxibor commented 8 months ago

An easy PR for you @jfy133 to offset #548 😉

sofstam commented 8 months ago

I would say it is 1.1.5.

sofstam commented 8 months ago

Getting back with a review later today :)

Midnighter commented 8 months ago

I'm somewhat concerned about this feature. If I remember correctly, FastQC and fastp use only the first 50 and 75 bp, respectively, to judge read duplication. Using longer sequences would drive up memory requirements and take longer. So the first question is, are we truly only removing identical reads with this?

My second question comes from my inexperience with sequencing: If you have a dominant species in your metagenomic sample, how unlikely is it to have an identical read?

jfy133 commented 8 months ago

> Using longer sequences would drive up memory requirements and take longer. So the first question is, are we truly only removing identical reads with this?

Does it really? The README at least seems to imply that it's some condensed hash of the whole read: https://github.com/OpenGene/fastp#duplication-rate-evaluation. That said, it's opt-in, so it's still up to the user to decide whether it's a suitable algorithm.
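
To make the distinction in this exchange concrete, here is a minimal Python sketch contrasting the two ideas: keying duplicates on a fixed-size hash of the whole read versus keying on only the first `k` bases. It is an illustration only and not fastp's actual algorithm; the function names, the toy reads, and the choice of an 8-byte BLAKE2b digest are all assumptions made for this example.

```python
import hashlib


def dedup(reads, key):
    """Keep the first read for each distinct key; drop later reads with the same key."""
    seen = set()
    kept = []
    for name, seq in reads:
        k = key(seq)
        if k not in seen:
            seen.add(k)
            kept.append((name, seq))
    return kept


def whole_read_hash(seq):
    # Fixed-size digest of the full sequence: memory per read does not grow
    # with read length (hash collisions are possible but very rare).
    return hashlib.blake2b(seq.encode("ascii"), digest_size=8).digest()


def prefix_key(k):
    # Compare only the first k bases, as a prefix-based check would.
    return lambda seq: seq[:k]


# Toy reads (hypothetical): r2 is an exact copy of r1, r3 differs only at the 3' end.
reads = [
    ("r1", "ACGTACGTAA"),
    ("r2", "ACGTACGTAA"),
    ("r3", "ACGTACGTCC"),
    ("r4", "TTGTACGTAA"),
]

exact_kept = [name for name, _ in dedup(reads, whole_read_hash)]
prefix_kept = [name for name, _ in dedup(reads, prefix_key(8))]
print(exact_kept)
print(prefix_kept)
```

With whole-read hashing only the exact copy `r2` is dropped, whereas the prefix key also drops `r3`, which is the over-removal the first question is worried about.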

> My second question comes from my inexperience with sequencing: If you have a dominant species in your metagenomic sample, how unlikely is it to have an identical read?

An absolutely exact duplicate is quite unlikely, as

  1. Fragmentation protocols should be random (with a slight preference for breakages around GCs, IIRC), so in combination with (relatively) longer reads an exact match is unlikely due to sequence diversity.

  2. Exact duplicates are much more likely to come from lab-based amplicons, as these use the same priming sequence. Given the number of amplification cycles, identical copies are also far more likely to be artificial duplicates than naturally occurring ones. At least in Illumina short-read protocols, that is.
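
The intuition in the two points above can be sketched with a small simulation. This is a toy model under loudly stated assumptions, not real data: reads are single-end with a fixed length, so two reads are identical exactly when they share a start position; the genome length and read count are hypothetical; sequencing errors, strand, and GC bias are ignored.

```python
import random
from collections import Counter


def count_duplicates(starts):
    """Number of reads that share a start position with an earlier read."""
    counts = Counter(starts)
    return sum(c - 1 for c in counts.values())


# Hypothetical parameters, chosen only for illustration.
GENOME_LENGTH = 1_000_000
N_READS = 1_000

rng = random.Random(0)  # fixed seed so the run is repeatable

# Shotgun: random fragmentation, so start positions are drawn uniformly.
shotgun_starts = [rng.randrange(GENOME_LENGTH) for _ in range(N_READS)]

# Amplicon: every read starts at the same priming site.
amplicon_starts = [0] * N_READS

shotgun_dups = count_duplicates(shotgun_starts)
amplicon_dups = count_duplicates(amplicon_starts)

# Birthday-problem approximation for the expected number of colliding read pairs.
expected_pairs = N_READS * (N_READS - 1) / (2 * GENOME_LENGTH)

print(shotgun_dups, amplicon_dups, round(expected_pairs, 2))
```

Under these assumptions the amplicon case yields `N_READS - 1` duplicates, while the shotgun case is expected to yield about half a colliding pair. The same formula also shows that chance duplicates grow quadratically with read count, so a very deeply covered small genome would behave differently.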

Midnighter commented 8 months ago

Thank you for your response, sounds good to me then. 👍🏼