Filter COI ASV sequences based on their length and codons

lokeshbio commented 1 year ago

This pull request contains a filtering step of ASVs that are COI based on their length and also their protein coding sequence. These ASVs are generated by any of the following primers:

forward:
    - "CCHGAYATRGCHTTYCCHCG"
    - "ACCHGAYATRGCHTTYCCHCG"
    - "GACCHGAYATRGCHTTYCCHCG"
    - "TGACCHGAYATRGCHTTYCCHCG"
reverse:
    - "CDGGRTGNCCRAARAAYCA"
    - "TCDGGRTGNCCRAARAAYCA"
    - "ATCDGGRTGNCCRAARAAYCA"
    - "GATCDGGRTGNCCRAARAAYCA"

PR checklist

[X] This comment contains a description of changes (with reason).
[ ] If you've fixed a bug or added code that should be tested, add tests!
[ ] If you've added a new tool - have you followed the pipeline conventions in the contribution docs- [ ] If necessary, also make a PR on the nf-core/ampliseq branch on the nf-core/test-datasets repository.
[ ] Make sure your code lints (nf-core lint).
[ ] Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
[ ] Usage Documentation in docs/usage.md is updated.
[X] Output Documentation in docs/output.md is updated.
[ ] CHANGELOG.md is updated.
[ ] README.md is updated (including new tool citations and authors/contributors).

github-actions[bot] commented 1 year ago

This PR is against the `master` branch :x:

Do not close this PR
Click Edit and change the base to dev
This CI test will remain failed until you push a new commit

Hi @lokeshbio,

It looks like this pull-request is has been made against the lokeshbio/ampliseq master branch. The master branch on nf-core repositories should always contain code from the latest release. Because of this, PRs to master are only allowed if they come from the lokeshbio/ampliseq dev branch.

You do not need to close this PR, you can change the target branch to dev by clicking the "Edit" button at the top of this page. Note that even after this, the test will continue to show as failing until you push a new commit.

Thanks again for your contribution!

github-actions[bot] commented 1 year ago

`nf-core lint` overall result: Passed :white_check_mark: :warning:

Posted for pipeline commit fbee8c3

+| ✅ 154 tests passed       |+
#| ❔   1 tests were ignored |#
!| ❗   3 tests had warnings |!

### :heavy_exclamation_mark: Test warnings: * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config ``manifest.version`` should end in ``dev``: ``'2.5.0'`` * [readme](https://nf-co.re/tools-docs/lint_tests/readme.html) - README did not have a Nextflow minimum version badge. * [schema_lint](https://nf-co.re/tools-docs/lint_tests/schema_lint.html) - Parameter `input` is not defined in the correct subschema (input_output_options) ### :grey_question: Tests ignored: * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `conf/igenomes.config` ### :white_check_mark: Tests passed: * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.gitattributes` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.gitignore` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.nf-core.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.editorconfig` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.prettierignore` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.prettierrc.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `CHANGELOG.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `CITATIONS.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `CODE_OF_CONDUCT.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `CODE_OF_CONDUCT.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `LICENSE` or `LICENSE.md` or `LICENCE` or `LICENCE.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `nextflow_schema.json` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `nextflow.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `README.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/.dockstore.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/CONTRIBUTING.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/ISSUE_TEMPLATE/bug_report.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/ISSUE_TEMPLATE/config.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/ISSUE_TEMPLATE/feature_request.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/PULL_REQUEST_TEMPLATE.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/workflows/branch.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/workflows/ci.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/workflows/linting_comment.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/workflows/linting.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `assets/email_template.html` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `assets/email_template.txt` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `assets/sendmail_template.txt` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `assets/nf-core-ampliseq_logo_light.png` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `conf/modules.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `conf/test.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `conf/test_full.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `docs/images/nf-core-ampliseq_logo_light.png` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `docs/images/nf-core-ampliseq_logo_dark.png` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `docs/output.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `docs/README.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `docs/README.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `docs/usage.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/nfcore_external_java_deps.jar` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/NfcoreSchema.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/NfcoreTemplate.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/Utils.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/WorkflowMain.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `main.nf` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `assets/multiqc_config.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `conf/base.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/workflows/awstest.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/workflows/awsfulltest.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/WorkflowAmpliseq.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `modules.json` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `pyproject.toml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `Singularity` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `parameters.settings.json` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `.nf-core.yaml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `bin/markdown_to_html.r` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `conf/aws.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `.github/workflows/push_dockerhub.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `.github/ISSUE_TEMPLATE/bug_report.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `.github/ISSUE_TEMPLATE/feature_request.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `docs/images/nf-core-ampliseq_logo.png` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `.markdownlint.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `.yamllint.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `lib/Checks.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `lib/Completion.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `lib/Workflow.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `.travis.yml` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `manifest.name` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `manifest.nextflowVersion` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `manifest.description` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `manifest.version` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `manifest.homePage` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `timeline.enabled` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `trace.enabled` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `report.enabled` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `dag.enabled` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `process.cpus` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `process.memory` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `process.time` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `params.outdir` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `params.input` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `params.show_hidden_params` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `params.schema_ignore_params` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `manifest.mainScript` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `timeline.file` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `trace.file` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `report.file` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `dag.file` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable (correctly) not found: `params.nf_required_version` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable (correctly) not found: `params.container` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable (correctly) not found: `params.singleEnd` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable (correctly) not found: `params.igenomesIgnore` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable (correctly) not found: `params.name` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable (correctly) not found: `params.enable_conda` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config ``timeline.enabled`` had correct value: ``true`` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config ``report.enabled`` had correct value: ``true`` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config ``trace.enabled`` had correct value: ``true`` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config ``dag.enabled`` had correct value: ``true`` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config ``manifest.name`` began with ``nf-core/`` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable ``manifest.homePage`` began with https://github.com/nf-core/ * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config ``dag.file`` ended with ``.html`` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable ``manifest.nextflowVersion`` started with >= or !>= * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config `params.custom_config_version` is set to `master` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config `params.custom_config_base` is set to `https://raw.githubusercontent.com/nf-core/configs/master` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Lines for loading custom profiles found * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.gitattributes` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.prettierrc.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `CODE_OF_CONDUCT.md` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `LICENSE` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/.dockstore.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/CONTRIBUTING.md` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/ISSUE_TEMPLATE/bug_report.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/ISSUE_TEMPLATE/config.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/ISSUE_TEMPLATE/feature_request.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/PULL_REQUEST_TEMPLATE.md` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/workflows/branch.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/workflows/linting_comment.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/workflows/linting.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `assets/email_template.html` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `assets/email_template.txt` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `assets/sendmail_template.txt` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `assets/nf-core-ampliseq_logo_light.png` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `docs/images/nf-core-ampliseq_logo_light.png` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `docs/images/nf-core-ampliseq_logo_dark.png` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `docs/README.md` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `lib/nfcore_external_java_deps.jar` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `lib/NfcoreSchema.groovy` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `lib/NfcoreTemplate.groovy` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.gitignore` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.prettierignore` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `pyproject.toml` matches the template * [actions_ci](https://nf-co.re/tools-docs/lint_tests/actions_ci.html) - '.github/workflows/ci.yml' is triggered on expected events * [actions_ci](https://nf-co.re/tools-docs/lint_tests/actions_ci.html) - '.github/workflows/ci.yml' checks minimum NF version * [actions_awstest](https://nf-co.re/tools-docs/lint_tests/actions_awstest.html) - '.github/workflows/awstest.yml' is triggered correctly * [actions_awsfulltest](https://nf-co.re/tools-docs/lint_tests/actions_awsfulltest.html) - `.github/workflows/awsfulltest.yml` is triggered correctly * [actions_awsfulltest](https://nf-co.re/tools-docs/lint_tests/actions_awsfulltest.html) - `.github/workflows/awsfulltest.yml` does not use `-profile test` * [readme](https://nf-co.re/tools-docs/lint_tests/readme.html) - README Nextflow minimum version in Quick Start section matched config. README: `22.10.1`, Config: `22.10.1` * [pipeline_todos](https://nf-co.re/tools-docs/lint_tests/pipeline_todos.html) - No TODO strings found * [pipeline_name_conventions](https://nf-co.re/tools-docs/lint_tests/pipeline_name_conventions.html) - Name adheres to nf-core convention * [template_strings](https://nf-co.re/tools-docs/lint_tests/template_strings.html) - Did not find any Jinja template strings (174 files) * [schema_lint](https://nf-co.re/tools-docs/lint_tests/schema_lint.html) - Schema lint passed * [schema_lint](https://nf-co.re/tools-docs/lint_tests/schema_lint.html) - Schema title + description lint passed * [schema_params](https://nf-co.re/tools-docs/lint_tests/schema_params.html) - Schema matched params returned from nextflow config * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: linting_comment.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: awsfulltest.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: awstest.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: fix-linting.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: branch.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: ci.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: linting.yml * [merge_markers](https://nf-co.re/tools-docs/lint_tests/merge_markers.html) - No merge markers found in pipeline files * [modules_json](https://nf-co.re/tools-docs/lint_tests/modules_json.html) - Only installed modules found in `modules.json` * [multiqc_config](https://nf-co.re/tools-docs/lint_tests/multiqc_config.html) - 'assets/multiqc_config.yml' follows the ordering scheme of the minimally required plugins. * [multiqc_config](https://nf-co.re/tools-docs/lint_tests/multiqc_config.html) - 'assets/multiqc_config.yml' contains a matching 'report_comment'. * [multiqc_config](https://nf-co.re/tools-docs/lint_tests/multiqc_config.html) - 'assets/multiqc_config.yml' contains 'export_plots: true'. * [modules_structure](https://nf-co.re/tools-docs/lint_tests/modules_structure.html) - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL' ### Run details * nf-core/tools version 2.7.2 * Run at `2023-04-03 14:13:07`

d4straub commented 1 year ago

Thanks, this addition looks interesting! I havent investigated any COI data yet, so I might ask basic questions here.

Is this related to https://github.com/nf-core/ampliseq/issues/450, that seems to discuss also stop codons?

I do think it might be better to make the length thresholds adjustable instead of having them hard coded. Coming to think of it, is it even important to have the length filter? Wouldnt it sufficient to have a stop codon filtering (i.e. contains a stop codon -> remove) enough, because a length filter is already implemented in the pipeline?

Unfortunately some standards are not yet satisfied, such as phyton code (see PythonBlack), general formatting (prettier / EditroConfig). Also, as far as I see, you introduced a new parameter called --filter_coi_asv and that needs to go into the documentation in nextflow_schema.json & the default to nextflow.config.

_edit: also, you seem to have based your work on the master branch, instead you are supposed to use the dev branch. Please consider reading https://nf-co.re/docs/contributing/code_formating and other helpful material in the nf-core documentation https://nf-co.re/docs._

lokeshbio commented 1 year ago

Hi Daniel,

In some ways it could be related #450 https://github.com/nf-core/ampliseq/issues/450, but this particular feature is only applicable for the primer-pairs I have mentioned in the pull-request! As, it looks for stop codons from a particular position of the amplicon where the codons are in one reading frame! The reason also why I hardcoded the length-thresholds is because of the same reason that these are the lengths we found that it suits well for these primers. We are in the process of writing a paper discussing all of these results! As there is a length filtering feature already exists anyway! I forgot to add the actual option in the documentation, sorry about that!

Cheers Lokesh Lokeshwaran Manoharan

On Mon, Apr 3, 2023 at 4:24 PM Daniel Straub @.***> wrote:

Thanks, this addition looks interesting! I havent investigated any COI data yet, so I might ask basic questions here.

Is this related to #450 https://github.com/nf-core/ampliseq/issues/450, that seems to discuss also stop codons?

I do think it might be better to make the length thresholds adjustable instead of having them hard coded. Coming to think of it, is it even important to have the length filter? Wouldnt it sufficient to have a stop codon filtering (i.e. contains a stop codon -> remove) enough, because a length filter is already implemented in the pipeline?

Unfortunately some standards are not yet satisfied, such as phyton code (see PythonBlack), general formatting (prettier / EditroConfig). Also, as far as I see, you introduced a new parameter called --filter_coi_asv and that needs to go into the documentation in nextflow_schema.json & the default to nextflow.config.

— Reply to this email directly, view it on GitHub https://github.com/nf-core/ampliseq/pull/568#issuecomment-1494425623, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGVCT36Z2PGT5AF53UOVKLDW7LMSDANCNFSM6AAAAAAWRLUQXU . You are receiving this because you were mentioned.Message ID: @.***>

d4straub commented 1 year ago

Thanks for updating the documentation.

but this particular feature is only applicable for the primer-pairs I have mentioned in the pull-request!

That is my criticism. The code you added could be relatively simply adjusted to a wider application area, I think. When changing the following, one could filter all coding amplicons:

removing the length filter in you python script bin/filt_InsBiom_asv.py
add parameter taking the reading frame of the amplicon, e.g. --reading_frame 1 would mean it checks reading frame starting from position 1 (other options might be 2,3). All ASV with stop codons would be removed. Default would be 0 for no filtering at all. This could be simply an offset in your code line sub_list.append(sub_seq[x:x+3]) -> sub_list.append(sub_seq[x+offset:x+3+offset]) where def offset = params.reading_frame -1
add parameter taking a set of stop codons to look for, e.g. --stop_codons "TAA,TAG" in your case, default would be "TAA,TAG". That is just to maximize customization and avoid any hard-coding.
--filter_coi_asv would be not necessary

Now, to analyze data with the settings you attempt to add here, running the pipeline with the following parameters should do the trick: --reading_frame 1 --min_len_asv 403 --max_len_asv 418 (--stop_codons not needed because default is fine)

Please let me know whether that is indeed an option or whether I am wrong.

lokeshbio commented 1 year ago

Hi Daniel,

Thank you very much for being patient with me and explaining me! I totally agree that the feature should be broadly applicable and it does not require too many changes in the code in itself! I will fix the python script and make a new pull request soon!

Cheers Lokesh Lokeshwaran Manoharan

On Tue, Apr 11, 2023 at 11:19 AM Daniel Straub @.***> wrote:

Thanks for updating the documentation.

but this particular feature is only applicable for the primer-pairs I have mentioned in the pull-request!

That is my criticism. The code you added could be relatively simply adjusted to a wider application area, I think. When changing the following, one could filter all coding amplicons:

removing the length filter in you python script bin/filt_InsBiom_asv.py

add parameter taking the reading frame of the amplicon, e.g. --reading_frame 1 would mean it checks reading frame starting from position 1 (other options might be 2,3). All ASV with stop codons would be removed. Default would be 0 for no filtering at all. This could be simply an offset in your code line sub_list.append(sub_seq[x:x+3]) -> sub_list.append(sub_seq[x+offset:x+3+offset]) where def offset = params.reading_frame -1

add parameter taking a set of stop codons to look for, e.g. --stop_codons "TAA,TAG" in your case, default would be "TAA,TAG". That is just to maximize customization and avoid any hard-coding.

--filter_coi_asv would be not necessary

Now, to analyze data with the settings you attempt to add here, running the pipeline with the following parameters should do the trick: --reading_frame 1 --min_len_asv 403 --max_len_asv 418 (--stop_codons not needed because default is fine)

Please let me know whether that is indeed an option or whether I am wrong.

— Reply to this email directly, view it on GitHub https://github.com/nf-core/ampliseq/pull/568#issuecomment-1502974087, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGVCT3YL7IDJ6B6T5KA6ISLXAUOYRANCNFSM6AAAAAAWRLUQXU . You are receiving this because you were mentioned.Message ID: @.***>

d4straub commented 1 year ago

Sounds great!

Could you base your work please on branch dev instead of master? The master branch contains code of the current release, the dev branch contains the current code which will go into the master branch once a new release is made.

lokeshbio commented 1 year ago

Oh yes! I keep forgetting that! Sorry about that.

Cheers Lokesh

On Tue, 11 Apr 2023, 12:22 Daniel Straub, @.***> wrote:

Sounds great!

Could you base your work please on branch dev instead of master? The master branch contains code of the current release, the dev branch contains the current code which will go into the master branch once a new release is made.

— Reply to this email directly, view it on GitHub https://github.com/nf-core/ampliseq/pull/568#issuecomment-1503067528, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGVCT33FD7QT6YGKMOUOQJ3XAUWIDANCNFSM6AAAAAAWRLUQXU . You are receiving this because you were mentioned.Message ID: @.***>

nf-core / ampliseq