nf-core / scrnaseq

A single-cell RNAseq pipeline for 10X genomics data
https://nf-co.re/scrnaseq
MIT License
178 stars 154 forks source link

Fix issue 81, "call empty droplets" #301

Closed fmalmeida closed 3 months ago

fmalmeida commented 4 months ago

Right now, just opening a draft PR on the attempt of solving issue #81 so that it is easier to keep track of modifications.

One work is "done" I will add a thorough overview of the changes, with explanations of the main modifications and listing on TODOs to be addressed before merging.

Then, of course, only then I will add some reviewers.

github-actions[bot] commented 4 months ago

Python linting (black) is failing

To keep the code consistent with lots of contributors, we run automated code consistency checks. To fix this CI test, please run:

Once you push these changes the test should pass, and you can hide this comment :+1:

We highly recommend setting up Black in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help!

Thanks again for your contribution!

github-actions[bot] commented 4 months ago

nf-core lint overall result: Passed :white_check_mark: :warning:

Posted for pipeline commit 2406600

+| ✅ 171 tests passed       |+
#| ❔   4 tests were ignored |#
!| ❗   3 tests had warnings |!
### :heavy_exclamation_mark: Test warnings: * [pipeline_todos](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/pipeline_todos) - TODO string in `main.nf`: _Optionally add in-text citation tools to this list._ * [pipeline_todos](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/pipeline_todos) - TODO string in `main.nf`: _Optionally add bibliographic entries to this list._ * [pipeline_todos](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/pipeline_todos) - TODO string in `main.nf`: _Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!_ ### :grey_question: Tests ignored: * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File is ignored: `lib/Utils.groovy` * [files_unchanged](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_unchanged) - File ignored due to lint config: `.github/ISSUE_TEMPLATE/bug_report.yml` * [template_strings](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/template_strings) - template_strings * [schema_params](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/schema_params) - schema_params ### :white_check_mark: Tests passed: * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `.gitattributes` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `.gitignore` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `.nf-core.yml` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `.editorconfig` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `.prettierignore` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `.prettierrc.yml` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `CHANGELOG.md` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `CITATIONS.md` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `CODE_OF_CONDUCT.md` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `LICENSE` or `LICENSE.md` or `LICENCE` or `LICENCE.md` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `nextflow_schema.json` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `nextflow.config` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `README.md` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `.github/.dockstore.yml` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `.github/CONTRIBUTING.md` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `.github/ISSUE_TEMPLATE/bug_report.yml` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `.github/ISSUE_TEMPLATE/config.yml` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `.github/ISSUE_TEMPLATE/feature_request.yml` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `.github/PULL_REQUEST_TEMPLATE.md` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `.github/workflows/branch.yml` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `.github/workflows/ci.yml` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `.github/workflows/linting_comment.yml` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `.github/workflows/linting.yml` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `assets/email_template.html` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `assets/email_template.txt` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `assets/sendmail_template.txt` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `assets/nf-core-scrnaseq_logo_light.png` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `conf/modules.config` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `conf/test.config` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `conf/test_full.config` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `docs/images/nf-core-scrnaseq_logo_light.png` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `docs/images/nf-core-scrnaseq_logo_dark.png` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `docs/output.md` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `docs/README.md` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `docs/README.md` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `docs/usage.md` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `main.nf` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `assets/multiqc_config.yml` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `conf/base.config` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `conf/igenomes.config` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `.github/workflows/awstest.yml` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `.github/workflows/awsfulltest.yml` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `modules.json` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File found: `pyproject.toml` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File not found check: `Singularity` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File not found check: `parameters.settings.json` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File not found check: `pipeline_template.yml` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File not found check: `.nf-core.yaml` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File not found check: `bin/markdown_to_html.r` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File not found check: `conf/aws.config` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File not found check: `.github/workflows/push_dockerhub.yml` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File not found check: `.github/ISSUE_TEMPLATE/bug_report.md` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File not found check: `.github/ISSUE_TEMPLATE/feature_request.md` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File not found check: `docs/images/nf-core-scrnaseq_logo.png` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File not found check: `.markdownlint.yml` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File not found check: `.yamllint.yml` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File not found check: `lib/Checks.groovy` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File not found check: `lib/Completion.groovy` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File not found check: `lib/Workflow.groovy` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File not found check: `lib/WorkflowMain.groovy` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File not found check: `lib/NfcoreTemplate.groovy` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File not found check: `lib/WorkflowScrnaseq.groovy` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File not found check: `lib/nfcore_external_java_deps.jar` * [files_exist](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_exist) - File not found check: `.travis.yml` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config variable found: `manifest.name` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config variable found: `manifest.nextflowVersion` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config variable found: `manifest.description` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config variable found: `manifest.version` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config variable found: `manifest.homePage` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config variable found: `timeline.enabled` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config variable found: `trace.enabled` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config variable found: `report.enabled` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config variable found: `dag.enabled` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config variable found: `process.cpus` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config variable found: `process.memory` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config variable found: `process.time` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config variable found: `params.outdir` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config variable found: `params.input` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config variable found: `params.validationShowHiddenParams` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config variable found: `params.validationSchemaIgnoreParams` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config variable found: `manifest.mainScript` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config variable found: `timeline.file` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config variable found: `trace.file` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config variable found: `report.file` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config variable found: `dag.file` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config variable (correctly) not found: `params.nf_required_version` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config variable (correctly) not found: `params.container` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config variable (correctly) not found: `params.singleEnd` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config variable (correctly) not found: `params.igenomesIgnore` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config variable (correctly) not found: `params.name` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config variable (correctly) not found: `params.enable_conda` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config ``timeline.enabled`` had correct value: ``true`` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config ``report.enabled`` had correct value: ``true`` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config ``trace.enabled`` had correct value: ``true`` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config ``dag.enabled`` had correct value: ``true`` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config ``manifest.name`` began with ``nf-core/`` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config variable ``manifest.homePage`` began with https://github.com/nf-core/ * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config ``dag.file`` ended with ``.html`` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config variable ``manifest.nextflowVersion`` started with >= or !>= * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config ``manifest.version`` ends in ``dev``: ``2.6.0dev`` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config `params.custom_config_version` is set to `master` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config `params.custom_config_base` is set to `https://raw.githubusercontent.com/nf-core/configs/master` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Lines for loading custom profiles found * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - nextflow.config contains configuration profile `test` * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config default value correct: params.aligner= alevin * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config default value correct: params.protocol= auto * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config default value correct: params.igenomes_base= s3://ngi-igenomes/igenomes/ * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config default value correct: params.simpleaf_rlen= 91 * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config default value correct: params.star_feature= Gene * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config default value correct: params.kb_workflow= standard * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config default value correct: params.custom_config_version= master * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config default value correct: params.custom_config_base= https://raw.githubusercontent.com/nf-core/configs/master * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config default value correct: params.max_cpus= 16 * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config default value correct: params.max_memory= 128.GB * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config default value correct: params.max_time= 240.h * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config default value correct: params.publish_dir_mode= copy * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config default value correct: params.max_multiqc_email_size= 25.MB * [nextflow_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/nextflow_config) - Config default value correct: params.validate_params= true * [files_unchanged](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_unchanged) - `.gitattributes` matches the template * [files_unchanged](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_unchanged) - `.prettierrc.yml` matches the template * [files_unchanged](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_unchanged) - `CODE_OF_CONDUCT.md` matches the template * [files_unchanged](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_unchanged) - `LICENSE` matches the template * [files_unchanged](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_unchanged) - `.github/.dockstore.yml` matches the template * [files_unchanged](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_unchanged) - `.github/CONTRIBUTING.md` matches the template * [files_unchanged](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_unchanged) - `.github/ISSUE_TEMPLATE/config.yml` matches the template * [files_unchanged](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_unchanged) - `.github/ISSUE_TEMPLATE/feature_request.yml` matches the template * [files_unchanged](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_unchanged) - `.github/PULL_REQUEST_TEMPLATE.md` matches the template * [files_unchanged](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_unchanged) - `.github/workflows/branch.yml` matches the template * [files_unchanged](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_unchanged) - `.github/workflows/linting_comment.yml` matches the template * [files_unchanged](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_unchanged) - `.github/workflows/linting.yml` matches the template * [files_unchanged](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_unchanged) - `assets/email_template.html` matches the template * [files_unchanged](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_unchanged) - `assets/email_template.txt` matches the template * [files_unchanged](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_unchanged) - `assets/sendmail_template.txt` matches the template * [files_unchanged](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_unchanged) - `assets/nf-core-scrnaseq_logo_light.png` matches the template * [files_unchanged](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_unchanged) - `docs/images/nf-core-scrnaseq_logo_light.png` matches the template * [files_unchanged](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_unchanged) - `docs/images/nf-core-scrnaseq_logo_dark.png` matches the template * [files_unchanged](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_unchanged) - `docs/README.md` matches the template * [files_unchanged](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_unchanged) - `.gitignore` matches the template * [files_unchanged](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_unchanged) - `.prettierignore` matches the template * [files_unchanged](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/files_unchanged) - `pyproject.toml` matches the template * [actions_ci](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/actions_ci) - '.github/workflows/ci.yml' is triggered on expected events * [actions_ci](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/actions_ci) - '.github/workflows/ci.yml' checks minimum NF version * [actions_awstest](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/actions_awstest) - '.github/workflows/awstest.yml' is triggered correctly * [actions_awsfulltest](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/actions_awsfulltest) - `.github/workflows/awsfulltest.yml` is triggered correctly * [actions_awsfulltest](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/actions_awsfulltest) - `.github/workflows/awsfulltest.yml` does not use `-profile test` * [readme](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/readme) - README Nextflow minimum version badge matched config. Badge: `23.04.0`, Config: `23.04.0` * [readme](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/readme) - README Zenodo placeholder was replaced with DOI. * [pipeline_name_conventions](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/pipeline_name_conventions) - Name adheres to nf-core convention * [schema_lint](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/schema_lint) - Schema lint passed * [schema_lint](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/schema_lint) - Schema title + description lint passed * [schema_lint](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/schema_lint) - Input mimetype lint passed: 'text/csv' * [system_exit](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/system_exit) - No `System.exit` calls found * [actions_schema_validation](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/actions_schema_validation) - Workflow validation passed: awstest.yml * [actions_schema_validation](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/actions_schema_validation) - Workflow validation passed: fix-linting.yml * [actions_schema_validation](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/actions_schema_validation) - Workflow validation passed: linting_comment.yml * [actions_schema_validation](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/actions_schema_validation) - Workflow validation passed: clean-up.yml * [actions_schema_validation](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/actions_schema_validation) - Workflow validation passed: branch.yml * [actions_schema_validation](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/actions_schema_validation) - Workflow validation passed: ci.yml * [actions_schema_validation](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/actions_schema_validation) - Workflow validation passed: release-announcements.yml * [actions_schema_validation](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/actions_schema_validation) - Workflow validation passed: awsfulltest.yml * [actions_schema_validation](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/actions_schema_validation) - Workflow validation passed: download_pipeline.yml * [actions_schema_validation](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/actions_schema_validation) - Workflow validation passed: linting.yml * [merge_markers](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/merge_markers) - No merge markers found in pipeline files * [modules_json](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/modules_json) - Only installed modules found in `modules.json` * [multiqc_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/multiqc_config) - 'assets/multiqc_config.yml' contains `report_section_order` * [multiqc_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/multiqc_config) - 'assets/multiqc_config.yml' contains `export_plots` * [multiqc_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/multiqc_config) - 'assets/multiqc_config.yml' contains `report_comment` * [multiqc_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/multiqc_config) - 'assets/multiqc_config.yml' follows the ordering scheme of the minimally required plugins. * [multiqc_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/multiqc_config) - 'assets/multiqc_config.yml' contains a matching 'report_comment'. * [multiqc_config](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/multiqc_config) - 'assets/multiqc_config.yml' contains 'export_plots: true'. * [modules_structure](https://nf-co.re/tools/docs/2.13.1/pipeline_lint_tests/modules_structure) - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL' ### Run details * nf-core/tools version 2.13.1 * Run at `2024-03-18 16:06:48`
fmalmeida commented 4 months ago

Hi @grst and @apeltzer , This PR is now ready for review and discussions. As promised, I am going to add a description of changes and open things.

The empty-drops module

First of all, I have added a script to perform the empty-drops calling and filtering using a library that is available in bioconda, bioconductor-dropletutils.

With that, I added a module which is simple, it takes a matrix file, and performs the empty drops call on it, generating another matrix file.

Note: The module fails on small datasets or whenever there is not sufficient data points for filtering. Thus, it must work on "raw/unprocessed" matrices and should be skipped (--skip_emptydrops=true) for small datasets.

The inclusion in the workflow

With the module generated, I could then include it in the workflow.

    // Run emptydrops calling module
    if ( !params.skip_emptydrops ) {

        //
        // emptydrops should only run on the raw matrices thus, filter-out the filtered result of the aligners that can produce it
        //
        if ( params.aligner in [ 'cellranger', 'cellrangerarc', 'kallisto', 'star' ] ) {
            ch_mtx_matrices_for_emptydrops =
                ch_mtx_matrices.filter { meta, mtx_files ->
                    mtx_files.toString().contains("raw_feature_bc_matrix") || // cellranger
                    mtx_files.toString().contains("counts_unfiltered")  || // kallisto
                    mtx_files.toString().contains("raw")                   // star
            }
        } else {
            ch_mtx_matrices_for_emptydrops = ch_mtx_matrices
        }
        EMPTYDROPS_CELL_CALLING( ch_mtx_matrices_for_emptydrops )
        ch_mtx_matrices = ch_mtx_matrices.mix( EMPTYDROPS_CELL_CALLING.out.filtered_matrices )
    }

One thing to note above is that, as discussed previously, I have to add a checker/filter in order to only pass on the raw/unprocessed matrices generated by the assemblers, because, I think it does not make sense to run the module in the already filtered/processed matrices.

Or do you think we should invert, only running in the processed ones?

Changes in conversion modules

Because now we will have both the data directly from the aligners, and a custom-made filtering module, I had to change the conversion modules a bit so they are aware of that, and can understand the difference between raw/filtered from the aligners itself and what is the custom empty drops filter.

With that, to try to avoid confusion by the user, I had to add such "suffixes" to the generated converted files, so now we write data with such sufixes:

*_{raw,filtered,custom_emptydrops_filter}_matrix.{h5ad,rds}
In this case, the meanings are: suffix meaning
raw Conversion of the raw/unprocessed matrix generated by the tool. It is also used for tools that generate only one matrix, such as alevin.
filtered Conversion of the filtered/processed matrix generated by the tool
custom_emptydrops_filter custom_emptydrops_filter Conversion of the matrix that was generated by the new custom empty drops filter module

I also had to update the two conversion-scripts for rds and h5ad, because now they also have to understand this new matrix generated, and to allow it to also transpose the matrices from cellranger, because, when we do the normal conversion of cellranger matrices, we use the .h5 files it generates, so it is very fine, but the cellranger-emptydrops filtering does not produce such file and then the conversion scripts need to understand it.

Integrating with the aligners

Finally, it has to be discussed the changes made in the aligners-connections to integrate this module. One by one.

Alevin

Because alevin only produce on matrix result, without producing a pair (raw&filtered) as others like cellranger, star and kallisto, the integration was seamless and required no major change, just connecting it to the module.

When using it, the results generated are the following:

alevin_run/
├── alevin
│   ├── mtx_conversions
│   │   ├── combined_custom_emptydrops_filter_matrix.h5ad
│   │   ├── combined_raw_matrix.h5ad
│   │   ├── pbmc8k
│   │   │   ├── pbmc8k_custom_emptydrops_filter_matrix.h5ad
│   │   │   ├── pbmc8k_custom_emptydrops_filter_matrix.rds
│   │   │   ├── pbmc8k_raw_matrix.h5ad
│   │   │   └── pbmc8k_raw_matrix.rds
│   │   └── versions.yml
│   ├── pbmc8k
│   │   └── emptydrops_filtered
│   │       ├── quants_mat_cols.txt
│   │       ├── quants_mat.mtx
│   │       └── quants_mat_rows.txt
│   ├── pbmc8k_alevin_results
│   │   ├── af_map
│   │   │   ├── alevin
│   │   │   ├── aux_info
│   │   │   ├── cmd_info.json
│   │   │   ├── libParams
│   │   │   ├── logs
│   │   │   ├── map.rad
│   │   │   └── unmapped_bc_count.bin
│   │   ├── af_quant
│   │   │   ├── alevin
│   │   │   ├── all_freq.bin
│   │   │   ├── collate.json
│   │   │   ├── featureDump.txt
│   │   │   ├── generate_permit_list.json
│   │   │   ├── map.collated.rad
│   │   │   ├── permit_freq.bin
│   │   │   ├── permit_map.bin
│   │   │   ├── quant.json
│   │   │   └── unmapped_bc_count_collated.bin
│   │   └── simpleaf_quant_log.json
├── alevinqc
├── fastqc
├── multiqc
└── pipeline_info

Kallisto

For kallisto, I first had to include a new parameter in the pipeline, called, --kb_filter because it is possible that kallisto produce only one matrix or a pair (raw/filtered) depending on whether the bustools --filter command is used or not. So, I added this new parameter to make sure I could include the module taking account of both scenarios.

Finally, to make sure I had a channel that could be properly used, and of course, that we could filter raw / filtered data to choose what to pass on to the empty drops filter module, I had to modify the generated channels by the module (see here).

tuple val(meta), path ("*.count")                  , emit: count
tuple val(meta), path ("*.count/counts_unfiltered"), emit: raw_counts                       // TODO: Add to nf-coew/modules before merging PR
tuple val(meta), path ("*.count/counts_filtered")  , emit: filtered_counts, optional: true  // TODO: Add to nf-coew/modules before merging PR

Then, of course, I updated the downstream snippets of the codes in the suf-workflows and workflow to understand it.

I know these changes shall got to nf-core/modules first. I added here first for discussion, when all solved, I can add to nf-core/modules whatever is needed.

Results look like this, when --filter is on.

kallisto_lamanno_run
├── fastqc
├── kallisto
│   ├── mtx_conversions
│   │   ├── combined_custom_emptydrops_filter_matrix.h5ad
│   │   ├── combined_filtered_matrix.h5ad
│   │   ├── combined_raw_matrix.h5ad
│   │   ├── pbmc8k
│   │   │   ├── pbmc8k_spliced_matrix.h5ad
│   │   │   ├── pbmc8k_spliced_matrix.rds
│   │   │   ├── pbmc8k_unspliced_matrix.h5ad
│   │   │   └── pbmc8k_unspliced_matrix.rds
│   │   └── versions.yml
│   ├── pbmc8k.count
│   │   ├── 10x_version2_whitelist.txt
│   │   ├── counts_filtered
│   │   │   ├── spliced.barcodes.txt
│   │   │   ├── spliced.genes.txt
│   │   │   ├── spliced.mtx
│   │   │   ├── unspliced.barcodes.txt
│   │   │   ├── unspliced.genes.txt
│   │   │   └── unspliced.mtx
│   │   ├── counts_unfiltered
│   │   │   ├── spliced.barcodes.txt
│   │   │   ├── spliced.genes.txt
│   │   │   ├── spliced.mtx
│   │   │   ├── unspliced.barcodes.txt
│   │   │   ├── unspliced.genes.txt
│   │   │   └── unspliced.mtx
│   │   ├── emptydrops_filtered
│   │   │   ├── spliced.barcodes.txt
│   │   │   ├── spliced.genes.txt
│   │   │   ├── spliced.mtx
│   │   │   ├── unspliced.barcodes.txt
│   │   │   ├── unspliced.genes.txt
│   │   │   └── unspliced.mtx
│   │   ├── filter_barcodes.txt
│   │   ├── inspect.json
│   │   ├── inspect.spliced.json
│   │   ├── inspect.unspliced.json
│   │   ├── kb_info.json
│   │   ├── matrix.ec
│   │   ├── output.bus
│   │   ├── output.filtered.bus
│   │   ├── output.unfiltered.bus
│   │   ├── run_info.json
│   │   ├── spliced.filtered.bus
│   │   ├── spliced.unfiltered.bus
│   │   ├── transcripts.txt
│   │   ├── unspliced.filtered.bus
│   │   └── unspliced.unfiltered.bus
│   └── versions.yml
├── multiqc
└── pipeline_info

kallisto_run
├── fastqc
├── kallisto
│   ├── mtx_conversions
│   │   ├── combined_custom_emptydrops_filter_matrix.h5ad
│   │   ├── combined_raw_matrix.h5ad
│   │   ├── pbmc8k
│   │   │   ├── pbmc8k_custom_emptydrops_filter_matrix.h5ad
│   │   │   ├── pbmc8k_custom_emptydrops_filter_matrix.rds
│   │   │   ├── pbmc8k_raw_matrix.h5ad
│   │   │   └── pbmc8k_raw_matrix.rds
│   │   └── versions.yml
│   ├── pbmc8k.count
│   │   ├── 10x_version2_whitelist.txt
│   │   ├── counts_unfiltered
│   │   │   ├── cells_x_genes.barcodes.txt
│   │   │   ├── cells_x_genes.genes.txt
│   │   │   └── cells_x_genes.mtx
│   │   ├── emptydrops_filtered
│   │   │   ├── cells_x_genes.barcodes.txt
│   │   │   ├── cells_x_genes.genes.txt
│   │   │   └── cells_x_genes.mtx
│   │   ├── inspect.json
│   │   ├── kb_info.json
│   │   ├── matrix.ec
│   │   ├── output.bus
│   │   ├── output.unfiltered.bus
│   │   ├── run_info.json
│   │   └── transcripts.txt
│   └── versions.yml
├── multiqc
└── pipeline_info

Cellranger

For cellranger, basically it happened the same to Kallisto. The difference is that cellranger always produces a pair of raw/filtered and then I had to just modify the channels to account for that so would make filtering later on easier (see here)

I also had to update the conversion-scripts because the emptydrops filter module does not produce a .h5 file so the scripts where not ready for converting .mtx cellranger files.

Results for it looks like this:

cellranger_run/
├── cellranger
│   ├── count
│   │   ├── pbmc8k
│   │   │   ├── emptydrops_filtered
│   │   │   └── outs
│   │   └── versions.yml
│   ├── mkgtf
│   │   └── genome_genes.filtered.gtf
│   ├── mkref
│   │   ├── cellranger_reference
│   │   │   ├── fasta
│   │   │   ├── genes
│   │   │   ├── reference.json
│   │   │   └── star
│   │   └── versions.yml
│   └── mtx_conversions
│       ├── combined_custom_emptydrops_filter_matrix.h5ad
│       ├── combined_filtered_matrix.h5ad
│       ├── combined_raw_matrix.h5ad
│       ├── pbmc8k
│       │   ├── pbmc8k_custom_emptydrops_filter_matrix.h5ad
│       │   ├── pbmc8k_custom_emptydrops_filter_matrix.rds
│       │   ├── pbmc8k_filtered_matrix.h5ad
│       │   ├── pbmc8k_filtered_matrix.rds
│       │   ├── pbmc8k_raw_matrix.h5ad
│       │   └── pbmc8k_raw_matrix.rds
│       └── versions.yml
├── fastqc
├── multiqc
└── pipeline_info

STAR

For star, basically the same for cellranger. It always produce a raw/filtered pair, but I had to adjust the out-channels to make the filtering/selection easier. Of course, adjusting all the downstream channel selections to account for them.

The results for it look like this:

star_run/
├── fastqc
├── multiqc
├── pipeline_info
└── star
    ├── mtx_conversions
    │   ├── combined_custom_emptydrops_filter_matrix.h5ad
    │   ├── combined_filtered_matrix.h5ad
    │   ├── combined_raw_matrix.h5ad
    │   ├── pbmc8k
    │   │   ├── pbmc8k_custom_emptydrops_filter_matrix.h5ad
    │   │   ├── pbmc8k_custom_emptydrops_filter_matrix.rds
    │   │   ├── pbmc8k_filtered_matrix.h5ad
    │   │   ├── pbmc8k_filtered_matrix.rds
    │   │   ├── pbmc8k_raw_matrix.h5ad
    │   │   └── pbmc8k_raw_matrix.rds
    │   └── versions.yml
    └── pbmc8k
        ├── emptydrops_filtered
        │   ├── barcodes.tsv
        │   ├── features.tsv
        │   └── matrix.mtx
        ├── pbmc8k.Aligned.sortedByCoord.out.bam
        ├── pbmc8k.Log.final.out
        ├── pbmc8k.Log.out
        ├── pbmc8k.Log.progress.out
        ├── pbmc8k.SJ.out.tab
        ├── pbmc8k.Solo.out
        │   ├── Barcodes.stats
        │   └── Gene
        └── versions.yml

UniverSC and cellrangerarc

I could not even run them, so could not be tested nor integrated. Thus, I believe they first need to be solved in order to have a proper testing profile to allow integrating these modules here.

Just not sure what should go first.

About the out-channels

You will see in the scrnaseq.nf workflow, that even after generating a proper channel for raw/filtered in the modules of cellranger, star and kallisto I still mix them all in the general ch_mtx_matrices channel.

I do this to guarantee all results are going to the downstream analysis. The reason I modified the modules to generate these split channels, was just to make sure they grabbed the folder which contains only the raw/filtered results to allow a proper filtering. Before they were using a higher level ** selection and then all files were being dumped in the channel in a flatten mode, so the filtering was very complicate and resulting in files with duplicated names.

You see that I had to modify the conversion subworkflow to account for that as well.

fmalmeida commented 3 months ago

Looks good to me apart from some parts:

  • Changes in the module code --> need to go upstream, ideally open PRs already for this
  • Upgrades in the respective upstream modules might be necessary, check out the cellranger update PR, which could be merged prior to updating the modules here I believe --> should be easy to do..
  • Some more docs on what this sfeature does would be helpful - so that people can both see it in the changelog and also in the main documentation

Yes, the changes in the modules I added as a TODO so that, once we know and agree on all changes required, it can be done in nf-core/modules.

About the docs, I will work on it.

grst commented 3 months ago

Hi @fmalmeida,

this looks really good and also many thanks for the detailled explanation above!

My only main concern would be the transposition thing, maybe there's a better solution that doesn't require too much effort?

Mid-term, I think we should restructure the mtx conversion part of the pipeline as it becomes a bit messy. This is beyond the scope of this issue though, so I created https://github.com/nf-core/scrnaseq/issues/310.

grst commented 3 months ago

Regarding cellranger-arc: I think it should run now. But not entirely sure if it makes sense to run emptydrops on scATAC-seq data in the first place? @heylf, WDYT?

EDIT: seems https://github.com/nf-core/scrnaseq/pull/300 needs to be merged first.

grst commented 3 months ago

@nf-core-bot fix linting

heylf commented 3 months ago

Regarding cellranger-arc: I think it should run now. But not entirely sure if it makes sense to run emptydrops on scATAC-seq data in the first place? @heylf, WDYT?

Lets say I have never compared filtering techniques for scATAC data. In the past I saw you can use emptydrop for scATAC, but emptydrops is specifcally for scRNA, so I would be careful in doing that.

fmalmeida commented 3 months ago

Hi @grst , So, the overall summary after reading all the comments, so that can finalise this PR is:

  1. I should already open the PRs in the nf-core/modules to accommodate the changes needed for this code to work.
  2. Enhance the documentation about the feature, in both changelog and main documentation.
  3. I will not add snippets for cellrangerarc as @heylf said, it will probably not make sense. If in the future it is requested, we can work specifically on it.
  4. Wait for #300 to be merged.

Is that all?

grst commented 3 months ago

sounds good

grst commented 3 months ago

@nf-core-bot fix linting

fmalmeida commented 3 months ago

Now that kallisto was updated, and the workflows it provides are different. I will have to test it with them as well. Once they work, the last thing required will be the docs and the PRs to modules.

fmalmeida commented 3 months ago

Hi @grst ,

Can you take a look again at the changes?

The only one missing is the last one, which is currently running.

grst commented 3 months ago

I'm wondering if instead of updating all those modules it would be easier to do something like

ch_filtered = ch_out.map{
        meta, files -> [meta, out.findAll{ it -> it.contains("filtered") }]
}
fmalmeida commented 3 months ago

I'm wondering if instead of updating all those modules it would be easier to do something like

ch_filtered = ch_out.map{
        meta, files -> [meta, out.findAll{ it -> it.contains("filtered") }]
}

That was my first try. But because many of them use the ** to grab the files. It catches the files and not the directories. So, when filtering, we only select the files that have it in its name instead of the directory and all that is inside.

I added some information in the last section of this comment https://github.com/nf-core/scrnaseq/pull/301#issuecomment-1945592102

grst commented 3 months ago

I see, fair enough then

fmalmeida commented 3 months ago

Pipeline execution terminated.

As said here, the nf-core modules were updated and TODOs removed. Documentation was updated. And also, workflow (kallisto) was updated so that the new modules works for both non-standard kallisto workflows, lamanno and nac.

Results structure organization and namings are being produced as said here.

Finally, all testings passed, so, merging the PR 😄