nf-core / scdownstream

A single cell transcriptomics pipeline for QC, integration and making the data presentable
https://nf-co.re/scdownstream
MIT License

Add support for gene symbols specified in columns other than the var.index #73

Closed: nictru closed this 3 months ago

nictru commented 3 months ago

This PR adds documentation and a basic implementation of this feature. Some questions need to be resolved before merging:

  1. Does the gene_symbol column survive AnnData concatenation?
  2. What happens if there are duplicates in the gene_symbol column but not in the var_names?
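
To make the second question concrete, here is a minimal sketch of one possible policy: collapse all columns that share a gene symbol by averaging them. It uses plain Python lists rather than AnnData, and the function name, the gene IDs, and the symbols are hypothetical placeholders. Whether the pipeline should aggregate this way (or at all) is exactly the open question, so treat this as an illustration and not as the implemented behaviour.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_symbol(matrix, symbols, method=mean):
    """Collapse columns of a cells-by-genes matrix that share a gene symbol.

    `matrix` is a list of rows (one per cell), `symbols` holds one gene
    symbol per column. Hypothetical helper, not part of the pipeline.
    """
    # Group column indices by symbol; dicts keep first-seen order.
    groups = defaultdict(list)
    for col, symbol in enumerate(symbols):
        groups[symbol].append(col)

    new_symbols = list(groups)
    new_matrix = [
        [method([row[c] for c in groups[s]]) for s in new_symbols]
        for row in matrix
    ]
    return new_matrix, new_symbols


# Toy example: the IDs are unique, but two of them share a symbol.
var_names = ["id_a", "id_b", "id_c"]          # placeholder gene IDs
gene_symbols = ["GENE1", "GENE2", "GENE1"]    # placeholder symbols
counts = [[2, 5, 4],
          [0, 1, 6]]

agg, syms = aggregate_by_symbol(counts, gene_symbols)
print(syms)  # ['GENE1', 'GENE2']
print(agg)   # [[3, 5], [3, 1]]
```

Passing a different callable as `method` (for example `sum` or `max`) changes the aggregation without touching the grouping logic.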
nictru commented 3 months ago

@fbnrst, is this approximately what you imagined? I haven't tested it yet, but if you want, you can give it a go and let me know if something needs to be changed.

github-actions[bot] commented 3 months ago

nf-core lint overall result: Passed :white_check_mark: :warning:

Posted for pipeline commit dcedb58

✅ 207 tests passed
❗ 13 tests had warnings

### :heavy_exclamation_mark: Test warnings:

* [files_exist](https://nf-co.re/tools/docs/2.14.1/pipeline_lint_tests/files_exist) - File not found: `conf/igenomes.config`
* [readme](https://nf-co.re/tools/docs/2.14.1/pipeline_lint_tests/readme) - README contains the placeholder `zenodo.XXXXXXX`. This should be replaced with the zenodo doi (after the first release).
* [pipeline_todos](https://nf-co.re/tools/docs/2.14.1/pipeline_lint_tests/pipeline_todos) - TODO string in `nextflow.config`: _Specify your pipeline's command line flags_
* [pipeline_todos](https://nf-co.re/tools/docs/2.14.1/pipeline_lint_tests/pipeline_todos) - TODO string in `README.md`: _Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file._
* [pipeline_todos](https://nf-co.re/tools/docs/2.14.1/pipeline_lint_tests/pipeline_todos) - TODO string in `README.md`: _Add bibliography of tools and data used in your pipeline_
* [pipeline_todos](https://nf-co.re/tools/docs/2.14.1/pipeline_lint_tests/pipeline_todos) - TODO string in `main.nf`: _Optionally add in-text citation tools to this list._
* [pipeline_todos](https://nf-co.re/tools/docs/2.14.1/pipeline_lint_tests/pipeline_todos) - TODO string in `main.nf`: _Optionally add bibliographic entries to this list._
* [pipeline_todos](https://nf-co.re/tools/docs/2.14.1/pipeline_lint_tests/pipeline_todos) - TODO string in `main.nf`: _Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!_
* [pipeline_todos](https://nf-co.re/tools/docs/2.14.1/pipeline_lint_tests/pipeline_todos) - TODO string in `methods_description_template.yml`: _#Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline_
* [pipeline_todos](https://nf-co.re/tools/docs/2.14.1/pipeline_lint_tests/pipeline_todos) - TODO string in `ci.yml`: _You can customise CI pipeline run tests as required_
* [pipeline_todos](https://nf-co.re/tools/docs/2.14.1/pipeline_lint_tests/pipeline_todos) - TODO string in `awsfulltest.yml`: _You can customise AWS full pipeline tests as required_
* [pipeline_todos](https://nf-co.re/tools/docs/2.14.1/pipeline_lint_tests/pipeline_todos) - TODO string in `base.config`: _Check the defaults for all processes_
* [pipeline_todos](https://nf-co.re/tools/docs/2.14.1/pipeline_lint_tests/pipeline_todos) - TODO string in `base.config`: _Customise requirements for specific processes._

### Run details

* nf-core/tools version 2.14.1
* Run at `2024-08-28 14:20:00`

fbnrst commented 3 months ago

This is how I imagined it, and I am running a test right now. Your points are valid, though, and it is not clear to me how duplicate var_names should be handled. I looked into celltypist a little, and their approach seems to be that you convert the models using a mapping file from IDs to symbols, see e.g. https://github.com/Teichlab/celltypist/issues/87. They also provide some mappings from Ensembl IDs to gene symbols: https://github.com/Teichlab/celltypist/tree/main/celltypist/data/samples

They also provide files for other species there, which raises another relevant point: celltypist was developed with human data in mind.

So: what about using a mapping file instead of providing a gene symbols column? Also, given the species issue, it might be good to be able to skip cell typing altogether; without a good reference, it might not make sense.
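To make the mapping-file idea concrete, here is a minimal sketch in plain Python. The CSV layout, the column names `ensembl_id` and `gene_symbol`, and the IDs themselves are assumptions made up for illustration; they are not the format of the celltypist mapping files. The sketch also reports symbols that several IDs map to, which is the duplicate case raised at the top of this PR.

```python
import csv
import io
from collections import Counter

# Hypothetical two-column mapping (ID -> gene symbol); all contents are made up.
MAPPING_CSV = """ensembl_id,gene_symbol
ENSG00000000001,GENE_A
ENSG00000000002,GENE_B
ENSG00000000003,GENE_B
"""


def load_mapping(handle):
    """Read an ID -> symbol mapping from a CSV file with a header row."""
    return {row["ensembl_id"]: row["gene_symbol"] for row in csv.DictReader(handle)}


def map_ids(var_names, mapping):
    """Translate IDs to symbols; report unmapped IDs and symbols hit more than once."""
    symbols = [mapping.get(name) for name in var_names]
    unmapped = [name for name, sym in zip(var_names, symbols) if sym is None]
    counts = Counter(sym for sym in symbols if sym is not None)
    duplicated = sorted(sym for sym, n in counts.items() if n > 1)
    return symbols, unmapped, duplicated


mapping = load_mapping(io.StringIO(MAPPING_CSV))
var_names = ["ENSG00000000001", "ENSG00000000002", "ENSG00000000003", "ENSG00000000004"]
symbols, unmapped, duplicated = map_ids(var_names, mapping)
print(symbols, unmapped, duplicated)
```

How to resolve the duplicated symbols (drop, sum, or keep the first) is left open here, since that is exactly the undecided question.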

nictru commented 3 months ago

Celltypist is already optional: it is only executed if a celltypist model is provided as a parameter.

fbnrst commented 3 months ago

My test run failed because the gene_symbol column was not available. I checked the combine/merge/merged_outer.h5ad in the outputs, which I believe is read by celltypist. I think the concatenation needs to be changed like this by adding a merge argument:

adata_outer = ad.concat(adatas, join="outer", index_unique="-", merge="unique")

https://github.com/fbnrst/scdownstream/blame/1d9f724e39a12211b5d36950e06656dc46e913ec/modules/local/adata/merge/templates/merge.py#L32

I am currently running a test on this again.

But I realised another thing: When doing an outer merge 0s are filled in the count matrix (see the warning under notes here: https://anndata.readthedocs.io/en/latest/generated/anndata.concat.html). I think this is not generally correct, because missing data for that gene does not mean it was not expressed. I wonder whether it would be better to run celltypist on the inner join or, alternatively, on the individual samples before merging.
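A toy illustration of the zero-filling concern, in plain Python rather than AnnData. The `concat` helper below is a hypothetical stand-in that only mimics inner and outer joins on genes, and the counts are invented.

```python
# Toy count tables: one dict per sample, gene -> counts per cell (values invented).
sample_a = {"GENE_A": [3, 0], "GENE_B": [1, 2]}
sample_b = {"GENE_B": [4], "GENE_C": [5]}


def n_cells(sample):
    """Number of cells in a sample (every gene has one count per cell)."""
    return len(next(iter(sample.values())))


def concat(samples, join):
    """Concatenate cells across samples, mimicking an inner or outer join on genes."""
    gene_sets = [set(s) for s in samples]
    genes = set.union(*gene_sets) if join == "outer" else set.intersection(*gene_sets)
    merged = {}
    for gene in sorted(genes):
        merged[gene] = []
        for s in samples:
            # A gene absent from a sample is filled with 0 for each of its cells.
            merged[gene].extend(s.get(gene, [0] * n_cells(s)))
    return merged


outer = concat([sample_a, sample_b], "outer")
inner = concat([sample_a, sample_b], "inner")
print(outer)
print(inner)
```

In the outer result, the third entry of `GENE_A` is a filled-in 0 for a cell where the gene was never measured, and it cannot be told apart from the measured 0 in the second cell. The inner result avoids this but keeps only `GENE_B`.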

fbnrst commented 3 months ago

I created #74 to fix a few small things. With those changes, I managed to run the pipeline using gene symbols for celltypist.

nictru commented 3 months ago

> When doing an outer merge 0s are filled in the count matrix (see the warning under notes here: https://anndata.readthedocs.io/en/latest/generated/anndata.concat.html). I think this is not generally correct, because missing data for that gene does not mean it was not expressed.

True. The outer join is generally done so that people don't wonder why their favorite genes are missing from the output object. Filling with NaN would hugely increase dataset size, as the NaNs would need to be stored explicitly in sparse matrices.
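The storage argument can be sketched with a toy sparse representation. The `to_sparse` helper is hypothetical and keeps only entries that differ from the implicit value 0, which is the general idea behind sparse matrix formats; the numbers are invented.

```python
def to_sparse(dense):
    """Keep only entries that are not the implicit value 0, as {(row, col): value}."""
    return {
        (i, j): v
        for i, row in enumerate(dense)
        for j, v in enumerate(row)
        if not (v == 0)  # NaN compares unequal to 0, so it has to be stored explicitly
    }


nan = float("nan")

# 3 cells x 4 genes; the last two genes were not measured in the first two cells.
zero_filled = [[1, 0, 0, 0], [0, 2, 0, 0], [0, 0, 3, 1]]
nan_filled = [[1, 0, nan, nan], [0, 2, nan, nan], [0, 0, 3, 1]]

stored_zero = len(to_sparse(zero_filled))
stored_nan = len(to_sparse(nan_filled))
print(stored_zero, stored_nan)
```

Every NaN becomes a stored entry, so the cost grows with the number of cells multiplied by the number of genes missing from each dataset.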

Running celltypist per dataset before merging would be a good solution, I think.
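A rough sketch of that ordering, annotating each dataset before merging, in plain Python. `toy_annotate` is a made-up stand-in for a classifier and is not the celltypist API. The `-<index>` suffix on cell IDs is an assumption that imitates `index_unique="-"` from the concat call discussed earlier.

```python
def toy_annotate(cells):
    """Hypothetical classifier: label each cell by its highest-count measured gene."""
    return [max(counts, key=counts.get) + "_high" for counts in cells]


def annotate_then_merge(datasets, annotate):
    """Annotate every dataset on its own genes, then combine labels under unique cell IDs."""
    labels = {}
    for i, (cell_ids, cells) in enumerate(datasets):
        for cell_id, label in zip(cell_ids, annotate(cells)):
            # Suffix the dataset index so identical barcodes from different samples stay distinct.
            labels[f"{cell_id}-{i}"] = label
    return labels


# Two toy datasets with different gene sets; each cell is a gene -> count dict (values invented).
datasets = [
    (["c1", "c2"], [{"GENE_A": 5, "GENE_B": 1}, {"GENE_A": 0, "GENE_B": 2}]),
    (["c1"], [{"GENE_B": 1, "GENE_C": 7}]),
]
labels = annotate_then_merge(datasets, toy_annotate)
print(labels)
```

Because each dataset is classified on the genes it actually measured, no filled-in zeros reach the classifier; only the resulting labels are merged.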