sanger-tol / genomenote

Nextflow DSL2 pipeline to generate a Genome Note, including assembly statistics, quality metrics, and Hi-C contact maps. This workflow is part of the Tree of Life production suite.
https://pipelines.tol.sanger.ac.uk/genomenote
MIT License
24 stars 6 forks source link

Resource optimisation #90

Closed muffato closed 1 year ago

muffato commented 1 year ago

Like in https://github.com/sanger-tol/readmapping/pull/82 the goal is to stop using the process_* labels and instead optimise the resource requests of every process.

I'm using the same dataset: 10 genomes of increasing size, with 1 Hi-C library and 1 PacBio library each.

  Fasta size (bytes) PacBio size (# reads) Hi-C (# reads)
GCA_939531405.1 13,824,461 1,546,435 955,654,834
GCA_937625935.1 26,683,271 189,202 980,890,138
GCA_951394315.1 58,010,196 1,965,084 704,258,466
GCA_947172415.1 118,858,594 799,796 87,833,110
GCA_910589235.2 232,212,321 1,586,931 727,465,652
GCA_949987625.1 417,566,504 2,211,570 705,705,280
GCA_946406115.1 810,357,340 1,872,695 842,629,084
GCA_963513935.1 1,803,897,959 7,338,871 3,305,634,916
GCA_951213105.1 3,609,437,155 1,121,856 3,127,898,040
GCA_946902985.2 9,152,113,672 1,537,548 886,707,886

I found much less correlation than in the read-mapping pipeline. The only input size that I found useful was the genome size, now collected at the beginning of the pipeline and added to the meta map. There is some correlation between the number of Hi-C reads and some process runtime, but not memory usage. Since runtime estimates don't need to be very accurate (really, it's only normal/long/week that matters), I don't even pull that input size.

I am using helper functions to grow values (like the number of CPUs) in a logarithm fashion. In effect, this is to limit the increase of the number of CPUs, especially as the advantage of multi-threading tends to decrease with a higher number of threads.

Also:

  1. I could replace the GrabFiles process with some Groovy magic, as per https://community.seqera.io/t/is-it-bad-practice-to-try-and-pluck-files-from-an-element-in-a-channel-that-is-a-directory-with-channel-manipulation/224/2 . This saves 1 LSF job.
  2. I adjusted the GNU_SORT parameters to fix #91

In this PR, the new resource requirements make every process succeed at the first attempt. The formulas are the lowest legible-ish correlations I could find. I will investigate later lowering the requirements even more, with the hope that the savings on some processes will balance out others having to be rerun.

Metric Before After Improvement
Total memory requested (GB) 3,660.0 997.1 ÷3.7
Memory efficiency (used/requested, %) 21.2 78.0
Total memory reservation (GB-hours) 2,792.4 2,405.4 ÷1.2
Memory reservation efficiency (used/requested, %) 86.0 89.4
Total CPUs requested (n) 610.0 510.0 ÷1.2
CPU efficiency (used/requested, %) 47.6 60.5
Total CPU reservation (CPU-hours) 465.4 332.0 ÷1.4
CPU reservation efficiency (used/requested, %) 62.0 86.3
Job failures (%) 0.5 0.0

PR checklist

github-actions[bot] commented 1 year ago

nf-core lint overall result: Passed :white_check_mark: :warning:

Posted for pipeline commit ca96b15

+| ✅ 132 tests passed       |+
#| ❔  20 tests were ignored |#
!| ❗   1 tests had warnings |!
### :heavy_exclamation_mark: Test warnings: * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config ``manifest.version`` should end in ``dev``: ``'1.1.0'`` ### :grey_question: Tests ignored: * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `assets/nf-core-genomenote_logo_light.png` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `docs/images/nf-core-genomenote_logo_light.png` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `docs/images/nf-core-genomenote_logo_dark.png` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `.github/ISSUE_TEMPLATE/config.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `.github/workflows/awstest.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `.github/workflows/awsfulltest.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File is ignored: `conf/igenomes.config` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable ignored: `manifest.name` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable ignored: `manifest.homePage` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `LICENSE` or `LICENSE.md` or `LICENCE` or `LICENCE.md` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `.github/CONTRIBUTING.md` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `.github/ISSUE_TEMPLATE/bug_report.yml` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File does not exist: `.github/ISSUE_TEMPLATE/config.yml` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `.github/workflows/linting.yml` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File does not exist: `assets/nf-core-genomenote_logo_light.png` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File does not exist: `docs/images/nf-core-genomenote_logo_light.png` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File does not exist: `docs/images/nf-core-genomenote_logo_dark.png` * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - File ignored due to lint config: `lib/NfcoreTemplate.groovy` * [actions_ci](https://nf-co.re/tools-docs/lint_tests/actions_ci.html) - actions_ci * [actions_awstest](https://nf-co.re/tools-docs/lint_tests/actions_awstest.html) - 'awstest.yml' workflow not found: `/home/runner/work/genomenote/genomenote/.github/workflows/awstest.yml` ### :white_check_mark: Tests passed: * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.gitattributes` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.gitignore` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.nf-core.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.editorconfig` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.prettierignore` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.prettierrc.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `CHANGELOG.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `CITATIONS.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `CODE_OF_CONDUCT.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `CODE_OF_CONDUCT.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `LICENSE` or `LICENSE.md` or `LICENCE` or `LICENCE.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `nextflow_schema.json` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `nextflow.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `README.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/.dockstore.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/CONTRIBUTING.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/ISSUE_TEMPLATE/bug_report.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/ISSUE_TEMPLATE/feature_request.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/PULL_REQUEST_TEMPLATE.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/workflows/branch.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/workflows/ci.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/workflows/linting_comment.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `.github/workflows/linting.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `assets/email_template.html` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `assets/email_template.txt` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `assets/sendmail_template.txt` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `conf/modules.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `conf/test.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `conf/test_full.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `docs/output.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `docs/README.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `docs/README.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `docs/usage.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/nfcore_external_java_deps.jar` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/NfcoreSchema.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/NfcoreTemplate.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/Utils.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/WorkflowMain.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `main.nf` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `assets/multiqc_config.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `conf/base.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `lib/WorkflowGenomenote.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `modules.json` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File found: `pyproject.toml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `Singularity` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `parameters.settings.json` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `.nf-core.yaml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `bin/markdown_to_html.r` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `conf/aws.config` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `.github/workflows/push_dockerhub.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `.github/ISSUE_TEMPLATE/bug_report.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `.github/ISSUE_TEMPLATE/feature_request.md` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `docs/images/nf-core-genomenote_logo.png` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `.markdownlint.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `.yamllint.yml` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `lib/Checks.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `lib/Completion.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `lib/Workflow.groovy` * [files_exist](https://nf-co.re/tools-docs/lint_tests/files_exist.html) - File not found check: `.travis.yml` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `manifest.nextflowVersion` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `manifest.description` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `manifest.version` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `timeline.enabled` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `trace.enabled` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `report.enabled` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `dag.enabled` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `process.cpus` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `process.memory` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `process.time` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `params.outdir` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `params.input` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `params.show_hidden_params` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `params.schema_ignore_params` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `manifest.mainScript` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `timeline.file` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `trace.file` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `report.file` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable found: `dag.file` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable (correctly) not found: `params.nf_required_version` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable (correctly) not found: `params.container` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable (correctly) not found: `params.singleEnd` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable (correctly) not found: `params.igenomesIgnore` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable (correctly) not found: `params.name` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable (correctly) not found: `params.enable_conda` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config ``timeline.enabled`` had correct value: ``true`` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config ``report.enabled`` had correct value: ``true`` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config ``trace.enabled`` had correct value: ``true`` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config ``dag.enabled`` had correct value: ``true`` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config ``dag.file`` ended with ``.html`` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config variable ``manifest.nextflowVersion`` started with >= or !>= * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config `params.custom_config_version` is set to `master` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Config `params.custom_config_base` is set to `https://raw.githubusercontent.com/nf-core/configs/master` * [nextflow_config](https://nf-co.re/tools-docs/lint_tests/nextflow_config.html) - Lines for loading custom profiles found * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.gitattributes` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.prettierrc.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/.dockstore.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/ISSUE_TEMPLATE/feature_request.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/PULL_REQUEST_TEMPLATE.md` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/workflows/branch.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.github/workflows/linting_comment.yml` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `assets/email_template.html` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `assets/email_template.txt` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `assets/sendmail_template.txt` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `docs/README.md` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `lib/nfcore_external_java_deps.jar` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `lib/NfcoreSchema.groovy` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.gitignore` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `.prettierignore` matches the template * [files_unchanged](https://nf-co.re/tools-docs/lint_tests/files_unchanged.html) - `pyproject.toml` matches the template * [readme](https://nf-co.re/tools-docs/lint_tests/readme.html) - README Nextflow minimum version badge matched config. Badge: `22.10.1`, Config: `22.10.1` * [readme](https://nf-co.re/tools-docs/lint_tests/readme.html) - README Zenodo placeholder was replaced with DOI. * [pipeline_todos](https://nf-co.re/tools-docs/lint_tests/pipeline_todos.html) - No TODO strings found * [pipeline_name_conventions](https://nf-co.re/tools-docs/lint_tests/pipeline_name_conventions.html) - Name adheres to nf-core convention * [template_strings](https://nf-co.re/tools-docs/lint_tests/template_strings.html) - Did not find any Jinja template strings (105 files) * [schema_lint](https://nf-co.re/tools-docs/lint_tests/schema_lint.html) - Schema lint passed * [schema_lint](https://nf-co.re/tools-docs/lint_tests/schema_lint.html) - Schema title + description lint passed * [schema_lint](https://nf-co.re/tools-docs/lint_tests/schema_lint.html) - Input mimetype lint passed: 'text/csv' * [schema_params](https://nf-co.re/tools-docs/lint_tests/schema_params.html) - Schema matched params returned from nextflow config * [system_exit](https://nf-co.re/tools-docs/lint_tests/system_exit.html) - No `System.exit` calls found * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: branch.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: linting_comment.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: ci.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: clean-up.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: linting.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: sanger_test_full.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: fix-linting.yml * [actions_schema_validation](https://nf-co.re/tools-docs/lint_tests/actions_schema_validation.html) - Workflow validation passed: sanger_test.yml * [merge_markers](https://nf-co.re/tools-docs/lint_tests/merge_markers.html) - No merge markers found in pipeline files * [modules_json](https://nf-co.re/tools-docs/lint_tests/modules_json.html) - Only installed modules found in `modules.json` * [multiqc_config](https://nf-co.re/tools-docs/lint_tests/multiqc_config.html) - 'assets/multiqc_config.yml' follows the ordering scheme of the minimally required plugins. * [multiqc_config](https://nf-co.re/tools-docs/lint_tests/multiqc_config.html) - 'assets/multiqc_config.yml' contains 'export_plots: true'. * [modules_structure](https://nf-co.re/tools-docs/lint_tests/modules_structure.html) - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL' ### Run details * nf-core/tools version 2.8 * Run at `2023-11-14 13:36:34`