Speed and memory optimization of `extract_map_reads.py`

nf-core / eager

A fully reproducible and state-of-the-art ancient DNA analysis pipeline

https://nf-co.re/eager

MIT License


Speed and memory optimization of `extract_map_reads.py` #905

Closed: maxibor closed this pull request 2 years ago

maxibor commented 2 years ago

One of the long-awaited improvements is finally here: by using pysam and cleverer set operations, the host_removal step is now much leaner on both memory and runtime.

For example, with a 5.1 GB BAM file (~95 million reads) and its associated forward (5.2 GB) and reverse (5.4 GB) gzip-compressed FASTQ files, the step took only 19m21s and 90 MB of memory.
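The PR description does not include the new code, so the following is only a minimal Python sketch of the general idea it names: collect the names of mapped reads from the BAM with pysam into a set, then stream the FASTQ files and drop every read whose name is in that set (one O(1) set lookup per read). It is not the actual `extract_map_reads.py` implementation; the helper names (`mapped_read_names`, `read_fastq`, `read_name`, `remove_host_reads`, `filter_fastq_gz`) and the `/1`/`/2` suffix handling are hypothetical, and only the "remove" behaviour is shown.

```python
# Hypothetical sketch of set-based host read removal; NOT the actual extract_map_reads.py code.
import gzip
from typing import Iterable, Iterator, Set, Tuple

# A FASTQ record as (header, sequence, plus_line, quality), without trailing newlines.
FastqRecord = Tuple[str, str, str, str]


def mapped_read_names(bam_path: str) -> Set[str]:
    """Collect the names of all mapped reads in a BAM file (requires pysam)."""
    import pysam  # imported lazily so the pure-Python helpers below work without pysam

    names: Set[str] = set()
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        # until_eof=True streams every record in file order, without needing a .bai index
        for read in bam.fetch(until_eof=True):
            if not read.is_unmapped:
                names.add(read.query_name)
    return names


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Group an iterable of lines (e.g. an open text file) into 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank trailing lines
        # A truncated final record makes next() fail, which surfaces as an error
        # (RuntimeError inside a generator) instead of being silently dropped.
        seq = next(it).rstrip("\n")
        plus = next(it).rstrip("\n")
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n"), seq, plus, qual


def read_name(header: str) -> str:
    """'@read1/1 comment' -> 'read1': drop '@', any comment and a hypothetical /1 or /2 suffix."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def remove_host_reads(records: Iterable[FastqRecord], host_names: Set[str]) -> Iterator[FastqRecord]:
    """Yield only the records whose read name is NOT in the set of host-mapped names."""
    for rec in records:
        if read_name(rec[0]) not in host_names:
            yield rec


def filter_fastq_gz(in_path: str, out_path: str, host_names: Set[str]) -> int:
    """Stream a gzipped FASTQ, writing only non-host reads; returns the number of reads kept."""
    kept = 0
    with gzip.open(in_path, "rt") as fin, gzip.open(out_path, "wt") as fout:
        for rec in remove_host_reads(read_fastq(fin), host_names):
            fout.write("\n".join(rec) + "\n")
            kept += 1
    return kept
```

Applying the same name set to both files, e.g. `host = mapped_read_names("sample.bam")` followed by `filter_fastq_gz("R1.fastq.gz", "R1.filtered.fastq.gz", host)` and the same call for R2 (hypothetical file names), drops a pair from both files together whenever either mate mapped, since both mates share one BAM query name. Note that this sketch keeps every mapped read name in memory, so its footprint grows with the number of mapped reads; it is not meant to reproduce the benchmark figures reported in this PR.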

The command-line interface of `extract_map_reads.py` stays the same.

Closes #789

PR checklist

[x] This comment contains a description of changes (with reason).
[x] Ensure the test suite passes (nextflow run . -profile test,docker).
[ ] CHANGELOG.md is updated.

maxibor commented 2 years ago

This should fix #789

jfy133 commented 2 years ago

Running against the CMC data now

jfy133 commented 2 years ago

Hell yeah @maxibor!

github-actions[bot] commented 2 years ago

YAML linting is failing

To keep the code consistent with lots of contributors, we run automated code consistency checks. To fix this CI test, please run:

Install yaml-lint
- Install npm, then install yaml-lint (npm install -g yaml-lint)
Fix the YAML errors
- Run the test locally: yamllint $(find . -type f -name "*.yml" -o -name "*.yaml")
- Fix any reported errors in your YAML files

Once you push these changes the test should pass, and you can hide this comment :+1:

We highly recommend setting up yaml-lint in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help!

Thanks again for your contribution!

github-actions[bot] commented 2 years ago

Markdown linting is failing

To keep the code consistent with lots of contributors, we run automated code consistency checks. To fix this CI test, please run:

Install markdownlint-cli
- On Mac: brew install markdownlint-cli
- Everything else: install npm, then install markdownlint-cli (npm install -g markdownlint-cli)
Fix the markdown errors
- Automatically: markdownlint . --config .github/markdownlint.yml --fix
- Manually resolve anything left from markdownlint . --config .github/markdownlint.yml

Once you push these changes the test should pass, and you can hide this comment :+1:

We highly recommend setting up markdownlint in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help!

Thanks again for your contribution!