nf-core / eager

A fully reproducible and state-of-the-art ancient DNA analysis pipeline
https://nf-co.re/eager
MIT License

Speed and memory optimization of `extract_map_reads.py` #905

Closed: maxibor closed this 2 years ago

maxibor commented 2 years ago

One of the long-awaited improvements is finally here: by using PySam and cleverer set operations, the host_removal step is now much, much leaner on memory and runtime.

For example, for a 5.1 GB BAM file (~95 million reads) and its associated forward (5.2 GB) and reverse (5.4 GB) gzip-compressed FASTQ files, it took only 19m21s and 90 MB of memory.

The CLI stays the same
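For readers who want a feel for the approach, here is a minimal sketch of set-based host read removal. It is not the actual code of `extract_map_reads.py`: the function names (`mapped_read_names`, `fastq_read_name`, `filter_fastq`, `remove_host_reads`) are hypothetical, and the sketch assumes that read names in the BAM match the first whitespace-delimited token of each FASTQ header (with an optional `/1` or `/2` suffix) and that every FASTQ record spans exactly four lines. The idea is to collect the names of mapped reads into a set once, then stream the FASTQ file and drop records whose name is in that set.

```python
import gzip
import io
from itertools import islice


def mapped_read_names(bam_path):
    """Collect the names of reads that mapped to the host reference.

    Assumes pysam is installed; it is imported lazily so that the
    FASTQ filtering below can run without it.
    """
    import pysam

    names = set()
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        # until_eof=True iterates over all reads without needing an index
        for read in bam.fetch(until_eof=True):
            if not read.is_unmapped:
                names.add(read.query_name)
    return names


def fastq_read_name(header):
    """Extract the read name from a FASTQ header line (assumed layout)."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def filter_fastq(in_handle, out_handle, names_to_drop):
    """Stream 4-line FASTQ records, writing only reads not in names_to_drop.

    Returns the number of records kept. Only one record is held in
    memory at a time; set membership tests are O(1) on average.
    """
    kept = 0
    while True:
        record = list(islice(in_handle, 4))
        if not record:
            break
        if fastq_read_name(record[0]) not in names_to_drop:
            out_handle.writelines(record)
            kept += 1
    return kept


def remove_host_reads(bam_path, fastq_in, fastq_out):
    """Hypothetical end-to-end helper for gzip-compressed FASTQ files."""
    drop = mapped_read_names(bam_path)
    with gzip.open(fastq_in, "rt") as src, gzip.open(fastq_out, "wt") as dst:
        return filter_fastq(src, dst, drop)


if __name__ == "__main__":
    # In-memory demo of the filtering step only (no BAM file needed)
    demo_in = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n")
    demo_out = io.StringIO()
    filter_fastq(demo_in, demo_out, {"r1"})
    print(demo_out.getvalue(), end="")
```

In this sketch, memory use is dominated by the set of mapped read names, so it scales with the number of host-mapped reads rather than with the size of the FASTQ files.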

Closes #789


maxibor commented 2 years ago

This should fix #789

jfy133 commented 2 years ago

Running against the CMC data now

jfy133 commented 2 years ago

Hell yeah @maxibor !


github-actions[bot] commented 2 years ago

YAML linting is failing

To keep the code consistent with lots of contributors, we run automated code consistency checks. To fix this CI test, please run:

Once you push these changes the test should pass, and you can hide this comment :+1:

We highly recommend setting up yaml-lint in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help!

Thanks again for your contribution!

github-actions[bot] commented 2 years ago

Markdown linting is failing

To keep the code consistent with lots of contributors, we run automated code consistency checks. To fix this CI test, please run:

Once you push these changes the test should pass, and you can hide this comment :+1:

We highly recommend setting up markdownlint in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help!

Thanks again for your contribution!