rbturnbull / orthoflow

Orthoflow is a workflow for phylogenetic inference of genome-scale datasets of protein-coding genes.
https://rbturnbull.github.io/orthoflow/
Apache License 2.0

speed up filter_orthofinder.py script #20

Closed: smutch closed this issue 2 years ago

smutch commented 2 years ago

This script is really slow because it reads and writes a large number of potentially tiny files. Most likely we are I/O bound here, which means we should get a good speedup simply by using threads or asyncio. We could probably also replace the system call to grep with an in-process search, which would remove the cost of spawning and waiting on a subprocess for each file being checked.
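
A rough sketch of what that could look like, not the actual contents of filter_orthofinder.py: `file_contains` and `filter_files` are hypothetical names, and the plain substring test is an assumed stand-in for whatever pattern the grep call really uses (grep matches regular expressions line by line, so this is a simplification). It only illustrates the two ideas above: search each file in-process, and overlap the reads with a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def file_contains(path: Path, needle: bytes) -> bool:
    """In-process stand-in for `grep -q`: True if `needle` occurs in the file."""
    # Assumption: files are small, so reading each one whole is acceptable.
    return needle in path.read_bytes()


def filter_files(paths, needle: bytes, max_workers: int = 16):
    """Return the paths whose contents include `needle`, in input order."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Worker threads overlap the file reads; map() yields results in
        # the same order as the inputs.
        hits = pool.map(lambda p: file_contains(p, needle), paths)
        return [p for p, hit in zip(paths, hits) if hit]
```

Threads should be enough here because file reads release the GIL while waiting on the disk. Whether `max_workers=16` is a sensible value would need measuring on the real data.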

smutch commented 2 years ago

This will be fixed when #21 is merged.