phac-nml / staramr

Scans genome contigs against the ResFinder, PlasmidFinder, and PointFinder databases.
Apache License 2.0

Updated Scheme and Sequence type to summary and optimized MLST to use thread pools #85

Closed jennifertran closed 4 years ago

jennifertran commented 5 years ago

Based on issues #81 and #79.

Problem

As described in issue #81, the mlst --threads option wasn't doing what we expected, so there was still a huge bottleneck in the code: each file had to go through mlst one at a time, and the available resources weren't being used fully.

Solution

Partition the files evenly based on the number of threads available, instantiate the threads in a thread pool, and assign one mlst instance to each thread.

Implementation

The implementation is pretty similar to how blast does it. It uses the ThreadPoolExecutor class so that each thread in the pool runs an mlst process.
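For anyone reading along, here is a rough sketch of the general idea, not the actual code in this PR. The function names (`partition`, `run_chunk`, `run_parallel`) are made up for illustration, and it assumes `mlst` is on the PATH and accepts several file paths in one invocation.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def partition(files, n_parts):
    """Split files into at most n_parts chunks whose sizes differ by at most one."""
    if not files:
        return []
    n_parts = max(1, min(n_parts, len(files)))
    base, extra = divmod(len(files), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks each take one additional file.
        size = base + (1 if i < extra else 0)
        chunks.append(files[start:start + size])
        start += size
    return chunks


def run_chunk(chunk, command=("mlst",)):
    """Run one external process over a chunk of files and return its stdout.

    Assumption: the command accepts multiple file paths as arguments.
    """
    result = subprocess.run(
        [*command, *chunk], capture_output=True, text=True, check=True
    )
    return result.stdout


def run_parallel(files, n_threads, command=("mlst",)):
    """Run one process per chunk, each driven by its own pool thread."""
    chunks = partition(files, n_threads)
    if not chunks:
        return []
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        # map() returns results in the same order as the chunks.
        return list(pool.map(lambda c: run_chunk(c, command), chunks))
```

Threads are enough here because each one just waits on an external process, so the real work happens outside the Python interpreter. Something like `run_parallel(fasta_files, 4)` would start up to four mlst processes at once.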

Testing

I've been checking the timing in staramr to see whether there was a significant performance change given 50 files, and using the top command to see whether the CPUs were being utilized.

apetkau commented 5 years ago

This is awesome @jennifertran, thanks. I'll try to find some time to look over this soon.

apetkau commented 4 years ago

This is awesome. Thanks so much @jennifertran. It works great now, and is much much faster. :+1: