twolinin / longphase

GNU General Public License v3.0

Phase Multithreading Optimization #35

Closed sloth-eat-pudding closed 7 months ago

sloth-eat-pudding commented 7 months ago

Summary

Optimized the runtime of the phase command and made its multithreaded execution thread-safe.

Changes

  1. Makefile Adjustments
    • Added the -fopenmp flag to CPPFLAGS to enable OpenMP, which the C++ code uses for multithreading.
  2. Changes to ParsingBam.cpp and ParsingBam.h
    • Added an int &numThreads parameter to direct_detect_alleles so that the caller controls how many threads are used for BAM parsing.
  3. Updates in Phasing.cpp
    • Changed the default value of the --threads argument to 0, which means the program uses all available threads unless a thread count is given explicitly.
  4. Major Refactoring in PhasingProcess.cpp
    • Added a setNumThreads function that divides the available threads between per-chromosome phasing and BAM parsing.
    • Added a ChrPhasingResult map that collects each chromosome's phasing result in a thread-safe manner.
    • Merged the per-chromosome phasing results into a single mergedPhasingResult, simplifying result aggregation.
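The PR does not show the body of setNumThreads, so the following is only a minimal sketch of one plausible thread split, assuming the goal is to phase up to one chromosome per thread and give each chromosome an equal share of the remaining threads for BAM parsing. The names splitThreads and ThreadSplit are hypothetical, and resolving --threads 0 via std::thread::hardware_concurrency() is an assumption about how "all available threads" is determined.

```cpp
#include <algorithm>
#include <thread>

// Hypothetical result type (not LongPhase code).
struct ThreadSplit {
    int chrThreads;  // chromosomes phased concurrently
    int bamThreads;  // threads each chromosome uses for BAM parsing
};

// Hypothetical helper illustrating the idea behind setNumThreads;
// the actual LongPhase logic may differ.
ThreadSplit splitThreads(int requested, int numChromosomes) {
    int total = requested;
    if (total <= 0) {
        // Assumed meaning of --threads 0: use every hardware thread.
        // hardware_concurrency() may return 0 if the count is unknown.
        unsigned hw = std::thread::hardware_concurrency();
        total = hw == 0 ? 1 : static_cast<int>(hw);
    }
    // Never run more chromosome workers than there are chromosomes.
    int chrThreads = std::max(1, std::min(total, numChromosomes));
    // Integer division: leftover threads (total % chrThreads) stay idle.
    int bamThreads = std::max(1, total / chrThreads);
    return {chrThreads, bamThreads};
}
```

With 8 threads and 24 chromosomes this gives 8 chromosome workers with 1 BAM thread each; with 32 threads and 4 chromosomes it gives 4 workers with 8 BAM threads each.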
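A common OpenMP pattern for collecting per-chromosome results safely is to create every map entry before the parallel loop, so each thread only writes to its own existing entry, and then merge sequentially. The sketch below assumes that pattern; PhasingResult, phaseChromosome and phaseAll are hypothetical stand-ins, not LongPhase code. Compiled without -fopenmp, the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical per-chromosome result: one output record per phased variant.
using PhasingResult = std::vector<std::string>;

// Hypothetical stand-in for phasing a single chromosome.
PhasingResult phaseChromosome(const std::string& chr) {
    return {chr + "\tphased"};
}

// Assumes chromosome names are unique.
PhasingResult phaseAll(const std::vector<std::string>& chromosomes, int chrThreads) {
    // Create all entries up front: concurrent insertion into std::map is a
    // data race, but concurrent writes to distinct existing values are not.
    std::map<std::string, PhasingResult> chrPhasingResult;
    for (const auto& chr : chromosomes) chrPhasingResult[chr];

    const int n = static_cast<int>(chromosomes.size());
    const int workers = chrThreads > 0 ? chrThreads : 1;  // num_threads must be positive

    #pragma omp parallel for schedule(dynamic) num_threads(workers)
    for (int i = 0; i < n; ++i) {
        // find() does not modify the tree; each iteration writes only its own entry.
        chrPhasingResult.find(chromosomes[i])->second = phaseChromosome(chromosomes[i]);
    }

    // Merge sequentially, in input order, so the output is deterministic.
    PhasingResult mergedPhasingResult;
    for (const auto& chr : chromosomes) {
        const PhasingResult& part = chrPhasingResult.at(chr);
        mergedPhasingResult.insert(mergedPhasingResult.end(), part.begin(), part.end());
    }
    return mergedPhasingResult;
}
```

Merging in the original chromosome order keeps the output identical regardless of which thread finishes first.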

Testing

This test compares run times against the develop branch (commit 4d3b42bc5d29cb8e7ae6fa259eca82f92b6bcf9b) when phasing HG002 ONT data at 10x to 60x coverage. Times are given as mm:ss.ss.

| Test condition | Run time before optimization | Run time after optimization |
|---|---|---|
| HG002 ONT 10x | 02:51.00 | 00:53.74 |
| HG002 ONT 20x | 05:17.32 | 01:58.43 |
| HG002 ONT 30x | 04:35.19 | 01:26.42 |
| HG002 ONT 40x | 11:05.72 | 03:31.45 |
| HG002 ONT 50x | 13:07.73 | 03:54.74 |
| HG002 ONT 60x | 07:06.75 | 02:00.00 |
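To express the run-time table as speedup factors, each mm:ss.ss value can be converted to seconds and the before/after ratio taken. The helper names toSeconds and speedup below are illustrative only.

```cpp
#include <string>

// Converts a "mm:ss.ss" run time (the format used in the run-time table) to seconds.
// A value without a colon is treated as plain seconds.
double toSeconds(const std::string& mmss) {
    std::size_t colon = mmss.find(':');
    if (colon == std::string::npos) return std::stod(mmss);
    double minutes = std::stod(mmss.substr(0, colon));
    double seconds = std::stod(mmss.substr(colon + 1));
    return minutes * 60.0 + seconds;
}

// Speedup = run time before optimization / run time after optimization.
double speedup(const std::string& before, const std::string& after) {
    return toSeconds(before) / toSeconds(after);
}
```

For example, 02:51.00 is 171.00 s and 00:53.74 is 53.74 s, a speedup of about 3.18x. Across the six rows the speedup ranges from about 2.68x (20x coverage) to about 3.56x (60x coverage).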