twolinin / longphase

GNU General Public License v3.0

ModCall Multithreading Optimization #43

Closed sloth-eat-pudding closed 6 months ago

sloth-eat-pudding commented 6 months ago

Summary

Optimized modcall runtime and ensured multithread safety.

Changes

  1. MethFastaParser Utilizing New Structure:
    • Revised the storage structure for the reference FASTA to include chromosome length information, which facilitates processing chromosomes in natural numerical order (chr1, chr2, chr3) instead of lexicographical order (chr1, chr11, chr12).
    • This change not only eliminates the need to recalculate chromosome lengths but also enhances execution efficiency in a multithreaded environment.
  2. Modifications in MethBamParser:
    • Added a parameter, int numThreads, to the function detectMeth. This allows threads to be allocated dynamically according to processing requirements, improving the handling of multithreaded operations.
  3. Thread Safety Measures:
    • Split the writeResultVCF function into two parts: exportResult and writeResultVCF.
    • exportResult: Handles the processing results for each chromosome, preparing data for VCF file writing.
    • writeResultVCF: Tasked with the actual writing of data into the VCF file, ensuring the integrity and sequentiality of output.
  4. Changes in ModCallProcess:
    • New function setModcallNumThreads: Implemented to allocate the available threads between chromosome processing and BAM parsing tasks.
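
The ordering and thread-safety points above can be illustrated with a minimal sketch. This is not the code from this PR: ChrInfo, exportChr, and runAll are hypothetical names, and the sketch assumes the reference is kept as a vector of (name, length) entries in reference order, whereas a name-keyed std::map would iterate as chr1, chr11, chr12. Each worker fills only its own per-chromosome buffer (in the spirit of exportResult), and a single thread then concatenates the buffers in order (in the spirit of writeResultVCF).

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for one reference sequence: name plus length.
struct ChrInfo {
    std::string name;
    std::size_t length;
};

// Hypothetical per-chromosome step: each call writes only into its own
// buffer, so no lock is needed while workers run concurrently.
void exportChr(const ChrInfo& chr, std::string& out) {
    std::ostringstream oss;
    oss << chr.name << "\t" << chr.length << "\n";  // placeholder record
    out = oss.str();
}

// Process chromosomes on up to numThreads workers, then concatenate the
// buffers sequentially so the output follows the order of `chrs`.
std::string runAll(const std::vector<ChrInfo>& chrs, unsigned numThreads) {
    std::vector<std::string> buffers(chrs.size());
    if (numThreads == 0) numThreads = 1;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t) {
        workers.emplace_back([&, t] {
            // Static striding: worker t handles indices t, t+N, t+2N, ...
            for (std::size_t i = t; i < chrs.size(); i += numThreads)
                exportChr(chrs[i], buffers[i]);
        });
    }
    for (auto& w : workers) w.join();

    std::string result;
    for (const auto& b : buffers) result += b;  // single-threaded, ordered
    return result;
}
```

Because the final concatenation is independent of which worker finished first, the output order is the same for any thread count.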

Testing

This test compares run times against the develop branch (commit f46509bba8bcb27e9812fbc3aacb6738f351df75), running modcall on HG002 ONT data from 10x to 60x with 20 threads. Times are given as mm:ss.s.

| Test Condition | Run Time Before Optimization | Maximum Memory Before Optimization (GB) | Run Time After Optimization | Maximum Memory After Optimization (GB) |
| --- | --- | --- | --- | --- |
| HG002 ONT 10x | 05:02.2 | 11.3 | 01:21.1 | 26.1 |
| HG002 ONT 20x | 09:21.7 | 15.3 | 03:16.8 | 39.7 |
| HG002 ONT 30x | 11:18.1 | 18.9 | 02:56.3 | 52.6 |
| HG002 ONT 40x | 20:05.3 | 22.6 | 05:32.6 | 65.6 |
| HG002 ONT 50x | 23:54.2 | 26.2 | 06:12.8 | 78.3 |
| HG002 ONT 60x | 21:51.5 | 29.8 | 05:37.6 | 91.1 |