kosta777 / parallel-genomeseq

Parallelization of popular genome sequencing algorithms

Library options for medium- to fine-grained parallelization #10

Open huanglangwen opened 4 years ago

huanglangwen commented 4 years ago

There are several parallelization libraries out there. We need to agree on one to avoid conflict.

  1. The most common option is OpenMP, in one of two versions:
    • OpenMP 5.0, with task-based parallelization and full C++17 support; requires GCC >= 9.0*
    • OpenMP 4.5, with SIMD parallelization and full C++11 support; requires GCC >= 6.3.0
  2. The Intel TBB library, with SIMD parallelization, task-based parallelization, and parallel STL (C++17) support; requires a C++11 compiler

OpenMP parallelizes code through compiler directives (pragmas) such as:

#pragma omp parallel for private(i, j) shared(A, n)
for (i = 0; i < n; ++i) 
  for (j = 1; j < n; ++j)  
    A[i][j] += A[i][j-1];
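
For reference, here is a self-contained version of the fragment above, as a minimal sketch. The function name `row_prefix_sums` and the use of `std::vector` are assumptions made for illustration, not code from this repository. The outer loop is safe to parallelize because each row is independent; the inner loop depends on the previous column and stays sequential.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: row-wise running sums, A[i][j] += A[i][j-1].
// Rows are independent, so the outer loop can be shared among threads.
// The inner loop carries a dependency on j and must run in order.
void row_prefix_sums(std::vector<std::vector<double>>& A) {
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(A.size());
    // Loop variables declared inside the loops are private automatically,
    // so no private(...) clause is needed here.
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        for (std::size_t j = 1; j < A[i].size(); ++j) {
            A[i][j] += A[i][j - 1];
        }
    }
}
```

Built with `g++ -fopenmp` the rows are distributed over threads; without the flag the pragma is ignored and the same code runs serially.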

TBB, by contrast, parallelizes by passing function objects (often lambdas) to library algorithms:

parallel_for(blocked_range<size_t>(0, n),
    [=](const blocked_range<size_t>& r) {
        for (size_t i = r.begin(); i != r.end(); ++i)
            Foo(a[i]);
    });

*: The newest GCC on the Euler cluster is 6.3.0, so we would need to build GCC ourselves to use anything newer.

References:

  1. OpenMP Tutorial: https://openmpcon.org/wp-content/uploads/openmpcon2017/Tutorial1-OpenMP-Core-Hands-On.pdf
  2. OpenMP 4.5 Examples: https://www.openmp.org/wp-content/uploads/openmp-examples-4.5.0.pdf
  3. OpenMP 4.5 Reference Guide: https://www.openmp.org/wp-content/uploads/OpenMP-4.5-1115-CPP-web.pdf
  4. OpenMP 5.0 Reference Guide: http://homepage.physics.uiowa.edu/~ghowes/teach/phys5905/manuals/OpenMP5.0Reference.pdf
  5. Pro TBB (open-access book): https://www.apress.com/gp/book/9781484243978 or https://link.springer.com/book/10.1007/978-1-4842-4398-5

huanglangwen commented 4 years ago

I’m going to create a new subclass of LocalAligner to parallelize the computation by splitting the reference sequence:

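class ParallelLocalAligner : public LocalAligner {
public:
    // lafactory(a, b_piece) -> LocalAligner object for one piece of the reference.
    // The exact std::function signature is an assumption based on that contract.
    ParallelLocalAligner(std::function<LocalAligner(std::string_view, std::string_view)> lafactory,
                         int nthreads, std::string_view a, std::string_view b);
};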
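
To make the splitting idea concrete, here is a rough sketch of how the reference could be cut into pieces and scored in parallel. Everything in it is hypothetical: `split_reference`, `best_score`, the `overlap` parameter, and the `score` callback (a stand-in for "build an aligner with lafactory and run it") are illustrative names, not part of the existing code. The overlap is an assumption too: without it, an alignment that straddles a cut would be missed, and how large it has to be depends on the scoring scheme.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string_view>
#include <vector>

// Hypothetical helper: cut `b` into `nparts` nearly equal pieces, each
// extended by `overlap` characters into its right-hand neighbour.
std::vector<std::string_view> split_reference(std::string_view b,
                                              std::size_t nparts,
                                              std::size_t overlap) {
    assert(nparts > 0);
    std::vector<std::string_view> pieces;
    if (b.empty()) return pieces;
    nparts = std::min(nparts, b.size());
    const std::size_t base = b.size() / nparts;
    const std::size_t extra = b.size() % nparts;  // first `extra` pieces get one more char
    std::size_t start = 0;
    for (std::size_t p = 0; p < nparts; ++p) {
        const std::size_t len = base + (p < extra ? 1 : 0);
        // substr clamps the count at the end of the string.
        pieces.push_back(b.substr(start, len + overlap));
        start += len;
    }
    return pieces;
}

// Hypothetical driver: score every piece in parallel and keep the maximum.
// `score` must be safe to call from several threads at once.
int best_score(std::string_view a,
               const std::vector<std::string_view>& pieces,
               const std::function<int(std::string_view, std::string_view)>& score,
               int nthreads) {
    int best = 0;
    const int n = static_cast<int>(pieces.size());
    #pragma omp parallel for num_threads(nthreads) reduction(max : best)
    for (int p = 0; p < n; ++p) {
        best = std::max(best, score(a, pieces[p]));
    }
    return best;
}
```

Because the pieces are `std::string_view`s into the original reference, no sequence data is copied; the reference string just has to outlive the pieces.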

Things to do: