lephare-photoz / lephare

LePHARE is a code for calculating photometric redshifts.
MIT License

Reduction operation being executed in parallel with read-modify-write operation in onesource::generatePDF() #197

Open hdante opened 2 months ago

hdante commented 2 months ago

Hello, there's a reduction loop in onesource::generatePDF() that's being executed in parallel: every thread runs it, and each thread covers the whole vector, including the data of all the other threads. Instead, the reduction should be executed by a single thread (or, for example, reduced with a tree). I think this also means that there's a race, because the loop does a read-modify-write sequence on the chi2 and ind vectors.

The parallelized reduction causes every thread to execute the same code and, if confirmed, the race condition could leave the chi2 vector not completely minimized. The race might be confirmed by finding a dataset that exposes it and then comparing the results of a multi-threaded build of the library with those of a single-threaded build.

https://github.com/lephare-photoz/lephare/blob/dbe015b438c21b515c34fd6f87b94859fffb9ba9/src/lib/onesource.cpp#L1002
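The comparison suggested above could be sketched as follows. This is only an illustration, not code from lephare: the function name `first_mismatch` and the idea of dumping the final chi2 and ind vectors from each build are assumptions. It compares values exactly, on the assumption that a minimum only copies existing values, so both builds should agree bit for bit unless they were compiled with different floating-point options.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: compare the final chi2/ind vectors produced by a
// single-threaded build (a) and a multi-threaded build (b) for one source.
// Returns the first bin where they disagree, or -1 if they are identical.
int first_mismatch(const std::vector<double>& chi2_a, const std::vector<int>& ind_a,
                   const std::vector<double>& chi2_b, const std::vector<int>& ind_b) {
  const std::size_t n = std::min({chi2_a.size(), ind_a.size(),
                                  chi2_b.size(), ind_b.size()});
  for (std::size_t i = 0; i < n; ++i) {
    // Exact comparison: the reduction only selects one of the input values.
    if (chi2_a[i] != chi2_b[i] || ind_a[i] != ind_b[i]) return static_cast<int>(i);
  }
  // Vectors of different lengths also count as a mismatch.
  if (chi2_a.size() != chi2_b.size() || ind_a.size() != ind_b.size()) {
    return static_cast<int>(n);
  }
  return -1;
}
```

Since a race may only show up occasionally, a check like this would probably need to be repeated over many runs and many sources before a clean result means much.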


hdante commented 2 months ago

Hello, I imagine there are 2 ways to fix the reduction operation. One is to use OpenMP's single construct, so that only one thread executes the loop:

#pragma omp single
      for (int i = 0; i < dimzg; i++) {
(...)

The second is to move the loop outside the omp parallel region and execute it as a standard single-threaded C++ loop.
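Both options can be sketched on a toy example. This is not the code from onesource.cpp: the per-thread layout (`loc_chi2[t][i]`, `loc_ind[t][i]`), the helper `merge_bin`, and the two function names are assumptions made for illustration; only `dimzg`, `chi2` and `ind` come from the discussion above. The explicit barrier before the single construct is there in case the preceding per-thread work has no implicit barrier of its own.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layout: loc_chi2[t][i] is the best chi2 thread t found for
// bin i, and loc_ind[t][i] is the matching index.
using Chi2Grid = std::vector<std::vector<double>>;
using IndGrid = std::vector<std::vector<int>>;

// Merge all per-thread minima for bin i into chi2[i] and ind[i].
static void merge_bin(const Chi2Grid& loc_chi2, const IndGrid& loc_ind, int i,
                      std::vector<double>& chi2, std::vector<int>& ind) {
  for (std::size_t t = 0; t < loc_chi2.size(); ++t) {
    if (chi2[i] > loc_chi2[t][i]) {
      chi2[i] = loc_chi2[t][i];
      ind[i] = loc_ind[t][i];
    }
  }
}

// Option 1: keep the loop inside the parallel region, run by one thread.
void reduce_with_single(const Chi2Grid& loc_chi2, const IndGrid& loc_ind,
                        std::vector<double>& chi2, std::vector<int>& ind) {
  const int dimzg = static_cast<int>(chi2.size());
#pragma omp parallel
  {
    // (per-thread work filling loc_chi2/loc_ind would happen here)
#pragma omp barrier  // every thread has finished writing its own slots
#pragma omp single
    for (int i = 0; i < dimzg; i++) {
      merge_bin(loc_chi2, loc_ind, i, chi2, ind);
    }
    // implicit barrier at the end of single: all threads see the result
  }
}

// Option 2: leave the parallel region first, then run a plain C++ loop.
void reduce_after_region(const Chi2Grid& loc_chi2, const IndGrid& loc_ind,
                         std::vector<double>& chi2, std::vector<int>& ind) {
#pragma omp parallel
  {
    // (per-thread work filling loc_chi2/loc_ind would happen here)
  }
  const int dimzg = static_cast<int>(chi2.size());
  for (int i = 0; i < dimzg; i++) {
    merge_bin(loc_chi2, loc_ind, i, chi2, ind);
  }
}
```

In both variants exactly one thread writes chi2 and ind, so the read-modify-write sequence is no longer shared between threads. The second variant pays for closing the parallel region, which only matters if more parallel work follows the reduction.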

I'm not sure how to compare the performance impact of the two fixes; it might be easiest to try both.
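One possible way to try both is a small timing helper built on std::chrono. The name `best_time_ms` and the best-of-N strategy are assumptions for this sketch, not something lephare provides; each variant would be wrapped in a callable and timed on the same input.

```cpp
#include <chrono>

// Hypothetical helper: call `run` `repeats` times and return the fastest
// wall-clock time in milliseconds (-1.0 if repeats is not positive).
template <typename F>
double best_time_ms(F&& run, int repeats) {
  using clock = std::chrono::steady_clock;
  double best = -1.0;
  for (int r = 0; r < repeats; ++r) {
    const auto start = clock::now();
    run();
    const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
    // Keep the minimum, which is less sensitive to scheduling noise.
    if (best < 0.0 || elapsed.count() < best) best = elapsed.count();
  }
  return best;
}
```

Timing the whole function on a realistic catalogue is likely more informative than timing the reduction loop alone, since the loop may be a small fraction of the total work.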