Open hdante opened 2 months ago
Hello, I imagine there are two ways to fix the reduction operation. One is to keep the loop inside the parallel region but run it on a single thread with OpenMP's single construct:
#pragma omp single
for (int i = 0; i < dimzg; i++) {
(...)
The second is to move the loop outside the omp parallel region and execute it as a standard single-threaded C++ loop.
I'm not sure how to predict the performance impact of either fix, so it might be easier to try both and measure.
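A minimal sketch of the two options, to make the difference concrete. It is not the lephare code: the `Partials` struct, `merge_min`, and both function names are hypothetical, and it assumes each thread has already stored its own partial minima per grid point. Only `chi2`, `ind`, and `dimzg` are names taken from this issue.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: one partial minimum per thread t and grid point i.
struct Partials {
    std::vector<std::vector<double>> chi2;  // chi2[t][i]
    std::vector<std::vector<int>> ind;      // ind[t][i]
};

// Serial merge of all per-thread partial minima into the shared vectors.
static void merge_min(const Partials& p, std::vector<double>& chi2,
                      std::vector<int>& ind) {
    assert(chi2.size() == ind.size());
    const int dimzg = static_cast<int>(chi2.size());
    for (int i = 0; i < dimzg; i++) {
        for (std::size_t t = 0; t < p.chi2.size(); t++) {
            if (p.chi2[t][i] < chi2[i]) {
                chi2[i] = p.chi2[t][i];
                ind[i] = p.ind[t][i];
            }
        }
    }
}

// Option 1: stay inside the parallel region, but let exactly one thread reduce.
void reduce_with_single(const Partials& p, std::vector<double>& chi2,
                        std::vector<int>& ind) {
#pragma omp parallel
    {
        // ... per-thread fitting work that fills the partials would go here ...

        // Only needed if the preceding work does not already end in a barrier.
#pragma omp barrier
#pragma omp single
        merge_min(p, chi2, ind);
    }
}

// Option 2: close the parallel region first, then reduce in plain serial C++.
void reduce_after_region(const Partials& p, std::vector<double>& chi2,
                         std::vector<int>& ind) {
#pragma omp parallel
    {
        // ... per-thread fitting work that fills the partials would go here ...
    }
    merge_min(p, chi2, ind);
}
```

Both variants should produce identical `chi2` and `ind` vectors, so any difference between them is purely in timing. Option 1 avoids closing and reopening the parallel region if more parallel work follows, while option 2 is easier to read.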
Hello, there's a reduction loop that is executed inside the parallel region, but every thread runs it over the whole vector, including the data of all the other threads. Instead, the reduction should be executed, for example, by a single thread (or maybe as a tree reduction). I think this also means there is a race, because the loop does a read-modify-write sequence on the `chi2` and `ind` vectors.

The parallelized reduction makes every thread execute the same code and, if the race is confirmed, it could leave the `chi2` vector not completely minimized. The race might be confirmed by finding a dataset that exposes it and then comparing the results of a multi-threaded build of the library with those of a single-threaded one.

https://github.com/lephare-photoz/lephare/blob/dbe015b438c21b515c34fd6f87b94859fffb9ba9/src/lib/onesource.cpp#L1002
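For the tree alternative mentioned above, here is a rough sketch of a pairwise reduction over per-thread results. It is an illustration under assumptions, not the library's implementation: the `Best` struct, `merge_into`, and `tree_reduce` are hypothetical names, and it assumes each thread keeps its own private copy of the best chi2 and index per grid point.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-thread result: best chi2 and its model index per grid point.
struct Best {
    std::vector<double> chi2;
    std::vector<int> ind;
};

// Merge b into a, keeping the smaller chi2 (and its index) at each grid point.
static void merge_into(Best& a, const Best& b) {
    assert(a.chi2.size() == b.chi2.size());
    assert(a.ind.size() == a.chi2.size() && b.ind.size() == b.chi2.size());
    for (std::size_t i = 0; i < a.chi2.size(); i++) {
        if (b.chi2[i] < a.chi2[i]) {
            a.chi2[i] = b.chi2[i];
            a.ind[i] = b.ind[i];
        }
    }
}

// Pairwise (tree) reduction; the combined minimum ends up in parts[0].
void tree_reduce(std::vector<Best>& parts) {
    const long n = static_cast<long>(parts.size());
    for (long stride = 1; stride < n; stride *= 2) {
        // Each iteration writes parts[k] and reads parts[k + stride];
        // the pairs are disjoint, so no two iterations touch the same element.
#pragma omp parallel for
        for (long k = 0; k < n - stride; k += 2 * stride) {
            merge_into(parts[k], parts[k + stride]);
        }
    }
}
```

Because no element is written by more than one thread in any step, there is no read-modify-write race on shared data. With only a handful of threads the plain single-threaded merge is probably just as fast, so the tree is likely worth it only when the number of partial results is large.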
Before submitting Please check the following: