imglib / imglib2-algorithm

Image processing algorithms for ImgLib2
http://imglib2.net/

Use new imglib2 parallelization approach #83

Open maarzt opened 4 years ago

maarzt commented 4 years ago

In this PR I change imglib2-algorithm to use the parallelization framework introduced in https://github.com/imglib/imglib2/pull/269! I want to find out whether the parallelization framework is useful. And this is the result:

Yes! :tada: All multi-threading in imglib2-algorithm can be replaced by the parallelization framework.

Most multi-threading code can be replaced with a multi-threaded LoopBuilder or IterableLoopBuilder, which also greatly simplifies the code. IterableLoopBuilder is part of this PR; it is like LoopBuilder, but takes an IterableInterval as its first argument.
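For illustration, this is roughly how an algorithm looks when written against the multi-threaded LoopBuilder (a minimal sketch, not code from this PR; the `invert` method, its arguments, and the class name are hypothetical):

```java
import net.imglib2.RandomAccessibleInterval;
import net.imglib2.loops.LoopBuilder;
import net.imglib2.type.numeric.RealType;

public class LoopBuilderSketch
{
	// Hypothetical algorithm: pixel-wise inversion, multi-threaded via LoopBuilder.
	// No ExecutorService or numThreads parameter is needed; the Parallelization
	// context of the calling thread decides how many threads are actually used.
	public static < T extends RealType< T > > void invert(
			final RandomAccessibleInterval< T > input,
			final RandomAccessibleInterval< T > output )
	{
		LoopBuilder.setImages( input, output )
				.multiThreaded()
				.forEachPixel( ( in, out ) -> out.setReal( -in.getRealDouble() ) );
	}
}
```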

Some methods that were previously single-threaded now run multi-threaded. It is still possible to execute them single-threaded by wrapping the call in Parallelization.runSingleThreaded( () -> ... ) (see the sketch after the list below). We could either just stick with this, or give the multi-threaded functions a different name and provide backward-compatible single-threaded variants. The affected methods are:

ConnectedComponents.labelAllConnectedComponents( final RandomAccessible< T > input, final ImgLabeling< L, I > labeling, final Iterator< L > labelGenerator, final StructuringElement se )
ConnectedComponents.labelAllConnectedComponents( final RandomAccessible< T > input, final RandomAccessibleInterval< L > output, final StructuringElement se )
TensorEigenValues.calculateEigenValuesSymmetric( final RandomAccessibleInterval< T > tensor, final RandomAccessibleInterval< U > eigenvalues )
TensorEigenValues.calculateEigenValuesSquare( final RandomAccessibleInterval< T > tensor, final RandomAccessibleInterval< U > eigenvalues )
TensorEigenValues.calculateEigenValues( final RandomAccessibleInterval< T > tensor, final RandomAccessibleInterval< U > eigenvalues, final EigenValues< T, U > ev )
LocalExtrema.findLocalExtrema( final RandomAccessibleInterval< T > source, final LocalNeighborhoodCheck< P, T > localNeighborhoodCheck )
LocalExtrema.findLocalExtrema( final RandomAccessibleInterval< T > source, final LocalNeighborhoodCheck< P, T > localNeighborhoodCheck, final Shape shape )
LocalExtrema.findLocalExtrema( final RandomAccessible< T > source, final Interval interval, final LocalNeighborhoodCheck< P, T > localNeighborhoodCheck )
LocalExtrema.findLocalExtrema( final RandomAccessible< T > source, final Interval interval, final LocalNeighborhoodCheck< P, T > localNeighborhoodCheck, final Shape shape )
SubpixelLocalization.refinePeaks( final List< P > peaks, final RandomAccessible< T > img, final Interval validInterval, final boolean returnInvalidPeaks, final int maxNumMoves, final boolean allowMaximaTolerance, final float maximaTolerance, final boolean[] allowedToMoveInDim )
DistanceTransform.transform(...) (6 different versions)
DistanceTransform.binaryTransform(...) (5 different versions)
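For example, a caller that wants the previous single-threaded behaviour for the first of the methods above could wrap the call like this (a minimal sketch; the wrapper class, the helper method name, and the choice of FOUR_CONNECTED are mine, not part of this PR):

```java
import java.util.Iterator;

import net.imglib2.RandomAccessible;
import net.imglib2.algorithm.labeling.ConnectedComponents;
import net.imglib2.algorithm.labeling.ConnectedComponents.StructuringElement;
import net.imglib2.parallel.Parallelization;
import net.imglib2.roi.labeling.ImgLabeling;
import net.imglib2.type.numeric.IntegerType;

public class SingleThreadedLabelingSketch
{
	// Runs the (now multi-threaded by default) labeling on the calling thread only.
	public static < T extends IntegerType< T >, L, I extends IntegerType< I > > void labelSingleThreaded(
			final RandomAccessible< T > input,
			final ImgLabeling< L, I > labeling,
			final Iterator< L > labelGenerator )
	{
		Parallelization.runSingleThreaded( () ->
				ConnectedComponents.labelAllConnectedComponents(
						input, labeling, labelGenerator, StructuringElement.FOUR_CONNECTED ) );
	}
}
```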

The work isn't finished yet: the classes Erosion, Dilation, Opening, Closing, TopHat, and BlackTopHat could additionally get methods without the numThreads parameter.

Overall this PR shows that imglib2-algorithm can be greatly simplified by using the new parallelization framework. It reduces code duplication: many algorithms currently come in both a single-threaded and a multi-threaded version, and that duplication is no longer needed.

tpietzsch commented 4 years ago

I haven't looked at it in detail yet, but I think IterableLoopBuilder belongs in core and should be a separate PR.

imagejan commented 3 years ago

What's the state of this PR? Can we have IterableLoopBuilder in imglib2 core?

maarzt commented 2 years ago

This PR requires https://github.com/imglib/imglib2/pull/310 (IterableLoopBuilder) to be merged and released first.

maarzt commented 2 years ago

The imglib2 Parallelization context plus the multi-threaded LoopBuilder make it convenient to add multi-threading to imglib2 algorithms (as shown in this PR). There is no need for an ExecutorService or numTasks parameter; the method signatures remain clean. But this raises an important question: how should we deal with existing single-threaded methods, should we make them multi-threaded?

Possible answers:

  1. Make these methods multi-threaded. They can still be executed single-threaded by using Parallelization.runSingleThreaded(...) (see the sketch after this list). Consequence: existing code will be multi-threaded and will by default run faster. But some existing code might also run slower, because of multi-threading overhead or because too much multi-threading happens in parallel. Additionally, some code with thread-unsafe RandomAccesses might break.
  2. Keep the existing methods single-threaded and add multi-threaded alternatives. Consequence: nothing breaks and performance stays constant. But there is also no speed-up for existing code, and people might not use the multi-threaded alternatives because they are not aware of them.
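To make option 1 concrete, the caller-side pattern would look roughly like this (a hedged sketch; compute() is a hypothetical stand-in for any of the affected methods, and runWithNumThreads is used on the assumption that it behaves as its name suggests):

```java
import net.imglib2.parallel.Parallelization;

public class CallerSideThreadingSketch
{
	// Hypothetical stand-in for any of the affected algorithm methods.
	static void compute()
	{
		// ... some imglib2 algorithm call ...
	}

	public static void main( final String[] args )
	{
		// Default: the algorithm uses the current Parallelization context (typically all cores).
		compute();

		// Explicitly single-threaded, e.g. inside an outer parallelization layer.
		Parallelization.runSingleThreaded( () -> compute() );

		// Explicit thread count, chosen by the caller instead of a numThreads parameter.
		Parallelization.runWithNumThreads( 4, () -> compute() );
	}
}
```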

@tpietzsch @axtimwalde @StephanPreibisch What are your thoughts?

tpietzsch commented 2 years ago

Also discussed on gitter: https://gitter.im/imglib/imglib2?at=618bd07238377967f4a42c73

copy&paste from there:

Curtis Rueden @ctrueden Nov 10 16:10 @maarzt Awesome, thanks for all your work on this! For what to do with existing algorithms: I think it depends on whether multithreading them would change the result. If you arrive at the same deterministic answer as before, then I vote to definitely switch them over to multithreading. If the result changes, but is still deterministic, you could add new method sig(s) and deprecate the old one(s). If the result of multithreading it becomes non-deterministic, that's a problem for Ops and probably others using the algorithm, so we should be careful there.

Jean-Yves Tinevez @tinevez Nov 10 16:51 Awesome !

Tobias Pietzsch @tpietzsch Nov 10 17:36 I would also change to multi-threading by default (or rather, using the current Parallelization context by default). I would perhaps keep the single-threaded versions under an explicit name (e.g. PartialDerivative.gradientForwardDifferenceSingleThreaded) for things that are possibly run on tiny things where even querying the Parallelization ThreadLocal could be considered too much overhead. These "kernel methods" are perhaps anyway used as building blocks for the multi-threaded versions, and I don't see a big downside to exposing them.

Tobias Pietzsch @tpietzsch Nov 10 17:41 But multi-threaded by default is good @axtimwalde @StephanPreibisch opinions?

axtimwalde commented 2 years ago

Auto-multi-threading can be bad when parallelizing things with Spark. The Parallelization context will have unintended consequences when several tasks run in the same JVM (which is typical). Am I overthinking this?

tpietzsch commented 2 years ago

> Auto-multi-threading can be bad when parallelizing things with Spark. The Parallelization context will have unintended consequences when several tasks run in the same JVM (which is typical). Am I overthinking this?

Maybe I'm "underthinking" it, but wouldn't it be sufficient to then just use Parallelization.runSingleThreaded(...) around each task?

maarzt commented 2 years ago

Hi @axtimwalde, I hope the Parallelization context will also be useful for your code that runs on Spark. It is actually intended for exactly those use cases, where a part of the code should run single-threaded. And no worries, it's thread-safe: it uses a ThreadLocal to store one configuration per thread.

As Tobi suggested, all you need to do is wrap the tasks that you execute on Spark into a call of Parallelization.runSingleThreaded(...). I hope it will work for you; otherwise we need to fix the Parallelization context. I'm happy to look at any example where it breaks, or where you think it would break.

axtimwalde commented 2 years ago

Maybe I am not understanding this well enough. However, Parallelization.runSingleThreaded(...) calls TaskExecutors.singleThreaded(...), which returns a singleton executor for the entire JVM. That would mean that all parallel Spark tasks running on the same JVM now run sequentially instead of in parallel. This would not be right. It looks like Spark's and Parallelization's task management are trying to do the same thing in incompatible ways. But again, maybe I am misinterpreting something here, haven't spent enough time, sorry.

maarzt commented 2 years ago

> However, Parallelization.runSingleThreaded(...) calls TaskExecutors.singleThreaded(...) which returns a singleton executor for the entire JVM.

Yes, that is correct. TaskExecutors.singleThreaded() returns a singleton executor: this executor is the only instance of the class SequentialTaskExecutor. But this won't cause any trouble. The class SequentialTaskExecutor has no fields and therefore no internal state or state changes. One could have multiple instances of SequentialTaskExecutor and it wouldn't make any difference; they would all behave exactly the same at any time. I wanted to make the overhead of calling TaskExecutors.singleThreaded() as small as possible, and returning a singleton is faster than creating new instances.

By the way, the same is true for TaskExecutors.multiThreaded() and the respective ForkJoinExecutorService. There is always only one instance, no fields, no internal state, no need to create multiple instances.

> That would mean that all parallel Spark tasks running on the same JVM now run sequentially instead of in parallel.

No. The SequentialTaskExecutor isn't that powerful. Let us assume Spark is configured to start 8 threads in a JVM. Calling Parallelization.runSingleThreaded(...) won't change anything about this. All it does is prevent the imglib2 algorithms from spawning new threads. And I assume that's exactly what you need.
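To illustrate that point, here is a minimal, self-contained sketch (class and thread names are mine, not from this PR): two worker threads each wrap their work in runSingleThreaded and still run concurrently; the wrapper only keeps imglib2 algorithms inside the task from spawning additional threads.

```java
import net.imglib2.parallel.Parallelization;

public class NoForcedSequentializationSketch
{
	public static void main( final String[] args ) throws InterruptedException
	{
		// Each "Spark-like" worker wraps its task body in runSingleThreaded.
		final Runnable task = () -> Parallelization.runSingleThreaded( () ->
				// Here an imglib2 algorithm would run on this worker thread only.
				System.out.println( Thread.currentThread().getName() + " ran its task" ) );

		final Thread workerA = new Thread( task, "worker-1" );
		final Thread workerB = new Thread( task, "worker-2" );
		workerA.start();
		workerB.start();
		workerA.join();
		workerB.join();
	}
}
```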

ctrueden commented 1 year ago

@maarzt @tpietzsch @axtimwalde Is this PR still relevant? It would be great to get this library using the unified multithreading approach!

axtimwalde commented 1 year ago

I think this is still relevant, and it looks good provided that running tasks with SequentialTaskExecutor.getInstance() from multiple threads will not force-sequentialize them.

maarzt commented 1 year ago

I will go over this PR, check everything, and prepare it to be merged.

maarzt commented 1 year ago

Ok, I added tests and updated the javadoc. I would say, this PR is ready to be merged!