imagej / imagej-ops

ImageJ Ops: "Write once, run anywhere" image processing
https://imagej.net/libs/imagej-ops
BSD 2-Clause "Simplified" License

Fractal Dimension creates thousands of zombie threads that crash ImageJ #637

Open mdoube opened 3 years ago

mdoube commented 3 years ago

BoneJ's Fractal Dimension plugin uses an Op to do the box-counting maths. On larger images with many boxes to count, it adopts a multithreading approach, using a standard call to get the number of available processors:

https://github.com/imagej/imagej-ops/blob/9dad3f91ebd45cbeb0a46757d0918d43d379204f/src/main/java/net/imagej/ops/topology/BoxCount.java#L196

Fractal Dimension crashes ImageJ after a few iterations of a batch job.

Java HotSpot(TM) 64-Bit Server VM warning: Attempt to protect stack guard pages failed.
<repeated many times>
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f318501e000, 12288, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 12288 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /home/mdoube/Fiji.app.dev/hs_err_pid379776.log
Java HotSpot(TM) 64-Bit Server VM warning: Attempt to protect stack guard pages failed.
Java HotSpot(TM) 64-Bit Server VM warning: Attempt to protect stack guard pages failed.
Java HotSpot(TM) 64-Bit Server VM warning: Attempt to protect stack guard pages failed.
Java HotSpot(TM) 64-Bit Server VM warning: Attempt to deallocate stack guard pages failed.
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f7f3bbcb000, 65536, 1) failed; error='Cannot allocate memory' (errno=12)
[thread 139913374230272 also had an error]

A stack trace is only sometimes printed to the console, like this one:

[ERROR] Module threw error
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:717)
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1367)
    at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
    at net.imagej.ops.topology.BoxCount.countForegroundBoxes(BoxCount.java:229)
    at net.imagej.ops.topology.BoxCount.lambda$countTranslatedGrids$0(BoxCount.java:192)
    at java.util.stream.ReferencePipeline$5$1.accept(ReferencePipeline.java:227)
    at java.util.stream.SpinedBuffer$1Splitr.forEachRemaining(SpinedBuffer.java:364)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.LongPipeline.reduce(LongPipeline.java:443)
    at java.util.stream.LongPipeline.min(LongPipeline.java:401)
    at net.imagej.ops.topology.BoxCount.calculate(BoxCount.java:139)
    at net.imagej.ops.topology.BoxCount.calculate(BoxCount.java:74)
    at org.bonej.wrapperPlugins.FractalDimensionWrapper.lambda$run$0(FractalDimensionWrapper.java:183)
    at java.util.ArrayList.forEach(ArrayList.java:1257)
    at org.bonej.wrapperPlugins.FractalDimensionWrapper.run(FractalDimensionWrapper.java:174)
    at org.scijava.command.CommandModule.run(CommandModule.java:196)
    at org.scijava.module.ModuleRunner.run(ModuleRunner.java:165)
    at org.scijava.module.ModuleRunner.call(ModuleRunner.java:124)
    at org.scijava.module.ModuleRunner.call(ModuleRunner.java:63)
    at org.scijava.thread.DefaultThreadService.lambda$wrap$2(DefaultThreadService.java:225)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Profiling active threads looks like this:

Screenshot from 2021-06-08 10-39-16

Between 3000 and 5000 threads are created on each iteration, and none of them are completed, removed, or terminated afterwards.

Setting processors = 1 to make it a single-threaded algorithm fixes the bug, but also means that large images are slow to analyse.

A safer way to multithread is needed for box counting.
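The thread build-up described above can be reproduced in miniature. The following is only a sketch and not the BoxCount code: the class name ThreadLeakSketch, the method names, and the "leak-demo-" thread name prefix are all made up for illustration. It creates a fresh pool on every iteration, runs one trivial task on it, and never shuts the pool down until the end, at which point the idle worker threads are all still alive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; none of these names come from imagej-ops.
final class ThreadLeakSketch {

    /** Counts live threads whose name starts with the given prefix. */
    static long liveThreads(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
            .filter(t -> t.isAlive() && t.getName().startsWith(prefix))
            .count();
    }

    /**
     * Creates one single-thread pool per iteration, runs a no-op task on it,
     * and skips shutdown(). Returns how many worker threads are still alive
     * once every task has finished.
     */
    static long leakedAfter(int iterations) {
        String prefix = "leak-demo-";
        List<ExecutorService> pools = new ArrayList<>();
        try {
            for (int i = 0; i < iterations; i++) {
                ExecutorService pool = Executors.newFixedThreadPool(1, r -> {
                    Thread t = new Thread(r, prefix + "worker");
                    t.setDaemon(true); // so the demo itself cannot block JVM exit
                    return t;
                });
                pools.add(pool);
                pool.submit(() -> { }).get(); // wait for the task to complete
            }
            // Every task is done, yet each pool still holds an idle worker thread.
            return liveThreads(prefix);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted during demo", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Demo task failed", e.getCause());
        } finally {
            // Clean up so the demo does not leak threads of its own.
            pools.forEach(ExecutorService::shutdown);
        }
    }
}
```

Calling ThreadLeakSketch.leakedAfter(n) should report one surviving thread per pool, which is consistent with the thread count in the profiler growing on every iteration of a batch job.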

See also: https://forum.image.sc/t/memory-issues-with-bonej/53589

imagesc-bot commented 3 years ago

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/memory-issues-with-bonej/53589/4

mdoube commented 3 years ago

Anisotropy uses a similar ExecutorService approach to multithreading and includes a shutdownAndAwaitTermination() method that appears to clean up old threads.

https://github.com/bonej-org/BoneJ2/blob/da5aa63cdc15516605e8dcb77458eb34b0f00b85/Modern/wrapperPlugins/src/main/java/org/bonej/wrapperPlugins/AnisotropyWrapper.java#L380
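For reference, a two-phase shutdown of this kind is usually written along the lines of the example in the ExecutorService Javadoc. The sketch below follows that general pattern and is not copied from the linked AnisotropyWrapper method, which may differ in detail; the class name ShutdownSketch and the timeoutSeconds parameter are assumptions made for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class, modelled on the pattern in the ExecutorService Javadoc.
final class ShutdownSketch {

    /** Returns true if the pool terminated within the given timeouts. */
    static boolean shutdownAndAwaitTermination(ExecutorService pool, long timeoutSeconds) {
        pool.shutdown(); // stop accepting new tasks; already submitted tasks still run
        try {
            if (!pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                pool.shutdownNow(); // interrupt tasks that are still running
                return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
            }
            return true;
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }
}
```

Once the pool has terminated, its worker threads exit and can no longer accumulate between iterations.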

Fractal Dimension may be taking a messier approach to thread creation, and it does not appear to tidy up after itself.

mdoube commented 3 years ago

Fixed by calling shutdown() on the ExecutorService in net.imagej.ops.morphology.outline.Outline and net.imagej.ops.topology.BoxCount.

Screenshot from 2021-08-13 17-38-22
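The shape of that fix can be sketched as follows. This is an illustration under assumptions, not the actual change made to Outline or BoxCount: the class PooledCountSketch, the method countForeground, and the boolean[] chunks are hypothetical stand-ins for the real box-counting work. The point is only that the pool is created per call and shutdown() is placed in a finally block, so it runs whether the tasks succeed or fail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a per-call parallel count; not the imagej-ops code.
final class PooledCountSketch {

    /** Counts the true values across all chunks, one task per chunk. */
    static long countForeground(List<boolean[]> chunks) {
        int nThreads = Math.max(1, Runtime.getRuntime().availableProcessors());
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (boolean[] chunk : chunks) {
                Callable<Long> task = () -> {
                    long n = 0;
                    for (boolean b : chunk) {
                        if (b) n++;
                    }
                    return n;
                };
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // blocks until that chunk has been counted
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while counting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A counting task failed", e.getCause());
        } finally {
            // Without this call the idle worker threads outlive the method.
            pool.shutdown();
        }
    }
}
```

Because every result has been collected before the finally block runs, a plain shutdown() is enough here and no awaitTermination() is needed for correctness.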

mdoube commented 3 years ago

PR #624 should be applied as well, because it has a better threading model for Outline.