Java out of memory issue

ieivanov commented 1 year ago

I am running into a Java out of memory issue when acquiring large sequenced z-stacks over multiple timepoints.

This effect can be simulated with the demo config and the follow code

from pycromanager import Core, Acquisition, multi_d_acquisition_events

mmc = Core()
mmc.set_config('Camera', 'HighRes')
mmc.set_property('Z', 'UseSequences', 'Yes')

events = multi_d_acquisition_events(num_time_points=3, time_interval_s=20, z_start=-3, z_end=3, z_step=0.01)

with Acquisition(r'D:\\temp', 'pm_acq_bsi', show_display=False) as acq:
    acq.acquire(events)

When monitoring the Java memory (Plugins -> Utilities -> Monitor Memory...) I see that the memory increases linearly during the acquisition until it reaches ~70% at which point the garbage collector kicks in. This pattern is different when running the same acquisition thru the MDA - here I see more frequent garbage collection.

When using the demo camera the this acquisition completes successfully, but when using a real camera (Prime BSI Express thru the PVCAM device adapter in my case) I get the following error at ~50% memory usage:

Exception: java.lang.OutOfMemoryError: The system ran out of memory!
mmcorej.MMCoreJJNI.CMMCore_popNextImageMD__SWIG_0(Native Method)
mmcorej.CMMCore.popNextImageMD(CMMCore.java:1064)
mmcorej.CMMCore.popNextTaggedImage(CMMCore.java:200)
mmcorej.CMMCore.popNextTaggedImage(CMMCore.java:205)
org.micromanager.acqj.internal.Engine.acquireImages(Engine.java:423)
org.micromanager.acqj.internal.Engine.executeAcquisitionEvent(Engine.java:325)
org.micromanager.acqj.internal.Engine.lambda$processAcquisitionEvent$5(Engine.java:212)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

I don't see this error when running the same acquisition thru the MM MDA.

Increasing the Java memory allocation delays the error, but does not fix it. I don't see imaged accumulating in the MM circular buffer, and increasing the Acquisition class save_queue_size does not help either.

Is it possible for the AcqEngJ to run more frequent garbage collection?

henrypinkard commented 1 year ago

There's a [System.gc()](https://docs.oracle.com/javase/7/docs/api/java/lang/System.html#gc()) call that could certainly be added into the acquisition engine. However, its possible this would lead to a performance hit so it definitely needs to be tested before adding. I also don't think it guarentees that the GC runs.

When using the demo camera the this acquisition completes successfully, but when using a real camera (Prime BSI Express thru the PVCAM device adapter in my case) I get the following error at ~50% memory usage:

Its very strange to me that this would happen at 50%, not 100%. You mean 50% on the ImageJ monitor memory, right? Or 50% on the system wide process memory (that includes circular buffer)

Also, if you click on the ImageJ monitor memory plugin, it should force (or maybe just suggest) a GC event. Depending on the timing of it all, if you're able to repeatedly click this and prevent the error, that suggests that adding in System.gc() is a good idea. This also would allow you to verify that there is not a memory leak at play here. If after clicking the baseline that it returns to doesn't stay approximately the same, then something is failing to clean up.

@nicost Do you other thoughts or insights about this?

ieivanov commented 1 year ago

Its very strange to me that this would happen at 50%, not 100%. You mean 50% on the ImageJ monitor memory, right? Or 50% on the system wide process memory (that includes circular buffer)

It does seem rather suspicious. I think this could well be a problem specific to the PVCAM device adapter, rather than a general issue, but it's also strange that an acquisition with this camera completes through the MDA, but errors our when using pycromanager. And yes, I see that the camera error out at 50% use of the ImageJ memory allocated under Edit -> Options -> Memory & Threads. The circular buffer is pretty much empty the whole time.

Also, if you click on the ImageJ monitor memory plugin, it should force (or maybe just suggest) a GC event. Depending on the timing of it all, if you're able to repeatedly click this and prevent the error, that suggests that adding in System.gc() is a good idea.

I have tried clicking on the ImageJ status bar to force / suggest garbage collection, but that did not seem to have an effect - I don't see the memory drop, and the pycromanager acquisition still errors out.

henrypinkard commented 1 year ago

I wrote a speed test and analysis notebook the other week that could provide some useful diagnostics

heres the script. I produces csv files that you can subsequently plug into this noteboook to visualize various bottlenecks in the acquisition pipeline.

The java code required is currently in a PR to mmstudio. You can grab the next continuous integration build after it merges to test

dpshepherd commented 1 year ago

Awhile ago we also saw similarly weird behavior between MM acquisitions and PM acquisitions with PVCAM (same camera model as yours). We spent a lot of time on saving data, diagnosing write speeds, different garbage collectors, changing timing of garbage collectors, etc... with @henrypinkard awhile ago.

In the end, we switched to a Hamamatsu camera (OrcaFusion BT with the fast capture card) and since then regularly collect 10-100 TBs of images over days with repeated acquisitions without issue.

I know that isn't super helpful, but may suggest it is a PVCAM issue and not a MM/PM issue.

henrypinkard commented 1 year ago

@ieivanov another thing you could try is going into the tools--options in MM and enabling AcqEngJ to test if its a difference in how AcqEngJ and the clojure engine call the core

Do you have any ideas @tomhanak? Is PVCAM maybe keeping its own internal buffer where extra memory use is hiding?

ieivanov commented 1 year ago

Thank you both. I'll try these out and I'll let you know if I find anything suspicious. It would be good to know if the PVCAM device adapter presents a bottleneck

tomhanak commented 1 year ago

Do you have any ideas @tomhanak? Is PVCAM maybe keeping its own internal buffer where extra memory use is hiding?

It depends how the Micro-Manager was used since started. PVCAM adapter distinguishes two types of acquisitions and keeps separate buffers for them.

For single snaps that is used when clicking Snap button or using MDA (with any delay between frames, using multiple channels, etc.), is allocated a buffer for one frame, i.e. 8MB for BSI-X, full sensor size and no binning.
For sequence acquisition (in PVCAM terms a continuous, live or circular buffer acquisition) it allocates a buffer with number of frames automatically calculated by default. The recommended number of frames is roughly the same as FPS, i.e. to hold frames covering one second. For BSI-X, full sensor size and no binning the buffer holds 94 frames so almost 800MB.

Since you didn't mention PC configuration, my guess would be it doesn't have enough free RAM?

When I start MM, click Live button, click Stop button and click ImageJ status bar, the Task Manager reports the process uses ~2GB of RAM. It is ~800MB for PVCAM buffer, 1GB sequence buffer configured for MMCore via Micro-Manager Options dialog and the rest is java environment and various small variables all around. When I keep running the live acquisition, the RAM utilization oscilates between 2GB and 4GB, depending on garbage collector (ImageJ.cfg gives 2GB for java).

Your experiment uses MDA with time interval between frames so PVCAM does single snaps. You can try:

Override the PVCAM sequence buffer size by changing Camera-1-CircularBufferAutoSize to OFF and changing number of frames in it via Camera-1-CircularBufferFrameCount property from 94 to e.g. 8 frames only. This shrinks the buffer to 64MB only.
Reduce MMCore sequence buffer size to 100-200MB only. It should work well as you wrote "the circular buffer is pretty much empty the whole time".

For completeness, there might be additional buffers allocated by PVCAM in case it does local (host-side) post-processing, but I don't expect it is your case. It can be the case e.g. with:

Enabled frame rotate/flip feature via environment variables.
Decompression of bit-packed frames. Prime BSI Express does not send compressed frames, but e.g. Prime BSI does on some ports.
Enabled frame summing via recently added PVCAM parameters.

ieivanov commented 1 year ago

@tomhanak thank you for your insight, it was very helpful. I think I have been able to narrow down the source of the problem.

I am using the latest version of MicroManager (nightly build 2023_08_08) with 5 GB circular buffer and 2 GB of Java (JVM) memory. My PC has 128 GB of memory total, so I have some room to make the buffers bigger. I am also acquiring data using the latest version of pycromanager - 0.26.0. I am running sequenced acquisitions in MM lingo (or continuous in PVCAM terms).

I have found that the problem is specific to acquiring data in 200MHz 11bit readout mode. This acquisition completes without problems:

from pycromanager import Core, Acquisition, multi_d_acquisition_events

mmc = Core()
mmc.set_property('Prime BSI Express', 'ReadoutRate', '100MHz 16bit')

with Acquisition(r'D:\temp', 'pm_cam_mda_test', show_display=False) as acq:
    acq.acquire(
        multi_d_acquisition_events(
            num_time_points=1000,
            time_interval_s=0,
        )
    )

However, if I change the readout mode to 200MHz 11bit I see that the JVM memory grows linearly with time until I get a java.lang.OutOfMemoryError, i.e. garbage collection is not called during the acquisition. Snapping an image before I start the acquisition does not seem to change this behavior - I still get a memory error. Increasing the JVM size such that the sequenced acquisition fits in RAM allows the acquisition to complete successfully. Interestingly, if I then click the ImageJ Status Bar to call garbage collection I do not see the java memory drop significantly. I am guessing that there may be a memory leak associated with the 200MHz 11bit acquisition mode.

Even more interesting, this problem seems to be specific to pycromanager and is not reproduced for an equivalent acquisition using the Micro-manager MDA. Whereas the JVM grows linearly with time for a pycromanager acquisition, MM forces garbage collection more frequently and I see a sawtooth patters for the memory usage. Clicking the IJ status bar at the end bring the java memory close to zero.

@henrypinkard is it possible that the memory leak is actually in pycromanager, related to images that need bit conversion?

ieivanov commented 1 year ago

I think the problem is with NDTiffStorage. If I run this acquisition

from pycromanager import Core, Acquisition, multi_d_acquisition_events

mmc = Core()
mmc.set_property('Prime BSI Express', 'ReadoutRate', '200MHz 11bit')

def image_process_fn(image, metadata):
    return None

with Acquisition(image_process_fn=image_process_fn, show_display=False) as acq:
    acq.acquire(
        multi_d_acquisition_events(
            num_time_points=500,
            time_interval_s=0,
        )
    )

the acquisition completes successfully, I see a sawtooth pattern in the memory usage, and clicking the IJ status bar at the end of the acquisition brings the java memory close to zero. The MM core logs say that the camera still runs a sequenced acquisition.

ieivanov commented 1 year ago

P.S. In 200MHz 11bit mode the camera return images with BitDepth = 11 and PixelType = GRAY16

henrypinkard commented 1 year ago

What about if you run an MDA through MM with NDTiff selected?

The only thing I can think of in NDTiff is the buffer creation code. Is it possible that the mode that's triggering it is making metadata so long its repeatedly creating large byte buffers that hang in memory? You could probably see what's going on here with a few prints

Alternatively, you could run the memory profiler from your IDE and see what objects are taking up a bunch a of memory.

ieivanov commented 1 year ago

I added a 11-bit mode to the demo camera: micro-manager/mmCoreAndDevices#326 and can now reproduce this error with the demo config and 2000 MB JVM with this code:

from pycromanager import Core, multi_d_acquisition_events, Acquisition

mmc = Core()
mmc.set_property('Camera', 'BitDepth', '11')
mmc.set_property('Camera', 'OnCameraCCDXSize', '2048')
mmc.set_property('Camera', 'OnCameraCCDYSize', '2048')

events = multi_d_acquisition_events(
    num_time_points=500,
    time_interval_s=0,
)

with Acquisition(r'Q:\Ivan\debug', 'acq', show_display=False) as acq:
    acq.acquire(events)

I haven't been able to reproduce the error from MicroManager yet. I tried using the native acq engine and saving as NDTiff as well as using AcqEngJ and saving either as multipage tiff or NDTiff. In all cases, the acquisition completed successfully and the JVM cleared by clicking the IJ status bar at the end of the acquisition. Not sure what component that leaves.

For testing purposes, the num_time_points can be reduced to 200. The acquisition completes with ~80% JVM use which is not cleared by clicking the status bar.

@henrypinkard what component do you think that leaves?

ieivanov commented 1 year ago

@tomhanak thanks for your help with this. I think now we've identified this as a pycromanager problem.

henrypinkard commented 1 year ago

I'm not sure. Thanks for the script--I'll try and see if I can reproduce

ieivanov commented 1 year ago

Some more insight - with the above script, pycromanager altogether fails to save the 11-bit data. I think this is now more of a bug rather than a memory issue. It looks like NDTiff currently does not support 11-bit pixel type: https://github.com/micro-manager/NDTiffStorage/blob/83cc3e58fa3fa24f2710f78c6f867b284b266dba/java/src/main/java/org/micromanager/ndtiffstorage/NDTiffWriter.java#L389-L405

micro-manager / pycro-manager

Java out of memory issue #536