Closed ieivanov closed 1 year ago
There's a [System.gc()
](https://docs.oracle.com/javase/7/docs/api/java/lang/System.html#gc()) call that could certainly be added into the acquisition engine. However, its possible this would lead to a performance hit so it definitely needs to be tested before adding. I also don't think it guarentees that the GC runs.
When using the demo camera the this acquisition completes successfully, but when using a real camera (Prime BSI Express thru the PVCAM device adapter in my case) I get the following error at ~50% memory usage:
Its very strange to me that this would happen at 50%, not 100%. You mean 50% on the ImageJ monitor memory, right? Or 50% on the system wide process memory (that includes circular buffer)
Also, if you click on the ImageJ monitor memory plugin, it should force (or maybe just suggest) a GC event. Depending on the timing of it all, if you're able to repeatedly click this and prevent the error, that suggests that adding in System.gc()
is a good idea. This also would allow you to verify that there is not a memory leak at play here. If after clicking the baseline that it returns to doesn't stay approximately the same, then something is failing to clean up.
@nicost Do you other thoughts or insights about this?
Its very strange to me that this would happen at 50%, not 100%. You mean 50% on the ImageJ monitor memory, right? Or 50% on the system wide process memory (that includes circular buffer)
It does seem rather suspicious. I think this could well be a problem specific to the PVCAM device adapter, rather than a general issue, but it's also strange that an acquisition with this camera completes through the MDA, but errors our when using pycromanager. And yes, I see that the camera error out at 50% use of the ImageJ memory allocated under Edit -> Options -> Memory & Threads. The circular buffer is pretty much empty the whole time.
Also, if you click on the ImageJ monitor memory plugin, it should force (or maybe just suggest) a GC event. Depending on the timing of it all, if you're able to repeatedly click this and prevent the error, that suggests that adding in System.gc() is a good idea.
I have tried clicking on the ImageJ status bar to force / suggest garbage collection, but that did not seem to have an effect - I don't see the memory drop, and the pycromanager acquisition still errors out.
I wrote a speed test and analysis notebook the other week that could provide some useful diagnostics
heres the script. I produces csv files that you can subsequently plug into this noteboook to visualize various bottlenecks in the acquisition pipeline.
The java code required is currently in a PR to mmstudio. You can grab the next continuous integration build after it merges to test
Awhile ago we also saw similarly weird behavior between MM acquisitions and PM acquisitions with PVCAM (same camera model as yours). We spent a lot of time on saving data, diagnosing write speeds, different garbage collectors, changing timing of garbage collectors, etc... with @henrypinkard awhile ago.
In the end, we switched to a Hamamatsu camera (OrcaFusion BT with the fast capture card) and since then regularly collect 10-100 TBs of images over days with repeated acquisitions without issue.
I know that isn't super helpful, but may suggest it is a PVCAM issue and not a MM/PM issue.
@ieivanov another thing you could try is going into the tools--options in MM and enabling AcqEngJ to test if its a difference in how AcqEngJ and the clojure engine call the core
Do you have any ideas @tomhanak? Is PVCAM maybe keeping its own internal buffer where extra memory use is hiding?
Thank you both. I'll try these out and I'll let you know if I find anything suspicious. It would be good to know if the PVCAM device adapter presents a bottleneck
Do you have any ideas @tomhanak? Is PVCAM maybe keeping its own internal buffer where extra memory use is hiding?
It depends how the Micro-Manager was used since started. PVCAM adapter distinguishes two types of acquisitions and keeps separate buffers for them.
Since you didn't mention PC configuration, my guess would be it doesn't have enough free RAM?
When I start MM, click Live button, click Stop button and click ImageJ status bar, the Task Manager reports the process uses ~2GB of RAM. It is ~800MB for PVCAM buffer, 1GB sequence buffer configured for MMCore via Micro-Manager Options dialog and the rest is java environment and various small variables all around. When I keep running the live acquisition, the RAM utilization oscilates between 2GB and 4GB, depending on garbage collector (ImageJ.cfg gives 2GB for java).
Your experiment uses MDA with time interval between frames so PVCAM does single snaps. You can try:
Camera-1-CircularBufferAutoSize
to OFF
and changing number of frames in it via Camera-1-CircularBufferFrameCount
property from 94 to e.g. 8 frames only. This shrinks the buffer to 64MB only.For completeness, there might be additional buffers allocated by PVCAM in case it does local (host-side) post-processing, but I don't expect it is your case. It can be the case e.g. with:
@tomhanak thank you for your insight, it was very helpful. I think I have been able to narrow down the source of the problem.
I am using the latest version of MicroManager (nightly build 2023_08_08) with 5 GB circular buffer and 2 GB of Java (JVM) memory. My PC has 128 GB of memory total, so I have some room to make the buffers bigger. I am also acquiring data using the latest version of pycromanager - 0.26.0. I am running sequenced acquisitions in MM lingo (or continuous in PVCAM terms).
I have found that the problem is specific to acquiring data in 200MHz 11bit
readout mode. This acquisition completes without problems:
from pycromanager import Core, Acquisition, multi_d_acquisition_events
mmc = Core()
mmc.set_property('Prime BSI Express', 'ReadoutRate', '100MHz 16bit')
with Acquisition(r'D:\temp', 'pm_cam_mda_test', show_display=False) as acq:
acq.acquire(
multi_d_acquisition_events(
num_time_points=1000,
time_interval_s=0,
)
)
However, if I change the readout mode to 200MHz 11bit
I see that the JVM memory grows linearly with time until I get a java.lang.OutOfMemoryError
, i.e. garbage collection is not called during the acquisition. Snapping an image before I start the acquisition does not seem to change this behavior - I still get a memory error. Increasing the JVM size such that the sequenced acquisition fits in RAM allows the acquisition to complete successfully. Interestingly, if I then click the ImageJ Status Bar to call garbage collection I do not see the java memory drop significantly. I am guessing that there may be a memory leak associated with the 200MHz 11bit
acquisition mode.
Even more interesting, this problem seems to be specific to pycromanager and is not reproduced for an equivalent acquisition using the Micro-manager MDA. Whereas the JVM grows linearly with time for a pycromanager acquisition, MM forces garbage collection more frequently and I see a sawtooth patters for the memory usage. Clicking the IJ status bar at the end bring the java memory close to zero.
@henrypinkard is it possible that the memory leak is actually in pycromanager, related to images that need bit conversion?
I think the problem is with NDTiffStorage. If I run this acquisition
from pycromanager import Core, Acquisition, multi_d_acquisition_events
mmc = Core()
mmc.set_property('Prime BSI Express', 'ReadoutRate', '200MHz 11bit')
def image_process_fn(image, metadata):
return None
with Acquisition(image_process_fn=image_process_fn, show_display=False) as acq:
acq.acquire(
multi_d_acquisition_events(
num_time_points=500,
time_interval_s=0,
)
)
the acquisition completes successfully, I see a sawtooth pattern in the memory usage, and clicking the IJ status bar at the end of the acquisition brings the java memory close to zero. The MM core logs say that the camera still runs a sequenced acquisition.
P.S. In 200MHz 11bit
mode the camera return images with BitDepth = 11 and PixelType = GRAY16
What about if you run an MDA through MM with NDTiff selected?
The only thing I can think of in NDTiff is the buffer creation code. Is it possible that the mode that's triggering it is making metadata so long its repeatedly creating large byte buffers that hang in memory? You could probably see what's going on here with a few prints
Alternatively, you could run the memory profiler from your IDE and see what objects are taking up a bunch a of memory.
I added a 11-bit mode to the demo camera: micro-manager/mmCoreAndDevices#326 and can now reproduce this error with the demo config and 2000 MB JVM with this code:
from pycromanager import Core, multi_d_acquisition_events, Acquisition
mmc = Core()
mmc.set_property('Camera', 'BitDepth', '11')
mmc.set_property('Camera', 'OnCameraCCDXSize', '2048')
mmc.set_property('Camera', 'OnCameraCCDYSize', '2048')
events = multi_d_acquisition_events(
num_time_points=500,
time_interval_s=0,
)
with Acquisition(r'Q:\Ivan\debug', 'acq', show_display=False) as acq:
acq.acquire(events)
I haven't been able to reproduce the error from MicroManager yet. I tried using the native acq engine and saving as NDTiff as well as using AcqEngJ and saving either as multipage tiff or NDTiff. In all cases, the acquisition completed successfully and the JVM cleared by clicking the IJ status bar at the end of the acquisition. Not sure what component that leaves.
For testing purposes, the num_time_points can be reduced to 200. The acquisition completes with ~80% JVM use which is not cleared by clicking the status bar.
@henrypinkard what component do you think that leaves?
@tomhanak thanks for your help with this. I think now we've identified this as a pycromanager problem.
I'm not sure. Thanks for the script--I'll try and see if I can reproduce
Some more insight - with the above script, pycromanager
altogether fails to save the 11-bit data. I think this is now more of a bug rather than a memory issue. It looks like NDTiff currently does not support 11-bit pixel type: https://github.com/micro-manager/NDTiffStorage/blob/83cc3e58fa3fa24f2710f78c6f867b284b266dba/java/src/main/java/org/micromanager/ndtiffstorage/NDTiffWriter.java#L389-L405
I am running into a Java out of memory issue when acquiring large sequenced z-stacks over multiple timepoints.
This effect can be simulated with the demo config and the follow code
When monitoring the Java memory (Plugins -> Utilities -> Monitor Memory...) I see that the memory increases linearly during the acquisition until it reaches ~70% at which point the garbage collector kicks in. This pattern is different when running the same acquisition thru the MDA - here I see more frequent garbage collection.
When using the demo camera the this acquisition completes successfully, but when using a real camera (Prime BSI Express thru the PVCAM device adapter in my case) I get the following error at ~50% memory usage:
I don't see this error when running the same acquisition thru the MM MDA.
Increasing the Java memory allocation delays the error, but does not fix it. I don't see imaged accumulating in the MM circular buffer, and increasing the
Acquisition
classsave_queue_size
does not help either.Is it possible for the AcqEngJ to run more frequent garbage collection?