micro-manager / pycro-manager

Python control of micro-manager for customized data acquisition
https://pycro-manager.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Snap image is much slower and laggy compared to pymmcore #90

Closed. oeway closed this issue 3 years ago.

oeway commented 4 years ago

I am comparing the performance of pycro-manager with pymmcore by looping snap and get-image calls, and it appears that pycro-manager is much laggier and slower. See the following GIFs:

  1. Using pymmcore, it's very fast and the images show instantly (source code notebook): [GIF: pymmcore_performance]

  2. Using pycro-manager, it's much laggier and the acquisition blocks for a while before coming back again (source code notebook): [GIF: pycromanager-performance]

The tests were done on the same computer (macOS 10.14.6), with the same Jupyter notebook, the same micro-manager (2.0.0-gamma1 20200824) and config file, and I can reliably reproduce the result.

Any idea why?

henrypinkard commented 4 years ago

The pycro-manager calls have a good bit more overhead than calls directly to pymmcore. This is because they first need to travel through the Python-Java ZeroMQ bridge before going to the core. There's definitely some room for improvement in the speed of this bridge (see https://github.com/micro-manager/pycro-manager/issues/70), but there may be ways around this even without improving the bridge.

My first thought is that this might be related to Java memory/garbage collection. From ImageJ you can select "Edit--options--memory and threads" to change the amount of memory allocated to Java. You could also select "Plugins-utilities--monitor memory" to see a real time graph of memory. If the hiccups in the display correspond to drops in memory (i.e. garbage collection) then that would support this theory.

Alternatively, there might be better core functions to call than snapImage. Following the implementation of live mode in micro-manager could yield better results, though this would require more function calls than the snap-and-get-image paradigm you have right now (a rough sketch is below).
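A minimal sketch of that live-mode style approach, polling the circular buffer instead of snapping each frame. It assumes a 512x512 demo camera and that core.get_last_image() returns a flat array like core.get_image() does; it is not the exact micro-manager live-mode implementation:

import time
from pycromanager import Bridge

bridge = Bridge()
core = bridge.get_core()

core.start_continuous_sequence_acquisition(0)  # 0 ms requested interval between frames
try:
    for _ in range(100):
        # wait until the circular buffer contains at least one image
        while core.get_remaining_image_count() == 0:
            time.sleep(0.001)
        image = core.get_last_image().reshape(512, 512)
        # ... display or process `image` here ...
finally:
    core.stop_sequence_acquisition()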

oeway commented 4 years ago

It doesn't appear to be the overhead of ZMQ etc. I would expect that overhead to be a roughly constant delay, but what we see here is really more like a serious traffic jam: the snap call freezes for quite a while before it starts again.

I just changed the memory setting to 10GB. The monitor shows it's using 3~4% of memory; when snapping images it goes up and down, but it never went above 10% during my test.

The notebook I made is meant to be simple, so I tried to only call snapImage as a start. In the last step I did introduce startContinuousSequenceAcquisition and use that for live mode.

Could you try to run the notebook and see if you can reproduce?

henrypinkard commented 4 years ago

Yes, I can reproduce it. If I run 10k snaps, it runs fast for about 30 seconds before it starts to slow down.

You're right, it doesn't appear to be related to memory: the max usage doesn't go up when it slows down, though the rate of memory buildup slows, suggesting fewer objects are being created.

I'm not sure this is related to pycro-manager though, because when I run the following I don't see any slowdown:

import time
from pycromanager import Bridge

bridge = Bridge()
core = bridge.get_core()

while True:
    start = time.time()
    core.snap_image()
    mid = time.time()
    image_array = core.get_image().reshape(512, 512)  # demo camera is 512x512
    end = time.time()
    print('{}\t\t{}'.format(end - mid, mid - start))  # get-image time, then snap time
oeway commented 4 years ago

Maybe it's the same issue we have discussed before: https://github.com/micro-manager/pycro-manager/issues/79

But the weird thing is that I can reproduce this issue with only a python script.

henrypinkard commented 4 years ago

Hmm, that is weird. Is the python script different from the code above? Could you post it so I can take a look?

oeway commented 4 years ago

It is weird. You can find the python script here: https://github.com/micro-manager/pycro-manager/issues/79 ; it is basically the same as yours.

My other observation is that when the pycro-manager access becomes laggy, it doesn't seem to affect the micro-manager interface: I can still click snap and it responds quickly.

henrypinkard commented 4 years ago

For the one in #79, it must be an issue with micro-manager since it can be reproduced with a beanshell script (i.e. no Python involved). This one is different because we can make it go away with Python alone.

Is there any way to print to the console from the exported class? That way we could figure out which line in the MyMicroscope class is causing the slowdown.

oeway commented 4 years ago

Yes, you can do await api.log("message string"), and you will see the log printed in the console of the Chrome Developer Tools (right-click in the notebook, choose Inspect Element, then click Console).

(Note: add async before def if you use await in that function)
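A minimal sketch of what that looks like inside the exported class (MyMicroscope is the class name used in the notebook; api is the ImJoy API object already available there):

class MyMicroscope:
    async def snap(self):                      # async is required because of the await below
        await api.log("snap() was called")     # appears in the browser dev-tools console
        ...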

henrypinkard commented 4 years ago

I added some prints to test this out, and the slowdown is coming from the line await viewer.imshow(img). So I guess it has something to do with the ITK-VTK viewer. This is consistent with the previous test I did snapping images as fast as possible. Not sure why there is different behavior with pymmcore though.

oeway commented 4 years ago

Well, I just tried again with matplotlib instead of itk-vtk-viewer:

It got stuck at frames 3 and 19 for quite a while, then started again: [GIF: matplotlib-pycromanager]

Here is the code

import numpy as np
import matplotlib.pyplot as plt
from IPython.display import clear_output
from pycromanager import Bridge

bridge = Bridge()
core = bridge.get_core()

def snap_image():
    # acquire an image
    core.snap_image()
    tagged_image = core.get_tagged_image()
    # get the pixels as a numpy array and reshape according to height and width
    image_array = np.reshape(tagged_image.pix, newshape=[-1, tagged_image.tags['Height'], tagged_image.tags['Width']])
    # for display, scale the image into the range 0~255
    image_array = (image_array/image_array.max()*255).astype('uint8')
    # return the first channel if multiple exist
    return image_array[0, :, :]

for i in range(100):
    clear_output(wait=True)
    plt.figure()
    plt.title(i)
    plt.imshow(snap_image())
    plt.show()

Would this be related to ZMQ, since Jupyter notebooks also use ZMQ?

bryantChhun commented 4 years ago

Would this be related to ZMQ, since Jupyter notebooks also use ZMQ?

Is it possible your Jupyter notebook is on the same port as pycro-manager? When you say you can "reproduce this issue with only a python script", I assume you're running outside of jupyter?

henrypinkard commented 4 years ago

Pycro-manager by default runs on port 4827. It's not clear to me what jupyter uses.

I was able to reproduce the hang on image 3 in a notebook, but it depended on having created multiple instances of both Bridge and core (and also on the second core coming from the second Bridge). There appears to be some weird garbage-collection-related behavior that causes hangs. The way this is supposed to work is: garbage collection of a python object that shadows a java object triggers a ZMQ message from python to java signaling that the external reference has been released, so the Java side can do the same. I was able to make a minimal example of this problem (in pycharm) with:

from pycromanager import Bridge
import gc

b = Bridge()
core = b.get_core()
gc.collect()

# creating a second Bridge/core and collecting again hangs here
b = Bridge()
core = b.get_core()
gc.collect()

print('Never prints')

So maybe if you make sure that you're never creating more than one Bridge instance in your kernel, this will go away?

oeway commented 4 years ago

Aha, that sounds like a reasonable explanation; I haven't tested it yet though.

Could you implement this in pycro-manager, using a static variable to make sure only one bridge is ever created?

henrypinkard commented 4 years ago

Yes, I will take a look at implementing that (https://github.com/micro-manager/pycro-manager/issues/102). In the meantime, I think you should be able to make a global Bridge and see if this is indeed the problem.
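For example, a minimal sketch of the "single global Bridge" workaround (assuming the 512x512 demo camera): create the Bridge and core once at the top of the notebook and reuse them everywhere, instead of constructing new ones inside functions or cells.

from pycromanager import Bridge

bridge = Bridge()          # create exactly once per kernel/process
core = bridge.get_core()

def snap_image():
    core.snap_image()
    return core.get_image().reshape(512, 512)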

oeway commented 4 years ago

I did some rounds of experiments; here is a list of actions and observations:

(actions were taken one after another)

Each time I ran the following code in a jupyter notebook (same as I did previously):

for i in range(100):
    clear_output(wait=True)
    plt.figure()
    plt.title(i)
    plt.imshow(snap_image())
    plt.show()

When I restart the kernel, a new process is started. Therefore I think only ensuring a single bridge per process won't be enough; it seems there is also something going on on the Java side.

henrypinkard commented 4 years ago

I wasn't able to reproduce this, but I still suspect it is related to python garbage collection, as the destructors of the python objects shadowing java ones (i.e. tagged images) aren't firing at the moment. I need to look into this (https://github.com/micro-manager/pycro-manager/issues/104). In the meantime, if you add tagged_image.__del__() to your snap_image just before returning, to manually release them, does it solve the problem?
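A sketch of snap_image with that change applied (same demo-camera function as earlier in this thread):

def snap_image():
    core.snap_image()
    tagged_image = core.get_tagged_image()
    image_array = np.reshape(tagged_image.pix, newshape=[-1, tagged_image.tags['Height'], tagged_image.tags['Width']])
    image_array = (image_array/image_array.max()*255).astype('uint8')
    tagged_image.__del__()   # manually release the Java-side reference
    return image_array[0, :, :]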

oeway commented 4 years ago

I wasn't able to reproduce this, but I still suspect it is related to python garbage collection, as the destructors of the python objects shadowing java ones (i.e. tagged images) aren't firing at the moment. I need to look into this (#104). In the meantime, if you add tagged_image.__del__() to your snap_image just before returning, to manually release them, does it solve the problem?

Hi, I just tried adding __del__ to my snap_image. I restarted everything and the first time it worked, but after I restart the notebook kernel it becomes laggy again.

Another observation: when the acquisition gets stuck and I click "interrupt the kernel", the code always stops at:

~/miniconda3/lib/python3.7/site-packages/pycromanager/core.py in _translate_call(self, method_specs, fn_args)
    348 
    349         self._socket.send(message)
--> 350         return self._deserialize(self._socket.receive())
    351 
    352     def _deserialize(self, json_return):

~/miniconda3/lib/python3.7/site-packages/pycromanager/core.py in receive(self, timeout)
     81     def receive(self, timeout=0):
     82         if timeout == 0:
---> 83             reply = self._socket.recv()
     84         else:
     85             start = time.time()

I would guess that means it is the socket communication that is blocking execution, rather than garbage collection. Maybe we should look into the ZMQ communication.

henrypinkard commented 4 years ago

I think the freeze after restarting is probably because the first kernel didn't get to complete both the send and the receive, so it got stuck in a state where it couldn't receive messages. I can look into this (https://github.com/micro-manager/pycro-manager/issues/105). However, I don't think this is related to the slowdown during normal operation. Also, you have to make sure to call tagged_image.__del__(), not del tagged_image.

For the kernel restarting, first make sure you're running pycromanager 0.4.6, as I added some cleanup behavior on exit (https://github.com/micro-manager/pycro-manager/pull/106). Also, there can be different behavior if you run "shutdown kernel" first and then restart the kernel (it has a chance to clean up) versus just restarting (it doesn't get a chance).

henrypinkard commented 4 years ago

When it got stuck and you clicked interrupt kernel, did it get stuck on the first image after restarting? Or did it get stuck in the middle of a sequence of images?

henrypinkard commented 4 years ago

Whoops, accidental close.

oeway commented 4 years ago


When it got stuck and you clicked interrupt kernel, did it get stuck on the first image after restarting? Or did it get stuck in the middle of a sequence of images?

It’s typically in the middle, but sometimes it can be on the second image. The first one seems fine.


henrypinkard commented 4 years ago

I'm not able to reproduce this. I'm going to try fixing all these various little things and pushing an update, as well as putting in some more debug logging, and then maybe we can try again.

henrypinkard commented 4 years ago

New changes should be available in the next nightly build and a new pip version after https://github.com/micro-manager/micro-manager/pull/942 merges.

I also added a feature where, if you call Bridge(debug=True), every incoming and outgoing command on the Java side will be printed to the micro-manager core log. So if there's something going wrong on that end it should be easier to track down.

Want to give it a try and see if you still have the issue?
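For reference, enabling the new debug logging looks like this (everything the Java side sends and receives then ends up in the core log):

from pycromanager import Bridge

bridge = Bridge(debug=True)
core = bridge.get_core()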

oeway commented 4 years ago

Hi, I upgraded micro-manager (20200911) and pycro-manager (0.4.9), and I also call tagged_image.__del__() after I get the image. However, the issue still seems to be there.

It's too bad that you cannot reproduce this. I am thinking maybe we should build a docker image for this so we can easily share a reproducible environment. For pymmcore, I made a docker image and we can run it on Binder. We can basically do the same for pycro-manager, but with the --without-java flag removed. Is there a way to start micro-manager from the command line, including skipping the configuration dialog and activating the ZMQ port?

henrypinkard commented 4 years ago

Okay, I was actually able to reproduce it, using the following code:

import time
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import clear_output

times = []

for i in range(500):
    s = time.time()
    clear_output(wait=True)
    fig, ax = plt.subplots(1, 2, figsize=(16, 4))

    ax[0].imshow(snap_image())  # snap_image() as defined earlier in this thread
#     ax[0].imshow((np.random.rand(512, 512) * 5e5).astype(np.uint16))
    ax[0].set_title(i)

    times.append(time.time() - s)
    ax[1].plot(times)   # running per-iteration time

    plt.show()

plt.plot(times)
[screenshot: per-iteration timing plot]

However, if I replace snapping an image with random data, I get the exact same behavior:

[screenshot: per-iteration timing plot with random data]

If I take out the plotting aspect of it, running:

times = []

for i in range(500):
    s = time.time()
    snap_image()
    times.append(time.time() - s)

plt.plot(times)

The delay goes away

[screenshot: per-iteration timing plot without display]

So it seems that this is very much related to the displaying of images on my end. Do you see something different from this? It may be that pycro-manager also slows down when jupyter slows down, since they both rely on ZMQ, but given that the problem exists on the jupyter side without pycro-manager, and not vice versa, I think it is something that would need to be fixed in jupyter.

It's really cool that you were able to make a docker file with the core, but I'm not sure this will be the answer here. I would imagine (correct me if I'm wrong) that most device adapters, which are developed against DLLs and the Windows SDK, could not be compiled with docker (the demo camera being an exception). So while reproducible, it wouldn't be reflective of the conditions under which micro-manager is actually being used.

oeway commented 4 years ago

Wow, you did quite a bit of investigation work here. However, I think the regular delay in your plot isn't what I see here; what I am encountering is something much longer (sometimes >10s) and very irregular: it gets stuck for some time, then works, then gets stuck again. See this GIF to get a feeling for it, and watch how the number on top of the figure changes: [GIF: matplotlib-pycromanager]

For the docker image, what I meant is to have a reproducible testing environment with only the demo devices. Of course it won't be useful or feasible for accessing actual devices, but as a demo and debugging tool it can be very helpful. For example, I would be able to send you a notebook on Binder and you could see the exact same laggy behaviour that I see. Right now I feel like you cannot really reproduce the issue I have.

henrypinkard commented 4 years ago

Ah okay, I see. That lagginess is very weird. One last thing to look at: if you run in debug mode (Bridge(debug=True)), you can see every command on the java side printed to the micro-manager core log (in the corelogs folder in the MM install directory). Do you see anything suspicious in there when the delays are occurring?

henrypinkard commented 4 years ago

It's too bad that you cannot reproduce this. I am thinking maybe we should build a docker image for this so we can easily share a reproducible environment. For pymmcore, I made a docker image and we can run it on Binder. We can basically do the same for pycro-manager, but with the --without-java flag removed. Is there a way to start micro-manager from the command line, including skipping the configuration dialog and activating the ZMQ port?

I'm actually not sure. @nicost @marktsuchida, is there a command-line/headless way to launch the micro-manager Java layer?

nicost commented 4 years ago

Starting from the command line is certainly possible, but I have the feeling that you want to run it fully headless, which is not possible. You can start MM using java -cp ij.jar ij.ImageJ, and there is an option you can set to skip the configuration dialog, but that will still leave you with a UI.

bryantChhun commented 4 years ago

For what it's worth, I am able to reproduce this using the minimal code above:

Differences are:

This 1-5 second spike persists even if I am not using jupyter (running through pycharm as a script).

[screenshot: timing plot showing 1-5 second spikes]

bryantChhun commented 4 years ago

Ok, so I tried doing the same code using ZMQ in my mm2python implementation and cannot replicate it. Differences are:

Takeaways here:

code:

import zmq
from py4j.java_gateway import JavaGateway
import numpy as np
import matplotlib.pyplot as plt
import time

# mm2python setup: connect to the Java gateway and grab studio/core handles
gate = JavaGateway()
gate.entry_point.clearAll()
ep = gate.entry_point
mm = ep.getStudio()
mmc = ep.getCMMCore()

# PULL socket that receives image data pushed from the Java side
zmq_context = zmq.Context()
zmq_socket = zmq_context.socket(zmq.PULL)
zmq_socket.connect("tcp://localhost:5500")
zmq_socket.set_hwm(0)   # no high-water mark: never drop queued messages

[screenshots: timing plots for the mm2python implementation]

henrypinkard commented 4 years ago

Nice! The only other thing that comes to mind is zmq version (I'm using pyzmq 18.1.1).

I haven't put any effort into optimizing socket settings, so it would make sense that there are some non-ideal things there. Yes, it does convert everything to JSON. Everything is routed through here on the python side: https://github.com/micro-manager/pycro-manager/blob/86bf51aa996900aa9634b61de732461fdcdf7c60/pycromanager/core.py#L76

On Java side its here: https://github.com/micro-manager/pycro-manager/blob/86bf51aa996900aa9634b61de732461fdcdf7c60/java/src/main/java/org/micromanager/internal/zmq/ZMQServer.java#L124

Every command is a request and a reply, e.g. the request is "run this method" and the reply is the return result.

You could try tracking it down running from source, or tell me the socket settings you use and I can push them into the next version.
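As a generic illustration of that request/reply pattern (this is not pycro-manager's actual message format, just the ZMQ REQ/REP pattern it is built on; the "command"/"name" fields are made up for the example):

import json
import zmq

context = zmq.Context()
socket = context.socket(zmq.REQ)
socket.connect("tcp://127.0.0.1:4827")   # default pycro-manager port mentioned above

request = {"command": "run-method", "name": "snapImage", "arguments": []}
socket.send(json.dumps(request).encode("utf-8"))    # request: run a method
reply = json.loads(socket.recv().decode("utf-8"))   # reply: the return result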

bryantChhun commented 4 years ago

You could try tracking it down running from source, or tell me the socket settings you use and I can push them into the next version.

Thanks for the file locations. I might try to test from source and get back to you.

bryantChhun commented 4 years ago

I ran a test using the same type of timing but with the Acquisition Engine.

import time
from pycromanager import Acquisition, multi_d_acquisition_events

times = []
def img_process_fn(image, metadata):
    times.append(time.time())
    return image, metadata

events = multi_d_acquisition_events(num_time_points=500)

with Acquisition(directory='/Users/bryant.chhun/Desktop/Data/untitled folder',
                 name='speed_test',
                 image_process_fn=img_process_fn) as acq:
    acq.acquire(events)

which gives much better performance, but still shows a curious periodic spike:

[screenshot: acquisition-engine timing plot with periodic spikes]

oeway commented 4 years ago

@henrypinkard A quick question: when you send the image from java to python, do you also convert it to JSON? Can you send the binary data directly?

henrypinkard commented 4 years ago

Right now it puts it into JSON by serializing the binary data into a UTF-8 string. There are probably better ways of doing this. I think some kind of IPC shared-memory approach could be quite a bit faster.

henrypinkard commented 3 years ago

@bryantChhun did you ever get a chance to try this out from source and see if there are any obvious differences from your faster implementation? I think @nicost is running into something similar now

bryantChhun commented 3 years ago

@bryantChhun did you ever get a chance to try this out from source and see if there are any obvious differences from your faster implementation? I think @nicost is running into something similar now

@henrypinkard I did make the modifications and test them a long time ago ... from what I recall there wasn't a difference (and hence why no PR!). The change I tried to implement was to switch from request-reply to push-pull with the HWM set to 1. I'm going to pull it up again and look a little more, and will let you know.

I also recall trying to bypass json serialization for the data only, sending the bytestream directly, and that also didn't make a difference; though that leaves open the possibility that the lag happens at the command-parsing level, not the data-streaming level.
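For anyone following along, a rough sketch of the push-pull-with-HWM=1 pattern described above (illustrative only, not the actual patch that was tested; the port is arbitrary):

import zmq

context = zmq.Context()

# sender side (the Java server would do the equivalent)
push = context.socket(zmq.PUSH)
push.set_hwm(1)                       # high-water mark of 1: don't queue up a backlog
push.bind("tcp://127.0.0.1:5555")

# receiver side
pull = context.socket(zmq.PULL)
pull.set_hwm(1)
pull.connect("tcp://127.0.0.1:5555")

push.send(b"payload")
print(pull.recv())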

henrypinkard commented 3 years ago

I took a look at this, and at least part (and maybe most) of the delay seems to be coming from JSON serialization. The library currently used parses every character of long strings and adds in escape characters where needed. It's possible that the slowdown depends on the content of the message (i.e. how many escapes are needed). In any case, I think there's probably a way around it, using a different JSON library or maybe manually implementing the JSON serialization.
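A quick way to get a feel for how the escaping cost scales with message content (illustrative only, using Python's json module; the Java library in question behaves differently in detail):

import json
import time

plain = "a" * 10_000_000        # no escaping needed
escapy = '"\\' * 5_000_000      # every character needs an escape

for name, s in [("plain", plain), ("escape-heavy", escapy)]:
    t = time.time()
    json.dumps(s)
    print(name, time.time() - t)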

oeway commented 3 years ago

Hi @henrypinkard, glad that you found the issue.

A while ago I asked you how you encode binary data, and you said it is encoded as a UTF-8 string. I did some quick research on this and found that UTF-8 has >25% overhead when encoding binary data, even worse than base64 (ref).

In that case, I think you should aim for sending it as raw bytes rather than UTF-8.

We have been doing this in imjoy-rpc with the Jupyter ZMQ transport layer:

Jupyter notebooks also use ZMQ, more specifically zmq send_multipart (see the code here).

On top of that, Jupyter has a comm message protocol which has the interface comm.send(json_object, buffers=[.. binary_data...]).

Here is how a message is sent in imjoy-rpc:

  1. encode the python object into a dictionary (i.e. json types + bytes); this is the encode function
  2. split the json-compatible part and the binary parts with the remove_buffers function
  3. send the json part and the binary parts separately through the jupyter comm via zmq_socket.send_multipart

And the receiving part is similar:

  1. receive via zmq_socket.recv_multipart, which gives us the json part and the binary parts
  2. assemble them into an encoded object with the put_buffers function
  3. decode the python object with the decode function

The advantage of this is that we have very minimal overhead for encoding large amounts of binary data. And we have a full list of supported common types (including primitive types and nd-arrays), see here.

I listed this just for your reference, in case you end up implementing your own. Meanwhile, I am actually interested in having a Java implementation of imjoy-rpc (for connecting imjoy to imagej, for example), and we could potentially share the same encoding/decoding scheme.
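A minimal sketch of the "JSON header + raw binary buffers" idea using zmq send_multipart / recv_multipart (not the actual imjoy-rpc implementation; the socket type and port are arbitrary):

import json
import numpy as np
import zmq

context = zmq.Context()
sender = context.socket(zmq.PAIR)
sender.bind("tcp://127.0.0.1:5600")
receiver = context.socket(zmq.PAIR)
receiver.connect("tcp://127.0.0.1:5600")

image = np.random.randint(0, 2**16, (512, 512), dtype=np.uint16)
header = {"type": "ndarray", "dtype": str(image.dtype), "shape": image.shape}

# the JSON part and the binary part travel as separate frames,
# so the pixels never need UTF-8 or base64 encoding
sender.send_multipart([json.dumps(header).encode("utf-8"), image.tobytes()])

header_frame, buffer = receiver.recv_multipart()
meta = json.loads(header_frame)
restored = np.frombuffer(buffer, dtype=meta["dtype"]).reshape(meta["shape"])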

henrypinkard commented 3 years ago

Thanks for the ideas--I think I will do exactly what you suggest. They do ultimately get sent as byte arrays regardless of how you encode them, but you're right that UTF-8 is not as efficient. I actually switched to ISO_8859_1 encoding a few weeks ago because of some problems arising from values >127, but I don't think this will matter anymore if I switch to sending buffers on their own, because all the messages will then be pure ASCII text.

For making the java part work with imjoy-rpc, I think it would be great if the code could be reused. It might be worth splitting out this common rpc code into its own library altogether (with Java + python communication + javascript if you wanted to add it). What do you think?

oeway commented 3 years ago

Thanks for the ideas--I think I will do exactly what you suggest.

Sounds great!

For making the java part work with imjoy-rpc, I think it would be great if the code could be reused. It might be worth splitting out this common rpc code into its own library altogether (with Java + python communication + javascript if you wanted to add it). What do you think?

Definitely. In fact, the imjoy-rpc repo is meant for hosting the common rpc code. The code was decoupled from imjoy a while ago, except that we still keep imjoy in the name.

As you can see in the repo, it already has python and javascript implementations, and we can add a java folder. The transport layer is also decoupled from the rpc part, so we can easily add a ZMQ transport (we call it a connection, see here). The connection interface is extremely simple: we just need to implement the emit function for sending data, and when we receive data, we call self._fire(event_type, data). A rough sketch is below.
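Something like this (a hypothetical ZMQ-based connection following the interface described above; only emit and self._fire come from that interface, the other names are made up for illustration):

import zmq

class ZMQConnection:
    def __init__(self, url, fire):
        self._fire = fire                        # callback into the rpc layer
        self._socket = zmq.Context().socket(zmq.PAIR)
        self._socket.connect(url)

    def emit(self, data: bytes):
        # send data to the other side
        self._socket.send(data)

    def poll_once(self):
        # receive one message and hand it to the rpc layer
        data = self._socket.recv()
        self._fire("message", data)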

Not sure if you should go for the entire imjoy-rpc handshaking protocol, but at least, I think we can share the same encoding/decoding and ZMQ transport layer.

Perhaps in the longer term, if we aim for a full imjoy-rpc implementation from within micro-manager (the java part), we will be able to connect to micro-manager directly from a web browser. We would then be able to build control software not only with Python, but also with C/C++, C#, Rust, Julia etc. compiled to WebAssembly. The performance is only ~20% lower than native, but we get much better portability, reproducibility, usability and stability.

henrypinkard commented 3 years ago

I think this speed issue should be solved in the next nightly build. It should be 10-100x faster now.

henrypinkard commented 3 years ago

For splitting this out into a universal communication library, I agree that it would be really cool and useful, but I think it would be a good amount of work to unify our communication protocols, and I don't have the bandwidth for it at the moment. But it is definitely something to keep in mind for the future!

oeway commented 3 years ago

I think this speed issue should be solved in the next nightly build. It should be 10-100x faster now.

Awesome! What did you change?

For splitting this out into a universal communication library, I agree that it would be really cool and useful, but I think it would be a good amount of work to unify our communication protocols, and I don't have the bandwidth for it at the moment. But it is definitely something to keep in mind for the future!

Yes, let's keep it in mind for the future.

henrypinkard commented 3 years ago

I think this is probably solved now with the improvements to transferring byte buffers. Let me know if you find otherwise.