microscope-cockpit / cockpit

Cockpit is a microscope graphical user interface. It is a flexible and easy to extend platform aimed at life scientists using bespoke microscopes.
https://microscope-cockpit.org
GNU General Public License v3.0
37 stars 27 forks source link

OnExit sometimes not called #828

Closed Jakob-We closed 1 year ago

Jakob-We commented 2 years ago

Using a fresh install of cockpit started with python -m cockpit. But replaced the self._proxy.may_move_on_enable() call with True hence https://github.com/MicronOxford/cockpit/issues/827 and added the line:

cockpit.util.logger.log.error("OnExit")

in __init__.py Line 266 to check if this part got executed. Then I started cockpit and closed it using the "x". I did this a couple of times and sometimes in the log file there is a OnExit entry and sometimes not. Operating system: Windows 10 Python version: v3.9.2:1a79785 (I think I encountered this issue on a different version aswell)

carandraug commented 2 years ago

I can't reproduce this in Linux (Python 3.7.3 in Debian 10). Since you're already starting cockpit from command line, is there anything else being printed there when this (not having OnExit showing on the log file)? What's the return code in those cases?

Also, could, could it be that Cockpit exit without closing/flushing the log file? Try adding cockpit.util.logger.log.flush() as well.

carandraug commented 2 years ago

Also, could, could it be that Cockpit exit without closing/flushing the log file? Try adding cockpit.util.logger.log.flush() as well.

Seems that flush is a method for the handler. So try this instead:

for handler in cockpit.util.logger.log
    handler.flush()

and add logging.shutdown() to the end of OnExit method.

Jakob-We commented 2 years ago

Seems that flush is a method for the handler. So try this instead:

for handler in cockpit.util.logger.log
    handler.flush()

When adding this I once got an TypeError: 'RootLogger' object is not iterable so I changed it to:

for handler in cockpit.util.logger.log.handlers:
handler.flush()
cockpit.util.logger.log.shutdown()

and add logging.shutdown() to the end of OnExit method.

Adding that gives a NameError: name 'logging' is not defined so I changed it as above.

The errors where printed to the console.

But the problem is since then I never could get the "OnExit" line in the log file even after starting cockpit over 20 times. Even restarted my PC in between. I also run cockpit with the --debut flag, I will attach that log file: 20220928_Wed-0711.log And tried a few times to start cockpit with those lines removed again, but still didnt got the "OnExit".

I can't reproduce this in Linux (Python 3.7.3 in Debian 10). Since you're already starting cockpit from command line, is there anything else being printed there when this (not having OnExit showing on the log file)? What's the return code in those cases?

If the debug flag is not set the log file is empty, but I tried to add a log to the OnInit method and that then shows up. The thing is I became aware of this problem because my own device onExit() was only run sometimes, so I setup a new venv and installed the newest dev version of cockpit (cloned it from github) and tried to recreate that problem. About the return code I am not sure if thats the correct way to access it on windows but what I did was run echo %errorlevel% after cockpit finished and that returned -1073741819.

and add logging.shutdown() to the end of OnExit method.

juliomateoslangerak commented 1 year ago

I've being trying to debug the onExit calls on the executor and I cannot manage to see if it is executed. I tried in different places

None of this pints to an execution of the onExit methods.

Is there a way to test this further?

iandobbie commented 1 year ago

Hi Julio,

I have no idea but have run into similar problems trying to fix the seg fault on exit on the Mac. We should maybe discus it next week at the development meeting (Tuesday 9th May)

carandraug commented 1 year ago

None of this pints to an execution of the onExit methods.

Is there a way to test this further?

At that stage, to check if OnExit is being called, I would just create an empty file somewhere (that you know you will have the right permissions).

juliomateoslangerak commented 1 year ago

I did so and an empty file is created at the call of CockpitApp.OnInit but not at either CockpitApp.OnExit, niCRIOFPGA.onExit or microscopeDevice.MicroscopeBase.onExit

iandobbie commented 1 year ago

How about trapping the wx.EVT_CLOSE and see if that is triggered?

carandraug commented 1 year ago

Could it be we're doing something silly on MainWindow.OnClose that prevents CockpitApp.OnExit from being called properly? Maybe comment out MainWindow.OnClose and its binding to wx.EVT_CLOSE and see. If commenting that out, means that CockpitApp.OnExit is called, maybe we're doing something wrong (like we're manually destroying the windows and I forget why).

juliomateoslangerak commented 1 year ago

I commented that out but no changes. No files are created and cockpit closes

without that commented out, OnClose is called by the wx._core.CloseEvent

iandobbie commented 1 year ago

Julio apparently solved his immediate issue with a function to close the open network connection bound to delete on the connection object for the FPGA executor.

carandraug commented 1 year ago

Without Windows, I can't reproduce this (python 3.7 and wxPython 4.1.1).

With this minimal example, is OnExit called for you?

import wx

class TestMainWindow(wx.Frame):
    def __init__(self):
        super().__init__(parent=None)
        self.Bind(wx.EVT_CLOSE, self.OnClose)

    def OnClose(self, event):
        print("TestMainWindow.OnClose() called")
        if event.CanVeto():
            print("TestMainWindow.OnClose() skipping event")
            event.Skip()
        else:
            print("TestMainWindow.OnClose() not skipping event")

class TestApp(wx.App):
    def OnInit(self):
        main_window = TestMainWindow()
        main_window.Show()
        return True

    def OnExit(self):
        print("TestApp.OnExit() called")
        return super().OnExit()

app = TestApp()
app.MainLoop()
juliomateoslangerak commented 1 year ago

That works fine. Same PC same environment

C:\Users\Julio\PycharmProjects\cockpit\venvs\venv39\Scripts\python.exe C:\Users\Julio\PycharmProjects\cockpit\cockpit\test.py 
TestMainWindow.OnClose() called
TestMainWindow.OnClose() skipping event
TestApp.OnExit() called

Process finished with exit code 0
carandraug commented 1 year ago

I've got a windows machine to test this. It seems, from my testing, that the issue lies on the handlers watch. If I make DeviceHandler.addWatch do nothing:

--- a/cockpit/handlers/deviceHandler.py
+++ b/cockpit/handlers/deviceHandler.py
@@ -221,9 +221,7 @@ class DeviceHandler:

     ## Add a watch on a device parameter.
     def addWatch(self, name, callback):
-        if name not in self._watches:
-            self._watches[name] = set()
-        self._watches[name].add(callback)
+        return

     ## A function that any control can call to toggle enabled/disabled state.

Then OnExit is called (other errors appear then, namely #684).

I think what's happening is that in these watches there are references to wx objects which need to be destroyed before App.OnExit is called.

iandobbie commented 1 year ago

I have been looking at the code and trying to think about replacing it.

It sets entires in a _watches dict with a name and a callback. If any value is set on the device object it checks its name, if it matches the name in the _watches dict it call the callback with the value.

This is totally unlike the wx.Timer code as it only fires if the value is set, no need to repeatably be calling a timer. I think we need the wx objects (buttons, display strings etc) which utilise this to just remove themselves from the _watchers list when they are destoryed.

--- a/cockpit/handlers/deviceHandler.py
+++ b/cockpit/handlers/deviceHandler.py
@@ -221,9 +221,7 @@ class DeviceHandler:

     ## Add a watch on a device parameter.
     def addWatch(self, name, callback):
-        if name not in self._watches:
-            self._watches[name] = set()
-        self._watches[name].add(callback)
+        return

We should note this was applied to the codebase in commit 5265b38

iandobbie commented 1 year ago

I have been thinking about this some. The watcher is actually a much more elegant solution than the wx.timers etc. It is on the attribute setter and stores a set of callbacks on any attribute for the handler. This means we have no timers or anything going on.

See like the solution is two-fold

1) add a removeWatch function, something like def removeWatch(self, name, callback): self._watches[name].remove(callback) probably needs some sanity checking but you get the idea

2) Ensure that the removeWatch is called as the wx objects get destroyed, this might be the hard part.

iandobbie commented 1 year ago

Ok, I have started to create a proper solution to this. Just removing the watches means the GUI isnt updated which is the wrong solution.

I have added a removeWatch function to the device handler and started to add bindings to the EVT_WINDOW_DESTROY to call the removeWatch function. This appears to work for the mainpanel.py calls. I will work on the others.

iandobbie commented 1 year ago

I have dealt with all the addWatch occurrences and so long as you DON'T use the mosaic then closing the program returns "0". However if you use the mosaic it appears to crash on exit, and takes out the python interpreter.

You can turn on and off cameras, lights etc... Change Laser power, snap images, use live image mode, even upload the image to the mosaic from the right click menu. You can also click around in the mosaic and move the stage, uploading images at different place. However if you run the mosaic at all it will crash.

iandobbie commented 1 year ago

Reproduce this by:

ID@iic-id-air cockpit % python3
Python 3.11.4 (main, Jun  7 2023, 00:34:23) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import cockpit
>>> cockpit.main(["cockpit'])

just close cockpit and you get

>>> import cockpit
>>> cockpit.main(["cockpit"])
0
>>> 

If you enable a camera, switch to mosaic window and start the mosaic, stop it then quit...

>>> import cockpit
>>> cockpit.main(["cockpit"])
ID@iic-id-air cockpit %

I guess we need to investigate the mosaic thread and what it is running.

iandobbie commented 1 year ago

I have tried to setup a thread event to kill the mosaicLoop thread but not succeeded. I think I am doing something wrong but I am not sure what. I think that this thread is crashing and taking out the python interpreter on close after the mosaic has been run at all.

carandraug commented 1 year ago

We should note this was applied to the codebase in commit https://github.com/microscope-cockpit/cockpit/commit/5265b38238e313d5626182c072ef4d50434d6e8b

Ups. This was leftovers from when I was trying to find the root cause of this issue. I've put the original code back in. The DeviceHandler.addWatch code was originally added in e70481d4da0719218abc

carandraug commented 1 year ago

I have been thinking about this some. The watcher is actually a much more elegant solution than the wx.timers etc. It is on the attribute setter and stores a set of callbacks on any attribute for the handler. This means we have no timers or anything going on.

But then the watcher is only triggered when an attribute is set on the handler object. It's not suitable when we don't know if something changed on the device (or is the device handler checking itself regularly?).

I think in terms of book keeping, in the case of watches we need to keep the callback around and remove it when an window is destroyed, while in the case of wx timers we need to keep the timers around to stop them when a window is destroyed.

In terms of performance I have no sense which one is better.

I wonder what is better in terms of control such as stopping them when running experiments for example.

iandobbie commented 1 year ago

But then the watcher is only triggered when an attribute is set on the handler object. It's not suitable when we don't know if something changed on the device (or is the device handler checking itself regularly?).

I think these two do very different jobs. The watcher allows the interface to update, possibly multiple displays easily and efficiently. The wx timer is better to follow things that might have external changes so we need to check on it regularly.

I think in terms of book keeping, in the case of watches we need to keep the callback around and remove it when an window is destroyed, while in the case of wx timers we need to keep the timers around to stop them when a window is destroyed.

Yes, absolutely. I have to branch that does this for all watchers and wx.Timers. I think this solves all these issues and should be merged.

In terms of performance I have no sense which one is better.

I would say it depends.... We have both in the code and they both now appear to work, even on shutdown.

However, there is still another issue which I am pretty sure is the mosaic thread. If you run the mosaic at all it crashes the python interpreter on exit. I have tried to set a threading.Event to cause the mosaicLoop to terminate but currently I don't quite have the right magic and it isn't propagating to actually stop the thread.

My fixed to date are in https://github.com/iandobbie/cockpit/tree/addWatch-fix

iandobbie commented 1 year ago

Turns out the mosaic was paused midloop with the threading.Event.wait() call. Once this was also set to restart the loop my kill code worked and it appears as though cockpit shutdown cleanly.

iandobbie commented 1 year ago

These fixes also fix the issue I was having with cockpit-distribution not closing the deviceserver processes, because cockpit was not shutting down cleanly.

Ironically python_sim.py from cockpit-distribution now segfaults every time, but closes much more cleanly as it does stop the device server process. Still seems wx related, here is part of an example dump

Crashed Thread: 0 Dispatch queue: com.apple.main-thread

Exception Type: EXC_BAD_ACCESS (SIGSEGV) Exception Codes: KERN_INVALID_ADDRESS at 0x0000000000000010 Exception Codes: 0x0000000000000001, 0x0000000000000010 Exception Note: EXC_CORPSE_NOTIFY

Termination Reason: Namespace SIGNAL, Code 11 Segmentation fault: 11 Terminating Process: exc handler [21017]

VM Region Info: 0x10 is not in any region. Bytes before following region: 105553518919664 REGION TYPE START - END [ VSIZE] PRT/MAX SHRMOD REGION DETAIL UNUSED SPACE AT START --->
MALLOC_NANO (reserved) 600018000000-600020000000 [128.0M] rw-/rwx SM=NUL ...(unallocated)

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread 0 Python 0x1016f3880 method_dealloc + 128 1 _core.cpython-311-darwin.so 0x1195f8b58 wxPyCallback::~wxPyCallback() + 124 2 _core.cpython-311-darwin.so 0x1195f8b58 wxPyCallback::~wxPyCallback() + 124 3 _core.cpython-311-darwin.so 0x1195f8c28 wxPyCallback::~wxPyCallback() + 12 4 libwx_baseu-3.2.0.0.0.dylib 0x11a3e9ce0 wxEvtHandler::~wxEvtHandler() + 332 5 _core.cpython-311-darwin.so 0x1195f965c sipwxEvtHandler::~sipwxEvtHandler() + 56 6 _core.cpython-311-darwin.so 0x1195f9a1c dealloc_wxEvtHandler(_sipSimpleWrapper*) + 152 7 siplib.cpython-311-darwin.so 0x1180bb008 forgetObject + 204 8 siplib.cpython-311-darwin.so 0x1180babd8 sipWrapper_dealloc + 44 9 Python 0x101763164 subtype_dealloc + 520 10 Python 0x1016f38e0 method_dealloc + 224 11 Python 0x10171a054 list_dealloc + 192

iandobbie commented 1 year ago

This is a major win for closing cleanly. The edits are rather minor but a bit all over the place. Broadly they consist of

1) binding to wx.EVT_WINDOW_DESTROY

2) add a removeWatch call in the deviceHandler

3) In the destroy event handler call the removeWatch on the relevant objects

4) Kill the moasicLoop thread

5) several objects had to be moved in the self. namespace so we could have links to them and call the relevant removeWatch.

6) create some additional functions as the old lambda functions couldn't easily be removed from the watch list.

If there are no complaints I will pull these changes into master

carandraug commented 1 year ago

The mosaic fix to stop the mosaic thread makes it segfault in Linux :/ I don't understand why and will investigate

iandobbie commented 1 year ago

Ok, thanks for testing. I wont push this to master until I hear more. I guess I should setup something as a linux box to test these things. I'll see if I have a Pi around that could do it.

iandobbie commented 1 year ago

Just occurred to me it might just be a threading issue that the threading event is fired but the scheduler doesn't get to the other thread, making the threading event itself cause the crash, especially as there are now two threading events, the Continue event and the Stop event.

I have no idea how python deals with this but in C the classic approach is to insert a sleep(0) the indicate to the scheduler you are ready to relinquish execution.

carandraug commented 1 year ago

The problem with the mosaic was that during CockpitApp.OnExit we published an USER_ABORT event and there were a few subscribers for that event on the mosaic window. Those subscriptions are handled by cockpit.gui.EvtEmitter which works by binding itself to the window destroy event. When the window is destroyed, EvtEmitter unsubscribes from the event. However, now we are handling that event in the window itself so it was never reaching the EvtEmitter. The solution was to call event.Skip() which means that handling of the event should continue down/up the chain. I think the way we use events, we should always be calling Skip(). This was not needed for the other cases where you're calling OnDestroy but I added it anyway so it will work in the future.

Anway, I squashed some of your commits into two and added the Skip I mentioned, and pushed the fix. Finally closing this issue :)

As an aside, with the addWatch/removeWatch in the device handlers, we've effectively re-implemented the pub-sub thing. But we have a cockpit.events.Publisher class which we could have used instead and has a few more safeguards in place.

iandobbie commented 1 year ago

Good catch and I did think about the .Skip() but decided that as it was in the onDestroy we didn't need it. I guess it pays to be pedantic.

As an aside, with the addWatch/removeWatch in the device handlers, we've effectively re-implemented the pub-sub thing. But we have a cockpit.events.Publisher class which we could have used instead and has a few more safeguards in place.

Yes, the one different I see is that having it in the value setter the calling routine doesn't need to know to publish an event, just needs to set a new value. We could of course do the events.publish in the attribute setter as well.

carandraug commented 1 year ago

As an aside, with the addWatch/removeWatch in the device handlers, we've effectively re-implemented the pub-sub thing. But we have a cockpit.events.Publisher class which we could have used instead and has a few more safeguards in place.

Yes, the one different I see is that having it in the value setter the calling routine doesn't need to know to publish an event, just needs to set a new value. We could of course do the events.publish in the attribute setter as well.

I didn't mean using events.publish (which uses the global Publisher). I meant having a separate instance of Publisher per device handler. See carandraug/cockpit@e6b885dbf0947683ba794c6d753582d for what I meant.

The point is that we already have a pub-sub implementation in cockpit.events.Publisher and the watches in device handlers are another pub-sub implementation, just slightly different. The differences are that watches use a set instead of list for the callbacks (which prevented subscribing multiple times with the same callback), and that Publisher logs errors on that callback (instead of silently discarding them), handles errors when unsubscribing, and has locks when subscribing and unsubscribing. So I see only pros in using Publisher. Could you test my 828-replace-watch-with-publisher branch ?

iandobbie commented 1 year ago

Will test on a mac here, can also do a PC.

On a related note your latest fix for the mosaic seems to fix the cockpit-distribution seg fault issue, well done!

iandobbie commented 1 year ago

Mostly works but I can reliably cause an issue by closing while the mosaic is running on windows. The output I get is:

>>> cockpit.main(["cockpit"])
Traceback (most recent call last):
  File "C:\Users\idobbie1\AppData\Local\Programs\Python\Python310\lib\site-packages\wx\core.py", line 3427, in <lambda>
    lambda event: event.callable(*event.args, **event.kw) )
RuntimeError: wrapped C/C++ object of type MosaicCanvas has been deleted
Traceback (most recent call last):
  File "C:\Users\idobbie1\AppData\Local\Programs\Python\Python310\lib\site-packages\wx\core.py", line 3427, in <lambda>
    lambda event: event.callable(*event.args, **event.kw) )
  File "C:\Users\idobbie1\src\cockpit\cockpit\gui\touchscreen.py", line 942, in Refresh
    super().Refresh(*args, **kwargs)
RuntimeError: wrapped C/C++ object of type MosaicPanel has been deleted
0
>>>

Although this is producing an error, you can see that the process still returns 0 at exit. I can't get this failure on the mac. In general I don't think this is worth fixing as if you just run cockpit this output is swallowed by a sub process somewhere. You only get this if you explicitly import cockpit and run cockpit.main from the python rpel

carandraug commented 1 year ago

Mostly works but I can reliably cause an issue by closing while the mosaic is running on windows.

I'll guess this is because the mosaic panels are somewhat shared between the touchscreen window and the mosaic window.

In general I don't think this is worth fixing [...]

We never know what other issues are lying beneath this but I've fixed enough. Also that probably is worth a separate issue. This one is closed :)