I can't reproduce this in Linux (Python 3.7.3 on Debian 10). Since you're already starting cockpit from the command line, is anything else printed there when this happens (`OnExit` not showing in the log file)? What's the return code in those cases?
Also, could it be that cockpit exits without closing/flushing the log file? Try adding `cockpit.util.logger.log.flush()` as well.
> Also, could it be that cockpit exits without closing/flushing the log file? Try adding `cockpit.util.logger.log.flush()` as well.
Seems that `flush` is a method on the handler. So try this instead:

```python
for handler in cockpit.util.logger.log:
    handler.flush()
```

and add `logging.shutdown()` to the end of the `OnExit` method.
> Seems that `flush` is a method on the handler. So try this instead: `for handler in cockpit.util.logger.log: handler.flush()`
When adding this I once got a `TypeError: 'RootLogger' object is not iterable`, so I changed it to:

```python
for handler in cockpit.util.logger.log.handlers:
    handler.flush()
cockpit.util.logger.log.shutdown()
```
> and add `logging.shutdown()` to the end of the `OnExit` method.
Adding that gives a `NameError: name 'logging' is not defined`, so I changed it as above.
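For reference, the corrected shutdown sequence can be sketched standalone (a plain stdlib logger stands in for `cockpit.util.logger.log`, and the logger name and file path below are made up):

```python
import logging
import os
import tempfile

# Standalone sketch of the fix discussed above: flush() lives on the
# handlers, not on the logger, and logging.shutdown() flushes and closes
# every registered handler.
logfile = os.path.join(tempfile.gettempdir(), "flush_demo.log")
log = logging.getLogger("flush_demo")
log.addHandler(logging.FileHandler(logfile))
log.error("OnExit")

for handler in log.handlers:
    handler.flush()
logging.shutdown()
```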
The errors were printed to the console.
But the problem is that since then I could never get the "OnExit" line in the log file, even after starting cockpit over 20 times. I even restarted my PC in between.
I also ran cockpit with the `--debug` flag; I will attach that log file:
20220928_Wed-0711.log
I also tried a few times to start cockpit with those lines removed again, but still didn't get the "OnExit".
> I can't reproduce this in Linux (Python 3.7.3 on Debian 10). Since you're already starting cockpit from the command line, is anything else printed there when this happens (`OnExit` not showing in the log file)? What's the return code in those cases?
If the debug flag is not set the log file is empty, but I tried adding a log entry to the OnInit method and that shows up.
The thing is, I became aware of this problem because my own device's onExit() was only run sometimes, so I set up a new venv, installed the newest dev version of cockpit (cloned from GitHub), and tried to recreate the problem.
About the return code: I am not sure if that's the correct way to access it on Windows, but what I did was run `echo %errorlevel%` after cockpit finished, and that returned -1073741819.
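As a side note, a return code like that is easier to read as an unsigned 32-bit hex value, where it matches a Windows NTSTATUS code:

```python
# Interpret the signed %errorlevel% value as an unsigned 32-bit NTSTATUS code.
code = -1073741819
print(hex(code & 0xFFFFFFFF))  # prints 0xc0000005, i.e. STATUS_ACCESS_VIOLATION
```

0xC0000005 is an access violation, the Windows equivalent of a segfault, which fits the crash-on-exit behaviour discussed below.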
I've been trying to debug the onExit calls on the executor and I cannot manage to see whether they are executed. I tried in different places.
None of this points to an execution of the onExit methods.
Is there a way to test this further?
Hi Julio,
I have no idea, but I have run into similar problems trying to fix the segfault on exit on the Mac. We should maybe discuss it next week at the development meeting (Tuesday 9th May).
> None of this points to an execution of the onExit methods.
> Is there a way to test this further?
At that stage, to check whether OnExit is being called, I would just create an empty file somewhere where you know you have the right permissions.
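A minimal version of that marker-file check (the path and function name are placeholders, not cockpit's actual code) could be:

```python
import pathlib
import tempfile

# Touch an empty marker file so we can tell afterwards whether this code
# path ran, even if logging itself is broken during shutdown. The path is
# just an example; pick anywhere you have write permission.
MARKER = pathlib.Path(tempfile.gettempdir()) / "cockpit_onexit_marker"

def OnExit():
    MARKER.touch()
    return 0

OnExit()
```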
I did so, and an empty file is created at the call of CockpitApp.OnInit, but not at CockpitApp.OnExit, niCRIOFPGA.onExit, or microscopeDevice.MicroscopeBase.onExit.
How about trapping wx.EVT_CLOSE and seeing whether that is triggered?
Could it be that we're doing something silly in `MainWindow.OnClose` that prevents `CockpitApp.OnExit` from being called properly? Maybe comment out `MainWindow.OnClose` and its binding to `wx.EVT_CLOSE` and see. If commenting that out means that `CockpitApp.OnExit` is called, maybe we're doing something wrong (like manually destroying the windows, and I forget why).
I commented that out but no changes: no files are created and cockpit closes.
Without that commented out, OnClose is called by the wx._core.CloseEvent.
Julio apparently solved his immediate issue with a function to close the open network connection bound to delete on the connection object for the FPGA executor.
Without Windows, I can't reproduce this (Python 3.7 and wxPython 4.1.1). With this minimal example, is `OnExit` called for you?
```python
import wx

class TestMainWindow(wx.Frame):
    def __init__(self):
        super().__init__(parent=None)
        self.Bind(wx.EVT_CLOSE, self.OnClose)

    def OnClose(self, event):
        print("TestMainWindow.OnClose() called")
        if event.CanVeto():
            print("TestMainWindow.OnClose() skipping event")
            event.Skip()
        else:
            print("TestMainWindow.OnClose() not skipping event")

class TestApp(wx.App):
    def OnInit(self):
        main_window = TestMainWindow()
        main_window.Show()
        return True

    def OnExit(self):
        print("TestApp.OnExit() called")
        return super().OnExit()

app = TestApp()
app.MainLoop()
```
That works fine. Same PC, same environment:

```
C:\Users\Julio\PycharmProjects\cockpit\venvs\venv39\Scripts\python.exe C:\Users\Julio\PycharmProjects\cockpit\cockpit\test.py
TestMainWindow.OnClose() called
TestMainWindow.OnClose() skipping event
TestApp.OnExit() called

Process finished with exit code 0
```
I've got a Windows machine to test this. It seems, from my testing, that the issue lies in the handler watches. If I make `DeviceHandler.addWatch` do nothing:
```diff
--- a/cockpit/handlers/deviceHandler.py
+++ b/cockpit/handlers/deviceHandler.py
@@ -221,9 +221,7 @@ class DeviceHandler:
     ## Add a watch on a device parameter.
     def addWatch(self, name, callback):
-        if name not in self._watches:
-            self._watches[name] = set()
-        self._watches[name].add(callback)
+        return

     ## A function that any control can call to toggle enabled/disabled state.
```
then `OnExit` is called (though other errors appear then, namely #684).
I think what's happening is that these watches hold references to wx objects which need to be destroyed before `App.OnExit` is called.
I have been looking at the code and trying to think about replacing it.
It sets entries in a `_watches` dict, keyed by name, with a callback. When any value is set on the device object, it checks the name; if it matches a name in the `_watches` dict, it calls the callback with the value.
This is totally unlike the wx.Timer code, as it only fires when the value is set; there is no need to repeatedly call a timer. I think the wx objects (buttons, display strings, etc.) which use this need to just remove themselves from the `_watches` list when they are destroyed.
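As a rough model of the mechanism described above (simplified, with hypothetical names; not the real `DeviceHandler` code), the attribute setter consults the `_watches` dict and fires any matching callbacks:

```python
class WatchedHandler:
    """Toy model of the _watches mechanism; not cockpit's actual code."""

    def __init__(self):
        # Bypass our own __setattr__ while _watches does not exist yet.
        object.__setattr__(self, "_watches", {})

    def addWatch(self, name, callback):
        self._watches.setdefault(name, set()).add(callback)

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        # No timers involved: callbacks fire only when a watched value is set.
        for callback in self._watches.get(name, ()):
            callback(value)

seen = []
handler = WatchedHandler()
handler.addWatch("power", seen.append)
handler.power = 5  # setting the attribute triggers the watch callback
```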
> ```diff
> --- a/cockpit/handlers/deviceHandler.py
> +++ b/cockpit/handlers/deviceHandler.py
> @@ -221,9 +221,7 @@ class DeviceHandler:
>      ## Add a watch on a device parameter.
>      def addWatch(self, name, callback):
> -        if name not in self._watches:
> -            self._watches[name] = set()
> -        self._watches[name].add(callback)
> +        return
> ```

We should note this was applied to the codebase in commit 5265b38.
I have been thinking about this some. The watcher is actually a much more elegant solution than the wx.timers etc. It is on the attribute setter and stores a set of callbacks on any attribute for the handler. This means we have no timers or anything going on.
Seems like the solution is two-fold:
1) add a `removeWatch` function, something like

```python
def removeWatch(self, name, callback):
    self._watches[name].remove(callback)
```

probably needs some sanity checking, but you get the idea
2) Ensure that `removeWatch` is called as the wx objects get destroyed; this might be the hard part.
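The bookkeeping in step 2 might look roughly like this (plain-Python stand-ins, since the real code would bind `wx.EVT_WINDOW_DESTROY`; every name here is hypothetical):

```python
class DeviceHandler:
    def __init__(self):
        self._watches = {}

    def addWatch(self, name, callback):
        self._watches.setdefault(name, set()).add(callback)

    def removeWatch(self, name, callback):
        # discard() rather than remove() so a double-removal is harmless.
        self._watches.get(name, set()).discard(callback)

class PowerLabel:
    """Stand-in for a wx control displaying a device parameter."""

    def __init__(self, handler):
        self.handler = handler
        handler.addWatch("power", self.on_power)
        # Real code would do something like:
        # self.Bind(wx.EVT_WINDOW_DESTROY, lambda evt: self.on_destroy())

    def on_power(self, value):
        self.text = str(value)

    def on_destroy(self):
        # Run from the destroy event so the handler drops its reference
        # to this widget before wx deletes the underlying C++ object.
        self.handler.removeWatch("power", self.on_power)

handler = DeviceHandler()
label = PowerLabel(handler)
label.on_destroy()
```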
OK, I have started to create a proper solution to this. Just removing the watches means the GUI isn't updated, which is the wrong solution.
I have added a removeWatch function to the device handler and started adding bindings to EVT_WINDOW_DESTROY to call the removeWatch function. This appears to work for the mainpanel.py calls. I will work on the others.
I have dealt with all the addWatch occurrences, and so long as you DON'T use the mosaic, closing the program returns "0". However, if you use the mosaic it appears to crash on exit, taking out the Python interpreter.
You can turn cameras, lights, etc. on and off, change laser power, snap images, use live image mode, even upload an image to the mosaic from the right-click menu. You can also click around in the mosaic and move the stage, uploading images in different places. However, if you run the mosaic at all it will crash.
Reproduce this by:

```
ID@iic-id-air cockpit % python3
Python 3.11.4 (main, Jun 7 2023, 00:34:23) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import cockpit
>>> cockpit.main(["cockpit"])
```
Just close cockpit and you get:

```
>>> import cockpit
>>> cockpit.main(["cockpit"])
0
>>>
```
If you enable a camera, switch to the mosaic window, start the mosaic, stop it, then quit...

```
>>> import cockpit
>>> cockpit.main(["cockpit"])
ID@iic-id-air cockpit %
```
I guess we need to investigate the mosaic thread and what it is running.
I have tried to set up a threading event to kill the mosaicLoop thread but have not succeeded. I think I am doing something wrong, but I am not sure what. I think that this thread is crashing and taking out the Python interpreter on close after the mosaic has been run at all.
> We should note this was applied to the codebase in commit https://github.com/microscope-cockpit/cockpit/commit/5265b38238e313d5626182c072ef4d50434d6e8b

Oops. That was leftover from when I was trying to find the root cause of this issue. I've put the original code back in. The `DeviceHandler.addWatch` code was originally added in e70481d4da0719218abc.
> I have been thinking about this some. The watcher is actually a much more elegant solution than the wx.timers etc. It is on the attribute setter and stores a set of callbacks on any attribute for the handler. This means we have no timers or anything going on.
But then the watcher is only triggered when an attribute is set on the handler object. It's not suitable when we don't know if something changed on the device (or is the device handler checking itself regularly?).
I think in terms of bookkeeping: in the case of watches we need to keep the callback around and remove it when a window is destroyed, while in the case of wx timers we need to keep the timers around to stop them when a window is destroyed.
In terms of performance I have no sense of which one is better.
I also wonder which is better in terms of control, such as stopping them when running experiments, for example.
> But then the watcher is only triggered when an attribute is set on the handler object. It's not suitable when we don't know if something changed on the device (or is the device handler checking itself regularly?).
I think these two do very different jobs. The watcher allows the interface to update, possibly across multiple displays, easily and efficiently. The wx timer is better for following things that might have external changes, so we need to check on them regularly.
> I think in terms of bookkeeping: in the case of watches we need to keep the callback around and remove it when a window is destroyed, while in the case of wx timers we need to keep the timers around to stop them when a window is destroyed.
Yes, absolutely. I have a branch that does this for all watchers and wx.Timers. I think this solves all these issues and should be merged.
> In terms of performance I have no sense of which one is better.
I would say it depends.... We have both in the code and they both now appear to work, even on shutdown.
However, there is still another issue, which I am pretty sure is the mosaic thread. If you run the mosaic at all, it crashes the Python interpreter on exit. I have tried to set a threading.Event to cause the mosaicLoop to terminate, but currently I don't quite have the right magic and it isn't propagating to actually stop the thread.
My fixes to date are in https://github.com/iandobbie/cockpit/tree/addWatch-fix
Turns out the mosaic was paused mid-loop in the threading.Event.wait() call. Once this event was also set, restarting the loop, my kill code worked and it appears that cockpit shuts down cleanly.
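The failure mode described here can be reproduced with a small sketch (hypothetical names, not the real mosaic code): a loop that parks on one `threading.Event` never notices a stop `Event` unless the pause event is set too:

```python
import threading
import time

stop_event = threading.Event()      # ask the loop to exit
continue_event = threading.Event()  # loop runs only while this is set
continue_event.set()

def mosaic_loop(results):
    # Toy version of a worker loop that can be paused and stopped.
    while not stop_event.is_set():
        continue_event.wait()       # when paused, the thread parks here
        if stop_event.is_set():
            break
        results.append("tile")
        time.sleep(0.001)

results = []
worker = threading.Thread(target=mosaic_loop, args=(results,))
worker.start()
time.sleep(0.01)
continue_event.clear()              # pause the loop mid-run

# Setting stop_event alone would leave the thread parked in wait()
# forever; it must be woken by setting continue_event as well.
stop_event.set()
continue_event.set()
worker.join(timeout=2)
```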
These fixes also fix the issue I was having with cockpit-distribution not closing the deviceserver processes, because cockpit was not shutting down cleanly.
Ironically, python_sim.py from cockpit-distribution now segfaults every time, but it closes much more cleanly as it does stop the device server process. It still seems wx related; here is part of an example dump:
```
Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000000000010
Exception Codes:       0x0000000000000001, 0x0000000000000010
Exception Note:        EXC_CORPSE_NOTIFY

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   exc handler [21017]

VM Region Info: 0x10 is not in any region.  Bytes before following region: 105553518919664
      REGION TYPE                 START - END           [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      UNUSED SPACE AT START
--->
      MALLOC_NANO (reserved)      600018000000-600020000000 [128.0M] rw-/rwx SM=NUL  ...(unallocated)

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   Python                       0x1016f3880 method_dealloc + 128
1   _core.cpython-311-darwin.so  0x1195f8b58 wxPyCallback::~wxPyCallback() + 124
2   _core.cpython-311-darwin.so  0x1195f8b58 wxPyCallback::~wxPyCallback() + 124
3   _core.cpython-311-darwin.so  0x1195f8c28 wxPyCallback::~wxPyCallback() + 12
4   libwx_baseu-3.2.0.0.0.dylib  0x11a3e9ce0 wxEvtHandler::~wxEvtHandler() + 332
5   _core.cpython-311-darwin.so  0x1195f965c sipwxEvtHandler::~sipwxEvtHandler() + 56
6   _core.cpython-311-darwin.so  0x1195f9a1c dealloc_wxEvtHandler(_sipSimpleWrapper*) + 152
7   siplib.cpython-311-darwin.so 0x1180bb008 forgetObject + 204
8   siplib.cpython-311-darwin.so 0x1180babd8 sipWrapper_dealloc + 44
9   Python                       0x101763164 subtype_dealloc + 520
10  Python                       0x1016f38e0 method_dealloc + 224
11  Python                       0x10171a054 list_dealloc + 192
```
This is a major win for closing cleanly. The edits are rather minor but a bit all over the place. Broadly they consist of:
1) binding to wx.EVT_WINDOW_DESTROY
2) adding a removeWatch call in the deviceHandler
3) in the destroy event handler, calling removeWatch on the relevant objects
4) killing the mosaicLoop thread
5) moving several objects into the self. namespace so we have references to them and can call the relevant removeWatch
6) creating some additional functions, as the old lambda functions couldn't easily be removed from the watch list
If there are no complaints I will pull these changes into master.
The mosaic fix to stop the mosaic thread makes it segfault in Linux :/ I don't understand why and will investigate
OK, thanks for testing. I won't push this to master until I hear more. I guess I should set up something as a Linux box to test these things. I'll see if I have a Pi around that could do it.
It just occurred to me that it might be a threading issue: the threading event is fired, but the scheduler doesn't get to the other thread, making the threading event itself cause the crash, especially as there are now two threading events, the Continue event and the Stop event.
I have no idea how Python deals with this, but in C the classic approach is to insert a sleep(0) to indicate to the scheduler that you are ready to relinquish execution.
The problem with the mosaic was that during `CockpitApp.OnExit` we published a `USER_ABORT` event, and there were a few subscribers to that event on the mosaic window. Those subscriptions are handled by `cockpit.gui.EvtEmitter`, which works by binding itself to the window destroy event: when the window is destroyed, `EvtEmitter` unsubscribes from the event. However, we are now handling that destroy event in the window itself, so it was never reaching the `EvtEmitter`. The solution was to call `event.Skip()`, which means that handling of the event continues down/up the chain. I think, the way we use events, we should always be calling `Skip()`.
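To illustrate the point about `Skip()`, here is a toy model in plain Python (not wx's actual dispatch machinery): when a handler does not call `Skip()`, processing stops there and later handlers in the chain, like `EvtEmitter`'s, never see the event:

```python
class Event:
    """Minimal stand-in for a wx event with Skip() semantics."""

    def __init__(self):
        self.skipped = False

    def Skip(self):
        self.skipped = True

def dispatch(event, handlers):
    # Call handlers in order; stop unless the handler called Skip().
    called = []
    for handler in handlers:
        event.skipped = False
        handler(event)
        called.append(handler.__name__)
        if not event.skipped:
            break
    return called

def window_handler(event):
    event.Skip()  # let processing continue down the chain

def evt_emitter_handler(event):
    pass          # e.g. unsubscribe from cockpit events

with_skip = dispatch(Event(), [window_handler, evt_emitter_handler])
```

With the `Skip()` call, both handlers run; remove it and `evt_emitter_handler` is never reached, which is the bug described above.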
This was not needed for the other cases where you're calling `OnDestroy`, but I added it anyway so it will keep working in the future.
Anyway, I squashed some of your commits into two, added the `Skip()` I mentioned, and pushed the fix. Finally closing this issue :)
As an aside, with the `addWatch`/`removeWatch` in the device handlers, we've effectively re-implemented the pub-sub thing. But we have a `cockpit.events.Publisher` class which we could have used instead and has a few more safeguards in place.
Good catch and I did think about the .Skip() but decided that as it was in the onDestroy we didn't need it. I guess it pays to be pedantic.
> As an aside, with the `addWatch`/`removeWatch` in the device handlers, we've effectively re-implemented the pub-sub thing. But we have a `cockpit.events.Publisher` class which we could have used instead and has a few more safeguards in place.
Yes, the one difference I see is that, having it in the value setter, the calling routine doesn't need to know to publish an event; it just needs to set a new value. We could of course do the events.publish in the attribute setter as well.
> As an aside, with the `addWatch`/`removeWatch` in the device handlers, we've effectively re-implemented the pub-sub thing. But we have a `cockpit.events.Publisher` class which we could have used instead and has a few more safeguards in place.
>
> Yes, the one difference I see is that, having it in the value setter, the calling routine doesn't need to know to publish an event; it just needs to set a new value. We could of course do the events.publish in the attribute setter as well.
I didn't mean using `events.publish` (which uses the global `Publisher`). I meant having a separate instance of `Publisher` per device handler. See carandraug/cockpit@e6b885dbf0947683ba794c6d753582d for what I meant.
The point is that we already have a pub-sub implementation in `cockpit.events.Publisher`, and the watches in the device handlers are another pub-sub implementation, just slightly different. The differences are that the watches use a `set` instead of a `list` for the callbacks (which prevents subscribing multiple times with the same callback), and that `Publisher` logs errors raised in the callbacks (instead of silently discarding them), handles errors when unsubscribing, and holds locks when subscribing and unsubscribing. So I see only pros in using `Publisher`. Could you test my `828-replace-watch-with-publisher` branch?
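A sketch of what such a per-handler pub-sub class with those safeguards looks like (an illustration of the idea, not the actual `cockpit.events.Publisher` code):

```python
import logging
import threading

class Publisher:
    """Per-handler pub-sub with error logging and locking (illustrative)."""

    def __init__(self):
        self._subscribers = []  # a list, so duplicate subscriptions are allowed
        self._lock = threading.Lock()

    def subscribe(self, callback):
        with self._lock:
            self._subscribers.append(callback)

    def unsubscribe(self, callback):
        with self._lock:
            try:
                self._subscribers.remove(callback)
            except ValueError:
                # Log instead of raising when the callback was never subscribed.
                logging.error("unsubscribing %r which was not subscribed", callback)

    def publish(self, *args, **kwargs):
        for callback in list(self._subscribers):
            try:
                callback(*args, **kwargs)
            except Exception:
                # Log the failing callback instead of silently discarding it.
                logging.exception("error in subscriber %r", callback)

received = []
pub = Publisher()
pub.subscribe(received.append)
pub.subscribe(lambda value: 1 / 0)   # a broken subscriber: logged, not fatal
pub.publish("new value")
pub.unsubscribe("never subscribed")  # logged, not fatal
```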
Will test on a Mac here; I can also do a PC.
On a related note, your latest fix for the mosaic seems to fix the cockpit-distribution segfault issue, well done!
Mostly works, but I can reliably cause an issue by closing while the mosaic is running on Windows. The output I get is:
```
>>> cockpit.main(["cockpit"])
Traceback (most recent call last):
  File "C:\Users\idobbie1\AppData\Local\Programs\Python\Python310\lib\site-packages\wx\core.py", line 3427, in <lambda>
    lambda event: event.callable(*event.args, **event.kw) )
RuntimeError: wrapped C/C++ object of type MosaicCanvas has been deleted
Traceback (most recent call last):
  File "C:\Users\idobbie1\AppData\Local\Programs\Python\Python310\lib\site-packages\wx\core.py", line 3427, in <lambda>
    lambda event: event.callable(*event.args, **event.kw) )
  File "C:\Users\idobbie1\src\cockpit\cockpit\gui\touchscreen.py", line 942, in Refresh
    super().Refresh(*args, **kwargs)
RuntimeError: wrapped C/C++ object of type MosaicPanel has been deleted
0
>>>
```
Although this produces an error, you can see that the process still returns 0 at exit. I can't get this failure on the Mac. In general I don't think this is worth fixing, as if you just run cockpit this output is swallowed by a subprocess somewhere. You only get it if you explicitly import cockpit and run cockpit.main from the Python REPL.
> Mostly works but I can reliably cause an issue by closing while the mosaic is running on Windows.
I'll guess this is because the mosaic panels are somewhat shared between the touchscreen window and the mosaic window.
> In general I don't think this is worth fixing [...]
We never know what other issues are lying beneath this, but I've fixed enough. Also, that is probably worth a separate issue. This one is closed :)
Using a fresh install of cockpit started with `python -m cockpit`, but with the `self._proxy.may_move_on_enable()` call replaced with `True` (hence https://github.com/MicronOxford/cockpit/issues/827), and with a line added in `__init__.py` at line 266 to check whether this part got executed. Then I started cockpit and closed it using the "x". I did this a couple of times; sometimes there is an `OnExit` entry in the log file and sometimes not.
Operating system: Windows 10
Python version: v3.9.2:1a79785 (I think I encountered this issue on a different version as well)