philipstarkey closed this issue 5 years ago.
Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).
Thanks for working out how to reproduce the issue! I'll see if I can get it to happen on one of my machines - once I can get it to lock up I should be able to fairly easily work out what the culprit is via using the faulthandler to dump tracebacks of all threads, or something like that.
Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).
I was not able to reproduce this, with frame rates close to 180fps. I'll try again on a different computer later.
However, I did find a deadlock if one clicks 'restart' for the tab while acquiring at high frame rates. So I have fixed that (PR #77). It doesn't seem like the same issue though, even though this one sounds like a deadlock of some sort too.
If you are able to reliably reproduce the problem, then I can tell you what I would be doing if I could reproduce it, and maybe you can show me the resulting tracebacks and we might be able to figure out what's going on.
I would add at the top of blacs/__main__.py:
import faulthandler
f = open(r'C:\labscript_suite\blacs\faulthandler.log', 'a')
f.write('\n\n---restart---\n\n')
faulthandler.dump_traceback_later(timeout=60, repeat=True, file=f)
Then, once the hang occurs, wait a full 60 seconds (or less, if you decrease the timeout for dumping tracebacks), then quit BLACS. I don't know if it will flush the output file. If it doesn't, and Python exits in an unclean way, the output file may not contain the final set of tracebacks. In that case I would switch it to file=sys.stderr and rely on reading it from the terminal, but you may need to increase the number of scrollback lines in the terminal settings, otherwise you won't be able to see all the tracebacks: it outputs one traceback per thread, and there are likely to be tens of threads, depending on how many devices you have.
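As a variation on the snippet above, a minimal self-contained sketch is below. The log path here is hypothetical; note that faulthandler writes straight to the underlying file descriptor, so its tracebacks don't depend on Python-level buffering, but opening the file line-buffered helps ensure our own marker writes are flushed promptly.

```python
import faulthandler
import os
import tempfile

# Hypothetical log path for illustration:
log_path = os.path.join(tempfile.gettempdir(), 'faulthandler.log')

# Line-buffered so our own marker writes are flushed promptly.
log = open(log_path, 'a', buffering=1)
log.write('\n\n---restart---\n\n')

# Dump tracebacks of every thread each `timeout` seconds until cancelled:
faulthandler.dump_traceback_later(timeout=60, repeat=True, file=log)

# ... run the application; once done debugging, stop the watchdog:
faulthandler.cancel_dump_traceback_later()
```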
Original comment by David Meyer (Bitbucket: dihm, GitHub: dihm).
OK, so I've made some progress. I have a traceback, which I'll just e-mail to you directly since it is pretty long.
More to the point, I've discovered another important step to reproduce the bug. It seems it only happens if I first load BLACS, change something in the worker, save, then restart the tab. Could it be something with the cache updating?
Original comment by David Meyer (Bitbucket: dihm, GitHub: dihm).
We are hitting this issue again in the lab so I've taken some more time to characterize what is happening with added logging of some of the relevant players.
For starters, whatever is going on seems to be just intermittent enough that I can't reliably pin down exactly what steps are needed to reproduce it in all cases. I can attest that we often hit a long-term steady state where this is consistently an issue, particularly with the PylonCamera device, when the fps gets above ~25. For the logs below, I started BLACS, took a snap, started continuous, waited a couple of seconds until the fps stabilized, then hit stop. After a few seconds Windows would recognize the GUI as not responding. In this instance, images appear to update correctly until I hit stop.
In order to better debug, I've added logging messages to IMAQdxCameraWorker.continuous_loop() and a bailout check where, once 1000 images have been taken, it adjusts dt down to 1/15, which I've found breaks the deadlock and allows me to get proper control back. I've also added logging messages to ImageReceiver.handler() in the IMAQdx blacs tab:
logger.info('ImageReceiver Processing')
if self.update_event is not None:
    while True:
        # Don't block, and check for self.stopping regularly in case we are
        # shutting down. Otherwise if shutdown is called from the main thread we
        # would deadlock.
        try:
            self.update_event.get(timeout=0.1)
            logger.info('Update event cleared')
            break
        except Empty:
            if self.stopping:
                return
            logger.info(' Waiting for previous update!')
self.update_event = inmain_later(self.update, image, self.frame_rate)
return [b'ok']
With these changes, I now have three logs: the normal BLACS log, one following the ImageReceiver of the BLACS tab, and one following the BLACS worker. Relevant snippets are below. It appears that the BLACS __main__ loop is getting hung up, but the worker is not getting blocked to slow down the flood of images. This particular snippet of the BLACS log is unusual in that there is some activity between the continuous start and the main loop hang, probably because I got lucky and started the acquisition just as the check_status jobs went out. Note that the SMF100A has a statemachine_timeout of 5 seconds. There is an NI_DAQ and two other cameras connected that don't appear in this snippet. The hang is seen when only the AnalysisSubmission loop is logging events, and lasts for about a minute.
Checking the task manager on this, the blacs worker thread barely changes, but the __main__ thread explodes, eating 50% of the processor time, with the memory footprint ballooning from 170MB to nearly 500MB until the continuous loop is brought to heel.
At this point I'm a bit stuck as to what to do next. The next step in the chain is qtutils.inmain_later, which I don't understand very well. Unless someone has another thought, could I get a brief explanation of it and/or a pointer to where my next level of debug logging needs to be to track down this hang?
BLACS.log
2019-09-06 10:00:40,681 INFO BLACS.AnalysisSubmission.mainloop: Processed signal: check/retry
2019-09-06 10:00:40,704 DEBUG BLACS.CCD_2.mainloop: Processing event _start_continuous
2019-09-06 10:00:40,708 DEBUG BLACS.CCD_2.mainloop: Instructing worker main_worker to do job start_continuous
2019-09-06 10:00:40,713 DEBUG BLACS.CCD_2.mainloop: Waiting for worker to acknowledge job request
2019-09-06 10:00:40,714 DEBUG BLACS.CCD_2.mainloop: Worker reported job started, waiting for completion
2019-09-06 10:00:40,742 DEBUG BLACS.CCD_2.mainloop: Job completed
2019-09-06 10:00:40,743 DEBUG BLACS.CCD_2.mainloop: returning worker results to function _start_continuous
2019-09-06 10:00:40,748 DEBUG BLACS.CCD_2.mainloop: Finalising function
2019-09-06 10:00:40,751 DEBUG BLACS.CCD_2.mainloop: Waiting for next event
2019-09-06 10:00:40,772 DEBUG BLACS.pulseblaster_0.mainloop: Processing event _status_monitor
2019-09-06 10:00:40,776 DEBUG BLACS.pulseblaster_0.mainloop: Instructing worker main_worker to do job check_status
2019-09-06 10:00:40,777 DEBUG BLACS.pulseblaster_0_main_worker.worker: Got job request check_status
2019-09-06 10:00:40,777 DEBUG BLACS.pulseblaster_0_main_worker.worker: Starting job check_status
2019-09-06 10:00:40,778 DEBUG BLACS.pulseblaster_0_main_worker.worker: Job complete
2019-09-06 10:00:40,778 DEBUG BLACS.pulseblaster_0.mainloop: Waiting for worker to acknowledge job request
2019-09-06 10:00:40,778 DEBUG BLACS.pulseblaster_0_main_worker.worker: Waiting for next job request
2019-09-06 10:00:40,781 DEBUG BLACS.pulseblaster_0.mainloop: Worker reported job started, waiting for completion
2019-09-06 10:00:40,782 DEBUG BLACS.pulseblaster_0.mainloop: Job completed
2019-09-06 10:00:40,783 DEBUG BLACS.pulseblaster_0.mainloop: returning worker results to function _status_monitor
2019-09-06 10:00:40,830 DEBUG BLACS.pulseblaster_0.mainloop: Finalising function
2019-09-06 10:00:40,872 DEBUG BLACS.pulseblaster_0.mainloop: Waiting for next event
2019-09-06 10:00:40,960 DEBUG BLACS.pulseblaster_0.mainloop: Processing event __timeout_add
2019-09-06 10:00:41,146 DEBUG BLACS.pulseblaster_0.mainloop: Waiting for next event
2019-09-06 10:00:47,502 DEBUG BLACS.SMF100A.mainloop: Processing event _status_monitor
2019-09-06 10:00:47,506 DEBUG BLACS.pulseblaster_0.mainloop: Processing event _status_monitor
2019-09-06 10:00:47,662 DEBUG BLACS.SMF100A.mainloop: Instructing worker main_worker to do job check_status
2019-09-06 10:00:47,665 DEBUG BLACS.SMF100A_main_worker.worker: Got job request check_status
2019-09-06 10:00:47,666 DEBUG BLACS.SMF100A_main_worker.worker: Starting job check_status
2019-09-06 10:00:47,665 DEBUG BLACS.pulseblaster_0.mainloop: Instructing worker main_worker to do job check_status
2019-09-06 10:00:47,667 DEBUG BLACS.SMF100A_main_worker.worker: Job complete
2019-09-06 10:00:47,668 DEBUG BLACS.SMF100A_main_worker.worker: Waiting for next job request
2019-09-06 10:00:47,668 DEBUG BLACS.pulseblaster_0_main_worker.worker: Got job request check_status
2019-09-06 10:00:47,669 DEBUG BLACS.pulseblaster_0_main_worker.worker: Starting job check_status
2019-09-06 10:00:47,669 DEBUG BLACS.pulseblaster_0_main_worker.worker: Job complete
2019-09-06 10:00:47,669 DEBUG BLACS.pulseblaster_0_main_worker.worker: Waiting for next job request
2019-09-06 10:00:47,742 DEBUG BLACS.SMF100A.mainloop: Waiting for worker to acknowledge job request
2019-09-06 10:00:47,745 DEBUG BLACS.pulseblaster_0.mainloop: Waiting for worker to acknowledge job request
2019-09-06 10:00:47,747 DEBUG BLACS.SMF100A.mainloop: Worker reported job started, waiting for completion
2019-09-06 10:00:47,750 DEBUG BLACS.pulseblaster_0.mainloop: Worker reported job started, waiting for completion
2019-09-06 10:00:47,752 DEBUG BLACS.SMF100A.mainloop: Job completed
2019-09-06 10:00:47,753 DEBUG BLACS.pulseblaster_0.mainloop: Job completed
2019-09-06 10:00:47,753 DEBUG BLACS.SMF100A.mainloop: returning worker results to function _status_monitor
2019-09-06 10:00:47,753 DEBUG BLACS.pulseblaster_0.mainloop: returning worker results to function _status_monitor
2019-09-06 10:00:47,878 DEBUG BLACS.SMF100A.mainloop: Finalising function
2019-09-06 10:00:47,881 DEBUG BLACS.pulseblaster_0.mainloop: Finalising function
2019-09-06 10:00:47,964 DEBUG BLACS.SMF100A.mainloop: Waiting for next event
2019-09-06 10:00:47,966 DEBUG BLACS.pulseblaster_0.mainloop: Waiting for next event
2019-09-06 10:00:48,128 DEBUG BLACS.SMF100A.mainloop: Processing event __timeout_add
2019-09-06 10:00:48,129 DEBUG BLACS.pulseblaster_0.mainloop: Processing event __timeout_add
2019-09-06 10:00:48,337 DEBUG BLACS.SMF100A.mainloop: Waiting for next event
2019-09-06 10:00:48,339 DEBUG BLACS.pulseblaster_0.mainloop: Waiting for next event
2019-09-06 10:00:50,838 INFO BLACS.AnalysisSubmission.mainloop: Processed signal: check/retry
2019-09-06 10:01:00,994 INFO BLACS.AnalysisSubmission.mainloop: Processed signal: check/retry
2019-09-06 10:01:11,153 INFO BLACS.AnalysisSubmission.mainloop: Processed signal: check/retry
2019-09-06 10:01:19,017 DEBUG BLACS.CCD_2.mainloop: Processing event _stop_continuous
2019-09-06 10:01:19,017 DEBUG BLACS.pulseblaster_0.mainloop: Processing event _status_monitor
2019-09-06 10:01:19,017 DEBUG BLACS.SMF100A.mainloop: Processing event _status_monitor
2019-09-06 10:01:19,017 DEBUG BLACS.novatech.mainloop: Processing event _check_remote_values
2019-09-06 10:01:19,018 DEBUG BLACS.novatech_static.mainloop: Processing event _check_remote_values
2019-09-06 10:01:19,088 DEBUG BLACS.CCD_2.mainloop: Instructing worker main_worker to do job stop_continuous
BLACS_tab log
2019-09-06 10:00:40,780 INFO Camera_Tab: ImageReceiver Processing
2019-09-06 10:00:40,781 INFO Camera_Tab: Update event cleared
2019-09-06 10:00:40,812 INFO Camera_Tab: ImageReceiver Processing
2019-09-06 10:00:40,812 INFO Camera_Tab: Update event cleared
2019-09-06 10:00:40,843 INFO Camera_Tab: ImageReceiver Processing
2019-09-06 10:00:40,843 INFO Camera_Tab: Update event cleared
2019-09-06 10:00:40,874 INFO Camera_Tab: ImageReceiver Processing
2019-09-06 10:00:40,875 INFO Camera_Tab: Update event cleared
<lots of repeats without any gaps in time>
2019-09-06 10:01:00,137 INFO Camera_Tab: ImageReceiver Processing
2019-09-06 10:01:00,161 INFO Camera_Tab: Update event cleared
2019-09-06 10:01:00,175 INFO Camera_Tab: ImageReceiver Processing
2019-09-06 10:01:00,199 INFO Camera_Tab: Update event cleared
2019-09-06 10:01:00,213 INFO Camera_Tab: ImageReceiver Processing
2019-09-06 10:01:00,236 INFO Camera_Tab: Update event cleared
2019-09-06 10:01:00,250 INFO Camera_Tab: ImageReceiver Processing
2019-09-06 10:01:00,274 INFO Camera_Tab: Update event cleared
2019-09-06 10:01:00,287 INFO Camera_Tab: ImageReceiver Processing
2019-09-06 10:01:00,310 INFO Camera_Tab: Update event cleared
<more repeats without gaps>
2019-09-06 10:01:18,937 INFO Camera_Tab: ImageReceiver Processing
2019-09-06 10:01:18,940 INFO Camera_Tab: Update event cleared
2019-09-06 10:01:19,003 INFO Camera_Tab: ImageReceiver Processing
2019-09-06 10:01:19,004 INFO Camera_Tab: Update event cleared
2019-09-06 10:01:19,081 INFO Camera_Tab: ImageReceiver Processing
2019-09-06 10:01:19,082 INFO Camera_Tab: Update event cleared
BLACS_worker log
2019-09-06 10:00:32,921 DEBUG Camera_Worker_PylonCameraWorker: Job complete
2019-09-06 10:00:32,922 DEBUG Camera_Worker_PylonCameraWorker: Waiting for next job request
2019-09-06 10:00:32,958 DEBUG Camera_Worker_PylonCameraWorker: Got job request program_manual
2019-09-06 10:00:32,958 DEBUG Camera_Worker_PylonCameraWorker: Starting job program_manual
2019-09-06 10:00:32,959 DEBUG Camera_Worker_PylonCameraWorker: Job complete
2019-09-06 10:00:32,959 DEBUG Camera_Worker_PylonCameraWorker: Waiting for next job request
2019-09-06 10:00:39,562 DEBUG Camera_Worker_PylonCameraWorker: Got job request snap
2019-09-06 10:00:39,563 DEBUG Camera_Worker_PylonCameraWorker: Starting job snap
2019-09-06 10:00:39,719 DEBUG Camera_Worker_PylonCameraWorker: Job complete
2019-09-06 10:00:39,719 DEBUG Camera_Worker_PylonCameraWorker: Waiting for next job request
2019-09-06 10:00:40,709 DEBUG Camera_Worker_PylonCameraWorker: Got job request start_continuous
2019-09-06 10:00:40,709 DEBUG Camera_Worker_PylonCameraWorker: Starting job start_continuous
2019-09-06 10:00:40,728 INFO Camera_Worker_PylonCameraWorker: Grabbing 0th image
2019-09-06 10:00:40,732 INFO Camera_Worker_PylonCameraWorker: Starting Continuous
2019-09-06 10:00:40,741 DEBUG Camera_Worker_PylonCameraWorker: Job complete
2019-09-06 10:00:40,741 DEBUG Camera_Worker_PylonCameraWorker: Waiting for next job request
2019-09-06 10:00:40,770 INFO Camera_Worker_PylonCameraWorker: Sending 0th image to parent
2019-09-06 10:00:40,783 INFO Camera_Worker_PylonCameraWorker: Grabbing 1th image
2019-09-06 10:00:40,801 INFO Camera_Worker_PylonCameraWorker: Sending 1th image to parent
2019-09-06 10:00:40,814 INFO Camera_Worker_PylonCameraWorker: Grabbing 2th image
2019-09-06 10:00:40,833 INFO Camera_Worker_PylonCameraWorker: Sending 2th image to parent
2019-09-06 10:00:40,844 INFO Camera_Worker_PylonCameraWorker: Grabbing 3th image
2019-09-06 10:00:40,864 INFO Camera_Worker_PylonCameraWorker: Sending 3th image to parent
2019-09-06 10:00:40,878 INFO Camera_Worker_PylonCameraWorker: Grabbing 4th image
2019-09-06 10:00:40,896 INFO Camera_Worker_PylonCameraWorker: Sending 4th image to parent
<lots of lines without any gaps in time>
2019-09-06 10:01:00,124 INFO Camera_Worker_PylonCameraWorker: Grabbing 507th image
2019-09-06 10:01:00,126 INFO Camera_Worker_PylonCameraWorker: Sending 507th image to parent
2019-09-06 10:01:00,162 INFO Camera_Worker_PylonCameraWorker: Grabbing 508th image
2019-09-06 10:01:00,164 INFO Camera_Worker_PylonCameraWorker: Sending 508th image to parent
2019-09-06 10:01:00,201 INFO Camera_Worker_PylonCameraWorker: Grabbing 509th image
2019-09-06 10:01:00,202 INFO Camera_Worker_PylonCameraWorker: Sending 509th image to parent
2019-09-06 10:01:00,238 INFO Camera_Worker_PylonCameraWorker: Grabbing 510th image
2019-09-06 10:01:00,240 INFO Camera_Worker_PylonCameraWorker: Sending 510th image to parent
2019-09-06 10:01:00,275 INFO Camera_Worker_PylonCameraWorker: Grabbing 511th image
2019-09-06 10:01:00,277 INFO Camera_Worker_PylonCameraWorker: Sending 511th image to parent
<more lines without gaps>
2019-09-06 10:01:18,670 INFO Camera_Worker_PylonCameraWorker: Grabbing 994th image
2019-09-06 10:01:18,672 INFO Camera_Worker_PylonCameraWorker: Sending 994th image to parent
2019-09-06 10:01:18,708 INFO Camera_Worker_PylonCameraWorker: Grabbing 995th image
2019-09-06 10:01:18,710 INFO Camera_Worker_PylonCameraWorker: Sending 995th image to parent
2019-09-06 10:01:18,746 INFO Camera_Worker_PylonCameraWorker: Grabbing 996th image
2019-09-06 10:01:18,748 INFO Camera_Worker_PylonCameraWorker: Sending 996th image to parent
2019-09-06 10:01:18,784 INFO Camera_Worker_PylonCameraWorker: Grabbing 997th image
2019-09-06 10:01:18,785 INFO Camera_Worker_PylonCameraWorker: Sending 997th image to parent
2019-09-06 10:01:18,820 INFO Camera_Worker_PylonCameraWorker: Grabbing 998th image
2019-09-06 10:01:18,822 INFO Camera_Worker_PylonCameraWorker: Sending 998th image to parent
2019-09-06 10:01:18,857 INFO Camera_Worker_PylonCameraWorker: Grabbing 999th image
2019-09-06 10:01:18,859 INFO Camera_Worker_PylonCameraWorker: Sending 999th image to parent
2019-09-06 10:01:18,894 ERROR Camera_Worker_PylonCameraWorker: Adjusting dt from 0.000000 to 0.066667
2019-09-06 10:01:18,924 INFO Camera_Worker_PylonCameraWorker: Grabbing 1000th image
2019-09-06 10:01:18,926 INFO Camera_Worker_PylonCameraWorker: Sending 1000th image to parent
2019-09-06 10:01:18,990 INFO Camera_Worker_PylonCameraWorker: Grabbing 1001th image
2019-09-06 10:01:18,992 INFO Camera_Worker_PylonCameraWorker: Sending 1001th image to parent
2019-09-06 10:01:19,069 INFO Camera_Worker_PylonCameraWorker: Grabbing 1002th image
2019-09-06 10:01:19,071 INFO Camera_Worker_PylonCameraWorker: Sending 1002th image to parent
2019-09-06 10:01:19,090 DEBUG Camera_Worker_PylonCameraWorker: Got job request stop_continuous
2019-09-06 10:01:19,090 DEBUG Camera_Worker_PylonCameraWorker: Starting job stop_continuous
2019-09-06 10:01:19,090 INFO Camera_Worker_PylonCameraWorker: Stopping Continuous
2019-09-06 10:01:19,134 DEBUG Camera_Worker_PylonCameraWorker: Job complete
2019-09-06 10:01:19,134 DEBUG Camera_Worker_PylonCameraWorker: Waiting for next job request
2019-09-06 10:01:26,250 DEBUG Camera_Worker_PylonCameraWorker: Got job request shutdown
2019-09-06 10:01:26,250 DEBUG Camera_Worker_PylonCameraWorker: Starting job shutdown
2019-09-06 10:01:26,251 DEBUG Camera_Worker_PylonCameraWorker: Job complete
2019-09-06 10:01:26,251 DEBUG Camera_Worker_PylonCameraWorker: Waiting for next job request
Edit: fixed weirdness in formatting and truncation due to replying by e-mail
Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).
Sorry again for not seeing this, I think you must be correct that if you use the email response feature, bitbucket sends no notifications. Oh well, we'll be on github soon enough :p.
These results give some leads!
inmain_later queues up a function call to occur in the main thread, and returns a queue.Queue object that the result of the function call (or an exception) will be put to once the function in the main thread runs. It does this by posting a custom Qt event (qtutils.invoke_in_main.CallEvent) that holds the function, its args, and the Queue that the response should be posted to. A custom event handler (qtutils.invoke_in_main.Caller) runs in the main thread and handles the event by calling the function and putting the result to the Queue. So each call to self.update_event.get(timeout=0.1) is waiting for the previously requested function to run in the main thread, and this is supposed to prevent a backlog. If we just called inmain_later without waiting for the result at any point, we definitely would get a backlog there.
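The pattern described above can be sketched without Qt. This is illustrative only: the real qtutils implementation posts a Qt CallEvent to the main thread's event loop rather than using the plain queue shown here.

```python
import queue
import threading

# Sketch of the inmain_later pattern (illustrative only; the real qtutils
# posts a Qt CallEvent to the main thread rather than using a plain queue).
event_queue = queue.Queue()

def inmain_later(func, *args):
    """Queue func(*args) to run in the 'main' loop; return a Queue that
    the result (or exception) will be put to."""
    result_queue = queue.Queue()
    event_queue.put((func, args, result_queue))
    return result_queue

def main_loop():
    # Stand-in for the Qt event loop: execute queued calls in order.
    while True:
        func, args, result_queue = event_queue.get()
        if func is None:
            break
        try:
            result_queue.put(func(*args))
        except Exception as exc:
            result_queue.put(exc)

main = threading.Thread(target=main_loop)
main.start()

update_event = inmain_later(lambda x: x + 1, 41)
# Like ImageReceiver.handler(), block on the result of the previous call
# so a backlog of queued calls cannot accumulate:
result = update_event.get()
print(result)  # prints 42

event_queue.put((None, None, None))  # shut down the sketch 'main' loop
main.join()
```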
The fact that your logging keeps saying "Update event cleared" implies that the update() function is running and looks like it is keeping up. This makes me think the backlog is downstream of that; for example, perhaps image_view.setImage() queues up something to happen later, and is not fully synchronous.
To confirm this, you could:
1. Comment out the call to self.image_view.setImage(image.T, autoRange=False, autoLevels=False). If the backlog is upstream of this, it should still occur. If the backlog is downstream of this, it might stop occurring.
2. Make ImageReceiver.update() synchronous by replacing inmain_later with inmain:
diff -r 57c4e81e56704e2cf7eb418b5bf80d48a7d8cc82 IMAQdxCamera/blacs_tabs.py
--- a/IMAQdxCamera/blacs_tabs.py Mon Aug 26 16:13:28 2019 -0400
+++ b/IMAQdxCamera/blacs_tabs.py Sat Sep 14 16:37:52 2019 -0400
@@ -22,7 +22,7 @@
import numpy as np
-from qtutils import UiLoader, inmain_later
+from qtutils import UiLoader, inmain_later, inmain
import qtutils.icons
from qtutils.qt import QtWidgets, QtGui, QtCore
import pyqtgraph as pg
@@ -71,19 +71,7 @@
else:
self.frame_rate = 1 / dt
self.last_frame_time = this_frame_time
- # Wait for the previous update to compete so we don't accumulate a backlog:
- if self.update_event is not None:
- while True:
- # Don't block, and check for self.stopping regularly in case we are
- # shutting down. Otherwise if shutdown is called from the main thread we
- # would deadlock.
- try:
- self.update_event.get(timeout=0.1)
- break
- except Empty:
- if self.stopping:
- return
- self.update_event = inmain_later(self.update, image, self.frame_rate)
+ inmain(self.update, image, self.frame_rate)
return [b'ok']
def update(self, image, frame_rate):
This will wait for the update() function to run in the main thread before continuing. If the issue is downstream in the pyqtgraph method calls, then it might still occur. However, this change may decrease the attainable frame rate, since it delays sending 'ok' back to the sender, so it might make the problem go away solely due to the decreased frame rate, without actually fixing any underlying problem.
Another thing that might hint at where the backlog is would be turning on our Python memory profiler, which saves lists of how many objects of each type there are in the interpreter. It can be used as follows:
import labscript_utils.memprof as memprof
# Do this once to get a baseline of number of objects of each
# type - ideally this should be called during normal usage, i.e. not at
# BLACS startup, but before memory has ballooned. Maybe have it
# called the first time ImageReceiver.handler() runs for example.
memprof.start(filepath='<some_filepath>')
# Arrange to call this when the memory has ballooned,
# for example after 1000 images if that is a reliable
# number for reproducing the bug:
memprof.check()
The result will then be a file at the given filepath showing the difference in object counts between the start() and check() calls.
There is also the BLACS memory plugin that adds menu entries to BLACS that call memprof's start and check functions, but since the GUI won't be functioning at that point, I think the plugin won't be much use!
If the objects in memprof's output file are all qtutils CallEvents and numpy arrays of image data, that would be evidence that our posting of events to the main thread might be the culprit (though I think the evidence is mostly against that explanation). If the objects are all things from pyqtgraph, then perhaps pyqtgraph is delaying updates of its own and a backlog is accumulating there.
I think when I tried to reproduce the bug I was not running the latest pyqtgraph from github, whereas I think you were. Maybe something has changed and I should have another go at reproducing the problem.
Original comment by David Meyer (Bitbucket: dihm, GitHub: dihm).
OK, scratched out some time to look at this again. Note that I've updated labscript_devices to include the most recent IMAQdxCamera updates. And while I have installed pyqtgraph from github recently, I apparently went back to the conda version 0.10 at some point so all these following tests are with the old version.
Results:
I also took a look with memprof. I did this by initializing at the end of the first call to ImageReceiver.handler(), then calling check at different iterations of the loop (1, 200, 500). I believe both the 200 and 500 checks were done while the GUI was completely hung (i.e. Windows recognizes BLACS as not responding). Even the first loop creates quite a bit of noise, but compared with the 200th loop there are a few standouts I can see. First, there are around 40 extra 'managedbuffer' and 'memoryview' class instances. Second, the number of tuples, lists and dicts seems to fluctuate pretty wildly depending on when the check happens. Next, the only meaningful count for pyqtgraph objects is 'pyqtgraph.Point.Point'. Finally, while there are plenty of Qt5 and SIP objects, they seem pretty consistent no matter which loop I'm looking at.
I also have a memprof check from using inmain instead of inmain_later, taken at 100 iterations. It is similar to the other loops, though the managedbuffer and memoryview counts are down to 17. That could be coincidental though.
I'm probably going to start looking at pyqtgraph's setImage and see if something jumps out at me. As for reproducing the problem with the Mock camera, you may need to increase the pixel count from 500x500 as well as speeding up the loop. My full-frame grabs are 1280x960 at 16 bits at the moment, and I suspect the image size may be contributing to the issue.
Here are the top memprof contributors from the first loop:
<class 'sip.methoddescriptor'> 8716
<class 'PyQt5.QtCore.QLocale.Language'> 375
<class 'PyQt5.QtCore.QLocale.Country'> 269
<class 'function'> 232
<class 'tuple'> 230
<class 'sip.variabledescriptor'> 207
<class 'dict'> 169
<class 'PyQt5.QtCore.QLocale.Script'> 144
<class 'PyQt5.QtGui.QPageSize.PageSizeId'> 127
<class 'PyQt5.QtGui.QOpenGLTexture.TextureFormat'> 126
<class 'PyQt5.QtGui.QPagedPaintDevice.PageSize'> 126
<class 'weakref'> 89
<class 'PyQt5.QtCore.QMetaType.Type'> 78
<class 'PyQt5.QtGui.QKeySequence.StandardKey'> 71
<class 'PyQt5.QtCore.QVariant.Type'> 59
<class 'getset_descriptor'> 58
<class 'PyQt5.QtCore.QEasingCurve.Type'> 48
<class 'type'> 29
<class 'cell'> 29
<class 'PyQt5.QtCore.QSysInfo.WinVersion'> 29
<class 'PyQt5.QtWidgets.QMessageBox.StandardButton'> 27
<class 'PyQt5.QtCore.QDataStream.Version'> 26
<class 'PyQt5.QtGui.QOpenGLTexture.PixelType'> 26
<class 'PyQt5.QtOpenGL.QGL.FormatOption'> 22
<class 'PyQt5.QtWidgets.QScrollerProperties.ScrollMetric'> 21
<class 'PyQt5.QtOpenGL.QGLFormat.OpenGLVersionFlag'> 21
<class 'PyQt5.QtCore.QDir.Filter'> 20
<class 'PyQt5.QtCore.QStandardPaths.StandardLocation'> 20
<class 'PyQt5.QtGui.QPaintEngine.PaintEngineFeature'> 20
<class 'PyQt5.QtGui.QPaintEngine.Type'> 20
<class 'PyQt5.QtWidgets.QTreeWidgetItemIterator.IteratorFlag'> 20
<class 'PyQt5.QtGui.QImageIOHandler.ImageOption'> 19
<class 'PyQt5.QtGui.QOpenGLTexture.PixelFormat'> 19
<class 'PyQt5.QtWidgets.QDialogButtonBox.StandardButton'> 19
<class 'PyQt5.QtWidgets.QWizard.WizardOption'> 17
<class 'PyQt5.QtGui.QOpenGLTexture.Feature'> 16
<class 'PyQt5.QtGui.QPixelFormat.YUVLayout'> 16
<class 'PyQt5.QtCore.QCryptographicHash.Algorithm'> 15
<class 'PyQt5.QtCore.QFileDevice.FileError'> 15
<class 'PyQt5.QtCore.QJsonParseError.ParseError'> 15
<class 'PyQt5.QtCore.QLibraryInfo.LibraryLocation'> 15
<class 'PyQt5.QtGui.QPaintEngine.DirtyFlag'> 14
<class 'PyQt5.QtCore.QUrl.UrlFormattingOption'> 13
<class 'PyQt5.QtCore.QDir.SortFlag'> 12
<class 'PyQt5.QtCore.QFileDevice.Permission'> 12
<class 'PyQt5.QtCore.QXmlStreamReader.TokenType'> 11
<class 'PyQt5.QtGui.QOpenGLDebugMessage.Type'> 11
<class 'PyQt5.QtGui.QOpenGLTexture.BindingTarget'> 11
<class 'PyQt5.QtGui.QOpenGLTexture.Target'> 11
<class 'PyQt5.QtGui.QTextFrameFormat.BorderStyle'> 11
<class 'PyQt5.QtWidgets.QDateTimeEdit.Section'> 11
<class 'PyQt5.QtCore.QRegularExpression.PatternOption'> 10
<class 'PyQt5.QtWidgets.QDialogButtonBox.ButtonRole'> 10
<class 'PyQt5.QtWidgets.QMessageBox.ButtonRole'> 10
<class 'PyQt5.QtWidgets.QWizard.WizardButton'> 10
<class 'PyQt5.QtCore.QSharedMemory.SharedMemoryError'> 9
<class 'PyQt5.QtGui.QOpenGLBuffer.UsagePattern'> 9
<class 'PyQt5.QtGui.QPixelFormat.ColorModel'> 9
<class 'builtin_function_or_method'> 8
<class 'pyqtgraph.Point.Point'> 8
<class 'PyQt5.QtCore.QIODevice.OpenModeFlag'> 8
<class 'PyQt5.QtCore.QUrl.ComponentFormattingOption'> 8
<class 'PyQt5.QtGui.QImageIOHandler.Transformation'> 8
<class 'PyQt5.QtGui.QOpenGLDebugMessage.Source'> 8
<class 'PyQt5.QtGui.QOpenGLTexture.ComparisonFunction'> 8
<class 'PyQt5.QtGui.QTextListFormat.Style'> 8
<class 'PyQt5.QtWidgets.QStyleOptionMenuItem.MenuItemType'> 8
<class 'list'> 7
<class 'PyQt5.QtCore.QJsonValue.Type'> 7
<class 'PyQt5.QtCore.QLocale.NumberOption'> 7
<class 'PyQt5.QtCore.QOperatingSystemVersion.OSType'> 7
<class 'PyQt5.QtCore.QSystemSemaphore.SystemSemaphoreError'> 7
<class 'PyQt5.QtCore.QUuid.Version'> 7
<class 'PyQt5.QtGui.QTabletEvent.TabletDevice'> 7
<class 'PyQt5.QtGui.QTouchDevice.CapabilityFlag'> 7
<class 'PyQt5.QtWidgets.QFileIconProvider.IconType'> 7
Here are the top contributors for the 200th loop:
<class 'sip.methoddescriptor'> 8722
<class 'PyQt5.QtCore.QLocale.Language'> 375
<class 'PyQt5.QtCore.QLocale.Country'> 269
<class 'dict'> 235
<class 'sip.variabledescriptor'> 207
<class 'PyQt5.QtCore.QLocale.Script'> 144
<class 'PyQt5.QtGui.QPageSize.PageSizeId'> 127
<class 'PyQt5.QtGui.QOpenGLTexture.TextureFormat'> 126
<class 'PyQt5.QtGui.QPagedPaintDevice.PageSize'> 126
<class 'tuple'> 109
<class 'PyQt5.QtCore.QMetaType.Type'> 78
<class 'PyQt5.QtGui.QKeySequence.StandardKey'> 71
<class 'list'> 69
<class 'weakref'> 63
<class 'pyqtgraph.Point.Point'> 60
<class 'PyQt5.QtCore.QVariant.Type'> 59
<class 'frame'> 48
<class 'PyQt5.QtCore.QEasingCurve.Type'> 48
<class 'managedbuffer'> 40
<class 'memoryview'> 40
<class 'builtin_function_or_method'> 39
<class 'numpy.core._internal._unsafe_first_element_pointer'> 38
<class 'numpy.core._internal.c_char_Array_0'> 38
<class 'numpy.core._internal.LP_c_char_Array_0'> 38
<class 'traceback'> 36
<class 'PyQt5.QtCore.QSysInfo.WinVersion'> 29
<class 'PyQt5.QtWidgets.QMessageBox.StandardButton'> 27
<class 'PyQt5.QtCore.QDataStream.Version'> 26
<class 'PyQt5.QtGui.QOpenGLTexture.PixelType'> 26
<class 'collections.deque'> 24
<class 'PyQt5.QtOpenGL.QGL.FormatOption'> 22
<class 'PyQt5.QtWidgets.QScrollerProperties.ScrollMetric'> 21
<class 'PyQt5.QtOpenGL.QGLFormat.OpenGLVersionFlag'> 21
<class 'PyQt5.QtCore.QDir.Filter'> 20
<class 'PyQt5.QtCore.QStandardPaths.StandardLocation'> 20
<class 'PyQt5.QtGui.QPaintEngine.PaintEngineFeature'> 20
<class 'PyQt5.QtGui.QPaintEngine.Type'> 20
<class 'PyQt5.QtWidgets.QTreeWidgetItemIterator.IteratorFlag'> 20
<class 'PyQt5.QtGui.QImageIOHandler.ImageOption'> 19
<class 'PyQt5.QtGui.QOpenGLTexture.PixelFormat'> 19
<class 'PyQt5.QtWidgets.QDialogButtonBox.StandardButton'> 19
<class 'threading.Condition'> 18
<class 'PyQt5.QtWidgets.QWizard.WizardOption'> 17
<class 'PyQt5.QtGui.QOpenGLTexture.Feature'> 16
<class 'PyQt5.QtGui.QPixelFormat.YUVLayout'> 16
<class 'PyQt5.QtCore.QCryptographicHash.Algorithm'> 15
<class 'PyQt5.QtCore.QFileDevice.FileError'> 15
<class 'PyQt5.QtCore.QJsonParseError.ParseError'> 15
<class 'PyQt5.QtCore.QLibraryInfo.LibraryLocation'> 15
<class 'PyQt5.QtGui.QPaintEngine.DirtyFlag'> 14
<class 'PyQt5.QtCore.QUrl.UrlFormattingOption'> 13
<class 'PyQt5.QtCore.QDir.SortFlag'> 12
<class 'PyQt5.QtCore.QFileDevice.Permission'> 12
<class 'PyQt5.QtCore.QXmlStreamReader.TokenType'> 11
<class 'PyQt5.QtGui.QOpenGLDebugMessage.Type'> 11
<class 'PyQt5.QtGui.QOpenGLTexture.BindingTarget'> 11
<class 'PyQt5.QtGui.QOpenGLTexture.Target'> 11
<class 'PyQt5.QtGui.QTextFrameFormat.BorderStyle'> 11
<class 'PyQt5.QtWidgets.QDateTimeEdit.Section'> 11
<class 'function'> 10
<class 'PyQt5.QtCore.QRegularExpression.PatternOption'> 10
<class 'PyQt5.QtWidgets.QDialogButtonBox.ButtonRole'> 10
<class 'PyQt5.QtWidgets.QMessageBox.ButtonRole'> 10
<class 'PyQt5.QtWidgets.QWizard.WizardButton'> 10
<class 'PyQt5.QtCore.QSharedMemory.SharedMemoryError'> 9
<class 'PyQt5.QtGui.QOpenGLBuffer.UsagePattern'> 9
<class 'PyQt5.QtGui.QPixelFormat.ColorModel'> 9
<class 'PyQt5.QtCore.QIODevice.OpenModeFlag'> 8
<class 'PyQt5.QtCore.QUrl.ComponentFormattingOption'> 8
<class 'PyQt5.QtGui.QImageIOHandler.Transformation'> 8
<class 'PyQt5.QtGui.QOpenGLDebugMessage.Source'> 8
<class 'PyQt5.QtGui.QOpenGLTexture.ComparisonFunction'> 8
<class 'PyQt5.QtGui.QTextListFormat.Style'> 8
<class 'PyQt5.QtWidgets.QStyleOptionMenuItem.MenuItemType'> 8
<class 'PyQt5.QtCore.QJsonValue.Type'> 7
<class 'PyQt5.QtCore.QLocale.NumberOption'> 7
<class 'PyQt5.QtCore.QOperatingSystemVersion.OSType'> 7
<class 'PyQt5.QtCore.QSystemSemaphore.SystemSemaphoreError'> 7
<class 'PyQt5.QtCore.QUuid.Version'> 7
<class 'PyQt5.QtGui.QTabletEvent.TabletDevice'> 7
<class 'PyQt5.QtGui.QTouchDevice.CapabilityFlag'> 7
<class 'PyQt5.QtWidgets.QFileIconProvider.IconType'> 7
Original comment by David Meyer (Bitbucket: dihm, GitHub: dihm).
Maybe found where the backup is happening. The pyqtgraph.ItemView ultimately uses the QGraphicsItem.update() method (https://doc.qt.io/qt-5/qgraphicsitem.html#update), whose documentation states: "This function does not cause an immediate paint; instead it schedules a paint request that is processed by QGraphicsView after control reaches the event loop. The item will only be redrawn if it is visible in any associated view."
This does seem related. The current hang behavior is that when continuous acquisition is started, images update fine but all GUI interaction hangs. Clicking in the GUI then causes the image updates to hang as well, until the GUI interaction eventually processes. Once it does, images update again while GUI interaction remains hung. Is BLACS just getting crushed by Qt paint update requests?
Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).
Increasing the image size does it for me: I am able to reproduce the problem now! On Linux and with the latest pyqtgraph even, meaning it's not an issue with your specific environment.
I'll get to the bottom of this!
Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).
Regarding the QGraphicsItem.update(), brilliant, I was suspecting something like that, that is definitely the most promising lead. Good find.
There is a bit more to it though: queued events are normally processed in order, so paint events shouldn't be able to build up given that we are not allowing our own events to build up. However, events also have priorities. So I suspect the paint events are piling up behind our events, which are being processed at higher priority and so never allow the backlog of paint events to clear.
A solution could be to post our events at a lower priority.
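The priority-starvation hypothesis can be illustrated without Qt at all. Below is a toy event loop (all names hypothetical, not qtutils or Qt code) in which each processed frame event immediately posts the next one, mimicking the camera worker continuously queueing GUI updates; a lower-priority paint event only ever runs if the frame events are posted below its priority:

```python
import heapq
import itertools

def run_loop(initial_events, budget):
    """Toy priority event loop: pops the highest-priority event each step.
    Each processed 'frame' event posts the next frame at the same priority,
    mimicking the camera worker continuously queueing GUI updates."""
    counter = itertools.count()  # tie-breaker preserves FIFO order within a priority
    queue = []

    def post(priority, name):
        # heapq is a min-heap, so negate the priority to pop highest-priority first
        heapq.heappush(queue, (-priority, next(counter), name))

    for priority, name in initial_events:
        post(priority, name)

    processed = []
    for _ in range(budget):
        if not queue:
            break
        neg_priority, _, name = heapq.heappop(queue)
        processed.append(name)
        if name == 'frame':
            post(-neg_priority, 'frame')  # the worker immediately posts the next frame
    return processed

# Frame events at normal priority (1) starve a paint event at priority 0:
print(run_loop([(1, 'frame'), (0, 'paint')], budget=5))
# → ['frame', 'frame', 'frame', 'frame', 'frame']

# Posting frame events below paint priority lets the paint event through:
print(run_loop([(-1, 'frame'), (0, 'paint')], budget=5))
# → ['paint', 'frame', 'frame', 'frame', 'frame']
```

This is only a sketch of the hypothesis, of course; whether Qt actually compares the priorities of posted CallEvents against paint events in this way is exactly what is in question.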
Original comment by Philip Starkey (Bitbucket: philipstarkey, GitHub: philipstarkey).
I was also simultaneously investigating along these lines. The relevant Qt code for repainting widgets is here, from what I can tell. And it's linked to the monitor refresh rate too, which may explain why it seemed to be fps-related. I suggest updating qtutils' inmain(_later) to accept an optional priority, and setting it low (lower than Qt.LowEventPriority) for those specific update calls.
Alternatively we could have all of our inmain(_later) calls be low priority, but I’m uncertain if that would slow down our code more than we want…
Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).
I've just locally added a priority kwarg to inmain_later, but setting a low priority is not fixing the problem for me. Will keep investigating...
Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).
I may have spoken too soon about being able to reproduce the problem. I can cause a temporary but large increase in memory use by making the image very large (10k x 10k), but this looks like just a delay in reclaiming memory until the garbage collector runs; I don't think it has anything to do with Qt. And I can still interact with the GUI. It's laggy only because painting such large images genuinely takes a long time, not because there is any kind of backlog, seemingly. Though it still seems strange that garbage collection is needed at all: I would have thought reference counting would reclaim these image array objects, since there should not be circular references requiring the garbage collector. If I run the garbage collector once per loop the memory growth goes away, but this could merely be because it modifies the timing.
David, could you post the exact Python and PyQt versions you see the problem with? I am also wondering if it has something to do with whether Qt is using OpenGL or not, based on the Qt source Phil linked to.
Original comment by David Meyer (Bitbucket: dihm, GitHub: dihm).
Darn. At this rate we might need to invite you over for a talk so you can actually see it in person.
Here is my conda environment. We are still working on getting Windows 10 upgrades, so this is all Windows 7. I don't believe any package versions have changed by more than their micro number, with the exception of zprocess.
# packages in environment at C:\Miniconda3\envs\labscript:
#
# Name Version Build Channel
autocython 1.1.4 pypi_0 pypi
blas 1.0 mkl
ca-certificates 2019.5.15 1
certifi 2019.6.16 py36_1
cycler 0.10.0 py36h009560c_0
cython 0.29.13 py36ha925a31_0
freetype 2.9.1 ha9979f8_1
h5py 2.9.0 py36h5e291fa_0
hdf5 1.10.4 h7ebc959_0
icc_rt 2019.0.0 h0cc432a_1
icu 58.2 ha66f8fd_1
importlib_metadata 0.19 py36_0
intel-openmp 2019.4 245
jpeg 9b hb83a4c4_2
kiwisolver 1.1.0 py36ha925a31_0
libpng 1.6.37 h2a8f88b_0
libsodium 1.0.16 h9d3ae62_0
libtiff 4.0.10 hb898794_2
matplotlib 3.1.1 py36hc8f65d3_0
mkl 2019.4 245
mkl-service 2.0.2 py36he774522_0
mkl_fft 1.0.14 py36h14836fe_0
mkl_random 1.0.2 py36h343c172_0
numpy 1.16.4 py36h19fb1c0_0
numpy-base 1.16.4 py36hc3f5095_0
olefile 0.46 py36_0
openssl 1.1.1c he774522_1
pandas 0.25.1 py36ha925a31_0
pillow 6.1.0 py36hdc69c19_0
pip 19.2.2 py36_0
psutil 5.6.3 py36he774522_0
pycapture2 2.13.61 pypi_0 pypi
pydaqmx 1.4.2 pypi_0 pypi
pyparsing 2.4.2 py_0
pypylon 1.4.0 pypi_0 pypi
pyqt 5.9.2 py36h6538335_2
pyqtgraph 0.10.0 py36h28b3542_3
pyreadline 2.1 py36_1
pyserial 3.4 py36_0
python 3.6.9 h5500b2f_0
python-dateutil 2.8.0 py36_0
pytz 2019.2 py_0
pyvisa 1.10.0 pypi_0 pypi
pywin32 223 py36hfa6e2cd_1
pyzmq 18.1.0 py36ha925a31_0
qt 5.9.7 vc14h73c81de_0
qtutils 2.3.1 pypi_0 pypi
scipy 1.3.1 py36h29ff71c_0
setuptools 41.0.1 py36_0
sip 4.19.8 py36h6538335_0
six 1.12.0 py36_0
spinapi 3.2.0 pypi_0 pypi
sqlite 3.29.0 he774522_0
tk 8.6.8 hfa6e2cd_0
tornado 6.0.3 py36he774522_0
tzlocal 2.0.0 py36_0
vc 14.1 h0510ff6_4
vs2015_runtime 14.16.27012 hf0eaf9b_0
wheel 0.33.4 py36_0
wincertstore 0.2 py36h7fe50ca_0
windows-curses 2.0 pypi_0 pypi
xmlrunner 1.7.7 pypi_0 pypi
xz 5.2.4 h2fa13f4_4
zeromq 4.3.1 h33f27b4_3
zipp 0.5.2 py_0
zlib 1.2.11 h62dcd97_3
zprocess 2.16.0 pypi_0 pypi
zstd 1.3.7 h508b16e_0
Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).
Ok, I'm also on a Windows 7 computer with the same versions of everything and not seeing it :/
Then again, this computer is not very fast. I'm struggling to get high frame rates even with fully pre-generated data. Perhaps this bug requires a faster computer to be triggered, maybe something about the screen refresh rate as Phil has mentioned.
So I guess we're still debugging on your setup. Here is attempt at a solution number one, based on the hypothesis that it's a backlog of paint events that we can prevent by deprioritising our own events.
If you replace _in_main_later() in your locally-installed qtutils/invoke_in_main.py with the following:
def _in_main_later(fn, exceptions_in_main, *args, **kwargs):
    """Asks the mainloop to call a function when it has time. Immediately returns the
    queue that was sent to the mainloop. A call to queue.get() will return a list of
    [result, exception] where exception=[type, value, traceback] of the exception. The
    keyword argument _priority can be used to pass the event priority to the postEvent()
    call. Functions of the same priority are guaranteed to be called in the order they
    were requested."""
    queue = Queue()
    post_kwargs = {}
    if '_priority' in kwargs:
        post_kwargs['priority'] = kwargs.pop('_priority')
    QCoreApplication.postEvent(
        caller, CallEvent(queue, exceptions_in_main, fn, *args, **kwargs), **post_kwargs
    )
    return queue
And change the inmain_later call in blacs_tabs.py to:
self.update_event = inmain_later(self.update, image, self.frame_rate, _priority=QtCore.Qt.LowEventPriority - 1)
does that solve the problem?
Attempt at a solution number two. If you append QtGui.QApplication.instance().sendPostedEvents() to the end of ImageReceiver.update(), does that solve the problem? This is also based on the hypothesis of a backlog of other events, and we're hoping sendPostedEvents will process them all immediately. This is pretty hacky but could be informative.
And, attempt at a solution number three. I'm not super convinced that changing the call to inmain() is a satisfactory solution, since all inmain() does is inmain_later() followed by a .get() on the resulting queue; it is basically the same except that the .get() is immediate instead of just before the next call.
We can't just get rid of waiting on the queue manually, since we need to be checking self.stopping regularly to avoid the possibility of a deadlock upon tab restart. But we could manually .get() from the queue immediately instead of waiting until just before the next event. Leaving it until the last second was just a way of speeding up the max frame rate slightly, by allowing the worker to send the next frame prior to the previous one being rendered.
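The wait-with-regular-stopping-checks pattern described here can be sketched with a plain queue.Queue standing in for the result queue returned by inmain_later (the function and variable names here are hypothetical, just to illustrate the pattern):

```python
import threading
from queue import Queue, Empty

def wait_for_update(update_event, stopping, timeout=0.1):
    """Block on the result queue in short intervals, checking the stopping
    flag each time, so that a shutdown initiated from the main thread
    cannot deadlock against us while we wait."""
    while True:
        try:
            return update_event.get(timeout=timeout)
        except Empty:
            if stopping.is_set():
                return None

# Simulate the main thread completing the GUI update after a short delay:
update_event = Queue()
stopping = threading.Event()
threading.Timer(0.05, update_event.put, args=[(None, None)]).start()
result = wait_for_update(update_event, stopping)
print(result)  # → (None, None)
```

If stopping is set while the queue stays empty, the next timeout expiry returns None instead of blocking forever, which is the whole point of not calling .get() with no timeout.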
But it sounds like that will make the problem go away, maybe just for timing reasons, even though I'm not convinced it addresses the root cause.
So if nothing else presents itself about solving this issue and if I continue to be unable to reproduce it, maybe the following patch is the pragmatic way forward. It's dissatisfying though, and will probably limit the achievable frame rates of other cameras:
diff -r 57c4e81e56704e2cf7eb418b5bf80d48a7d8cc82 IMAQdxCamera/blacs_tabs.py
--- a/IMAQdxCamera/blacs_tabs.py Mon Aug 26 16:13:28 2019 -0400
+++ b/IMAQdxCamera/blacs_tabs.py Fri Sep 20 16:30:57 2019 -0400
@@ -56,7 +56,6 @@
         self.label_fps = label_fps
         self.last_frame_time = None
         self.frame_rate = None
-        self.update_event = None
 
     def handler(self, data):
         md = json.loads(data[0])
@@ -71,19 +70,18 @@
         else:
             self.frame_rate = 1 / dt
         self.last_frame_time = this_frame_time
-        # Wait for the previous update to compete so we don't accumulate a backlog:
-        if self.update_event is not None:
-            while True:
-                # Don't block, and check for self.stopping regularly in case we are
-                # shutting down. Otherwise if shutdown is called from the main thread we
-                # would deadlock.
-                try:
-                    self.update_event.get(timeout=0.1)
-                    break
-                except Empty:
-                    if self.stopping:
-                        return
-        self.update_event = inmain_later(self.update, image, self.frame_rate)
+        update_event = inmain_later(self.update, image, self.frame_rate)
+        # Wait for the update to compete so we don't accumulate a backlog:
+        while True:
+            # Don't block, and check for self.stopping regularly in case we are
+            # shutting down. Otherwise if shutdown is called from the main thread we
+            # would deadlock.
+            try:
+                update_event.get(timeout=0.1)
+                break
+            except Empty:
+                if self.stopping:
+                    return
         return [b'ok']
 
     def update(self, image, frame_rate):
Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).
Also, what do you get for the following?
import pyqtgraph
print(pyqtgraph.CONFIG_OPTIONS)
Mostly interested in whether openGL is in use. It seems like for me it is not, but enabling it doesn't cause the bug to occur, either.
Original comment by David Meyer (Bitbucket: dihm, GitHub: dihm).
Results from testing:
Solution number 2) Started here because it was the easiest to implement. This one works. I actually get a slightly higher framerate (28 fps) than using inmain in place of inmain_later.
Solution number 1) This one helps, but the hangs are still present. They are significantly shorter though. Max framerate was 30 fps.
Solution number 3) This also works. Max framerate is 28 fps.
pyqtgraph.CONFIG_OPTIONS = {'useOpenGL':False,'leftButtonPan':True,'foreground':'d','background':'k','antialias':False,'editorCommand':None,'useWeave':False,'weaveDebug':False,'exitCleanup':True,'enableExperimental':False,'crashWarning':False,'imageAxisOrder':'col-major'}
Hopefully it doesn't come to something hacky just for our sake, since it appears nobody else is having this issue. I'm willing to try a few more things to get to the bottom of this if possible.
Original comment by Philip Starkey (Bitbucket: philipstarkey, GitHub: philipstarkey).
@chrisjbillington FYI @dihm updated this issue and we didn’t get email notifications for it! Writing this message so you see the updates!
Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).
Thanks Phil, and thanks David.
So whilst I don't see as obvious a hang, it is clear that user interaction is much laggier for me in some circumstances than others, and it seems to follow the same conditions as you say cause your hang. It goes away for me when I apply the changes that you say solve the issue for you, so I am going to assume they have the same root cause.
Event priorities are not solving the issue for me, and as you said, they only seemed to help a little for you. It seems they are not sufficient. Perhaps the priorities of our events and other events are not even being compared, maybe because we have a custom event handler; maybe priorities are only meaningful within a single event handler and not app-wide. I don't know, but I couldn't seem to change much by using event priorities.
So I'm proposing PR #83 as a fix. This is essentially the sendPostedEvents() fix, though I've gone ahead and merged the handler() and update() methods as a simplification: they were only separate in order to send the reply to the worker ASAP, but with a tiny addition to zprocess (update required), the handler() method can now send a reply immediately and then continue its processing, so there is no need for a separate update() call to be queued up.
This removes the GUI lagginess for me, and it sounds like it will for you too based on your testing. It still allows the worker to be fetching a new frame whilst the parent process is rendering the previous frame, so we still in principle should be able to render frames at the maximum rate of whichever is slowest: rendering or downloading.
It is usually a code smell to be manually calling sendPostedEvents or similar, but since we are in the main thread when we call it, and since most events are dispatched immediately in Qt anyway (leading to re-entrancy even in single-threaded programs), all it's doing is not allowing these other events to be an exception to that. And we're calling it at the very end of our function, where control is about to return to the mainloop, which would do the same thing anyway. So I think this is an OK solution.
One possibility is that the processed events may themselves post further events, such that we would need to call sendPostedEvents() multiple times to clear the whole backlog. But a) there doesn't seem to be any way to check whether there are pending events, or to say "run until there are no more", and b) we're still calling it once per frame, so nothing should build up for more than one frame. I think we're good.
David if you could confirm this fixes the problem, maybe we can put this one to rest :)
Regarding imageAxisOrder='row-major', I don't think there is any significant cost to transposing the arrays, since a transpose is just a metadata change; it doesn't actually move the data in memory. So unless there is a "real" transpose operation happening at a lower level that imageAxisOrder='row-major' somehow avoids (which seems doubtful), I don't think this will make any difference. Profiling results would change my mind though :).
Original comment by David Meyer (Bitbucket: dihm, GitHub: dihm).
PR #83 does it for me as well, and it does seem like the best solution. I’m more than willing to put this annoying issue to rest.
As for imageAxisOrder, you are likely right, but I'll check with profiling very quickly just to set my mind at ease.
Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).
Fix issue #36: BLACS freezing in IMAQdxCamera continuous acquisition.
Merged ImageReceiver.update() and ImageReceiver.handler() and have the whole method run in the Qt main thread. Add zprocess version requirement for this functionality.
Call QApplication.sendPostedEvents() to prevent a backlog of paint events and other events that can otherwise build up at high frame rates and have been observed to freeze the GUI.
→ <<cset 6ea649736cd548c21a5a4075491926ae77c21aa6>>
Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).
Merged in camera-hang-fix (pull request #83)
Fix issue #36: BLACS freezing in IMAQdxCamera continuous acquisition.
Approved-by: David Meyer
→ <<cset 0f2f62051db471d8c90a2a13b22abc51a5939991>>
Original report (archived issue) by David Meyer (Bitbucket: dihm, GitHub: dihm).
I again ran into an issue I first saw while working on PR #69, where fast continuous acquisitions with the PylonCamera cause a complete lock-up of the BLACS main process. With further testing, I can now reproduce the issue using the mock camera by applying the following diff to the IMAQdxCamera blacs_workers.py
Basically, you need to get the fps way up to see the issue, which occurs around iteration #330 on my machine. The PylonCamera shows the same behavior, but at lower fps (25 fps) and sooner (~130 iterations). It does not appear to occur if the randomizer is turned off for the mock camera images (i.e. returning clean_image only). Best guess at the moment is that the histogram updates of pyqtgraph are causing issues. Is it possible that those updates are non-blocking and can stack up somehow? That doesn't seem likely, but I'm not sure what else it could be at this point.