labscript-suite-oldfinal1 / labscript_utils

Shared modules used by the labscript suite. Includes a graphical exception handler, debug tools, configuration management, cross platform filepath conversions, unit conversions and custom GUI widgets.
Other
0 stars 0 forks source link

ToolPalette._layout_widgets() appears to recurse, causing a stack overflow #27

Open philipstarkey opened 5 years ago

philipstarkey commented 5 years ago

Original report (archived issue) by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


A lab here at NIST got a "Python has stopped working" error upon starting BLACS today. Starting Python as python -X faulthander -m blacs revealed it to be a stack overflow in ToolPalette._layout_widgets(), line 348:

self.setMinimumSize(QSize(self.minimumSize().width(), total_height))

Adding a printline to inspect the dimensions of the QSize() object revealed that _layout_widgets() was being called a large number of times prior to the crash, with the dimensions alternating back and forth between two values.

I added a counter to print the recursion depth and confirm that the method is recursing, but I made a syntax error and BLACS started successfully (having caught the error), modifying its save file and widget geometries such that the stack overflow no longer occurred. So unfortunately I lost the ability to reproduce the problem, as it is sensitive to the widget geometries.

Just documenting what I found here. ToolPalette._layout_widgets() seems to be recursing and not converging on a fixed layout geometry that would break the cycle. If I see it again I will backup BLACS save file and connection table file to create a reliable reproducer of the problem.

Others have reported similar crashes before, but they have not been stack overflows, they have been segfaults that looked to be bugs in Qt. This one actually is plausibly our fault which means we may be able to fix it.

philipstarkey commented 5 years ago

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


philipstarkey commented 5 years ago

Original comment by Philip Starkey (Bitbucket: philipstarkey, GitHub: philipstarkey).


This is a bit of a shot in the dark, but I suspect that there is a bug when calculating how many items fit in a row. This is likely due to incorrectly accounting for padding/borders/similar. I think such a error could lead to the behaviour you saw (but it's a a while since I touched that code)

Plausibly this issue could cause segfaults on some versions of Qt as Qt often doesn’t like doing things really fast (some of which may be fixed in later versions of Qt)

philipstarkey commented 5 years ago

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


I suspect you're right, that the calculation could be a little bit off. But it would be good if the code wasn't sensitive to relatively minor errors or to changing padding values, so fixing the calculation itself might not be the best solution, unless it's totally bomb-proof and looks up all the padding values at run-time with no hard-coded values.

There are also things that can happen where a widget says it is one size, and on that basis a container widget decides it should show a scrollbar or something, then the child widget is updated with the new size of the container (minus the scrollbar width), it declares itself to be smaller, then the parent widget decides it doesn't need a scrollbar after all, repeat. In this case something needs to break the cycle. We might be able to add a "with signals disconnected, update size" or similar to just stop recursion and keep the most recently calculated size. If that size if "good enough" then we'll be happy, even if there is some hysteresis in the number of buttons per row whilst resizing tabs.