Feature: Parallel rendering for multiple display panes

seantronsen commented 7 months ago

The performance of the current version is limited by the serial nature of the implementation. To illustrate the issue, consider 10 identical animated image panes, each requiring 20 ms to compute its next frame. Although computing all of them in parallel should take around 20 ms, the current definition of StatefulWidget updates a list of states serially when a control widget is modified, meaning it will take over 200 ms.
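As a back-of-the-envelope check of those numbers, here is an idealized sketch that ignores dispatch and inter-process overhead. The helper names `serial_ms` and `parallel_ms` are made up for illustration and are not part of the package.

```python
import math


def serial_ms(n_panes: int, frame_ms: float) -> float:
    """Time to update every pane one after another."""
    return n_panes * frame_ms


def parallel_ms(n_panes: int, frame_ms: float, n_workers: int) -> float:
    """Idealized time when panes are split evenly across workers (no overhead)."""
    return math.ceil(n_panes / n_workers) * frame_ms


print(serial_ms(10, 20))        # 200
print(parallel_ms(10, 20, 10))  # 20
print(parallel_ms(10, 20, 4))   # 60
```

Even with fewer workers than panes, the idealized cost drops well below the serial case. Real numbers will be worse once communication overhead is included.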

Using a list for this sort of thing is fine, but it would be best if the updates were done in parallel. Perhaps we can implement a sort of dispatcher which sends commands to child processes to compute frames in parallel. It may also be worth it to involve a queue which can be configured to:

- process all frame requests
- on completion of current processing, process only the most recent frame request and discard all others (somewhat inspired by the UDP protocol).
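The two queue policies above could be sketched roughly as follows. This is a single-process illustration only: `FrameRequestQueue` is a hypothetical name, and a real implementation would need a queue that works across processes.

```python
from collections import deque
from typing import Any, Optional


class FrameRequestQueue:
    """Hypothetical request queue with the two policies described above."""

    def __init__(self, latest_only: bool = False) -> None:
        # maxlen=1 makes the deque drop the oldest entry on every append,
        # so only the most recent request survives.
        self._items: deque = deque(maxlen=1 if latest_only else None)

    def put(self, request: Any) -> None:
        self._items.append(request)

    def get(self) -> Optional[Any]:
        """Return the next pending request, or None if nothing is pending."""
        return self._items.popleft() if self._items else None


# Three requests arrive while a worker is still busy with an earlier frame.
process_all = FrameRequestQueue(latest_only=False)
latest_only = FrameRequestQueue(latest_only=True)
for frame_index in (1, 2, 3):
    process_all.put(frame_index)
    latest_only.put(frame_index)

print(process_all.get(), process_all.get(), process_all.get())  # 1 2 3
print(latest_only.get(), latest_only.get())                     # 3 None
```

The "latest only" policy trades completeness for responsiveness: stale requests are dropped rather than rendered late.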

seantronsen commented 7 months ago

After reviewing material on the subject as it relates to both Python and PyQt, this is absolutely doable, but it will require several important design decisions. For instance, would it be better to write our own suite of multiprocessing tools in house using Python's multiprocessing library (since multithreading is constrained by the GIL)? Or would it be better to use Qt's built-ins, where we could define custom workers and utilize the existing thread pool?

Regarding the latter option, I have yet to determine whether this would cause incompatibilities between Qt versions. That would be an issue, as I'd like this package to become more agnostic of the Qt version used, since PyQtGraph's implementation allows for this.

seantronsen commented 7 months ago

Subset of pages reviewed thus far:

seantronsen commented 7 months ago

Also need to find some way to benchmark the implementation before and after the change to compare performance. Probably will just make a Frankenstein's monster out of the hypothetical GUI given in the first comment and compare timing data.

Could be a good idea to implement #4 first and also add some sort of performance logging mechanism to both the animation class and the viewer class. Filtering can be used at that point to sort out the details as it would generate a ton of logging information.
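One possible shape for such a performance logging mechanism, using only the standard library, is sketched below. The logger name `pvt.perf`, the `timed` helper, and `render_all_panes` are all placeholders rather than existing parts of the package, and `time.sleep` stands in for real rendering work.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("pvt.perf")  # hypothetical logger name


@contextmanager
def timed(label: str, samples: list):
    """Record the wall-clock duration of the enclosed block in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        samples.append(elapsed_ms)
        logger.debug("%s took %.3f ms", label, elapsed_ms)


def render_all_panes(n_panes: int) -> None:
    """Stand-in for the viewer's update loop; sleeps instead of rendering."""
    for _ in range(n_panes):
        time.sleep(0.001)


samples = []
for _ in range(3):
    with timed("viewer.update", samples):
        render_all_panes(5)

print(f"mean update time: {sum(samples) / len(samples):.2f} ms")
```

Because the messages go through a named logger at DEBUG level, the volume can be filtered by logger name or level without touching the timing code.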

seantronsen commented 7 months ago

This issue remains a high priority, but I'm taking my time to ensure that a good implementation is created. So far I've been browsing through some of the Python language literature on the subject to see what features and drawbacks come out of the box.

My biggest pet peeve so far is that there doesn't seem to be a way around requiring calling code to have an `if __name__ == "__main__":` block. Thus far I haven't found a good way to circumvent it in pure Python. Not sure if there is a way, but I'll keep looking on occasion.

Future readers: Before you go off and say there's a reason for that, I understand very well this construct effectively functions as a guard against a fork bomb on some systems. Still, other languages that I use like Rust do not require this. It's just another requirement that can be easy to forget.
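For readers unfamiliar with the construct, here is a minimal sketch of the guard using the standard library's `multiprocessing.Pool`. The names `compute_frame` and `main` are placeholders, not part of this package.

```python
import multiprocessing as mp


def compute_frame(pane_index: int) -> int:
    """Stand-in for an expensive per-pane frame computation."""
    return pane_index * pane_index


def main() -> None:
    # Each worker process handles a share of the panes.
    with mp.Pool(processes=2) as pool:
        frames = pool.map(compute_frame, range(4))
    print(frames)  # [0, 1, 4, 9]


# With the "spawn" start method (the default on Windows and macOS), every
# child re-imports this module. Without the guard, each child would reach
# the pool creation again while bootstrapping, and Python raises a
# RuntimeError instead of spawning processes recursively.
if __name__ == "__main__":
    main()
```

Wrapping the entry point in `main()` keeps the guarded block to a single line, which makes the requirement a little harder to forget.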

In addition to the regular old multiprocessing library for Python level parallelism, I'm also looking into the following parallel libraries:

My goal is to come to a decision on this within the week and start on the implementation. I'm thinking it may also be a good idea to come up with a solution to #2 first, to make it easier to determine the possible speedups and, more importantly, the potential slowdowns which may occur from overhead. I'm also considering a shared memory implementation, as this seems to be made available by some of the libraries, but there are some caveats to this as well:

- standard data corruption and shared memory errors
- creating an implementation for dynamically sized buffers. The criterion for growing is easy, but what would be the criterion for downscaling? (E.g. large image => small images => ... => small images => ... => large image: at what point in the process do we size down and accept the future overhead of allocating more memory?)
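One candidate answer to the downscaling question is hysteresis: grow immediately, but shrink only after a sustained run of small frames. The sketch below is a guess at a workable heuristic, not a tested design. `BufferSizer`, `patience`, and `shrink_ratio` are hypothetical names and values, and the class only tracks capacity rather than allocating shared memory.

```python
class BufferSizer:
    """Hypothetical capacity policy for a reusable frame buffer.

    Grow immediately when a frame does not fit. Shrink only after
    `patience` consecutive frames used less than `shrink_ratio` of the
    current capacity, so a brief run of small images does not trigger a
    reallocation right before a large one returns.
    """

    def __init__(self, patience: int = 30, shrink_ratio: float = 0.25) -> None:
        self.capacity = 0
        self.patience = patience
        self.shrink_ratio = shrink_ratio
        self._small_streak = 0
        self._streak_peak = 0  # largest frame seen during the current streak

    def update(self, frame_bytes: int) -> int:
        """Return the capacity to use for a frame of `frame_bytes` bytes."""
        if frame_bytes > self.capacity:
            self.capacity = frame_bytes  # grow: the easy case
            self._small_streak = 0
            self._streak_peak = 0
        elif frame_bytes < self.capacity * self.shrink_ratio:
            self._small_streak += 1
            self._streak_peak = max(self._streak_peak, frame_bytes)
            if self._small_streak >= self.patience:
                self.capacity = self._streak_peak  # shrink to what was needed
                self._small_streak = 0
                self._streak_peak = 0
        else:
            self._small_streak = 0  # a mid-sized frame resets the streak
            self._streak_peak = 0
        return self.capacity


sizer = BufferSizer(patience=3)
sizes = [1000, 100, 100, 1000, 100, 100, 100, 1000]
print([sizer.update(s) for s in sizes])
# [1000, 1000, 1000, 1000, 1000, 1000, 100, 1000]
```

In the example, two small frames are not enough to trigger a shrink, three are, and the final large frame shows the reallocation cost the question is worried about. Tuning `patience` trades memory held against reallocation frequency.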

seantronsen commented 6 months ago

Some more information on the topic:

My plan is to start basic and solely employ the Python multiprocessing library. If that goes well and performance scales acceptably, then I'll leave it in until a decision is made in the future. If not, I'm also planning to try out some of the features from Dask and joblib and see where that goes. Regardless, I haven't seen much of anything that doesn't require the user to employ the `__main__` name guard. Still fighting against that, I guess. It might be possible using Python's or Qt's version of threads if communication is established with a tailored sub-process.

seantronsen commented 6 months ago

This may also be worth looking into. The documentation isn't great, but there still is source code to browse through that's fairly easy to read and a demo in the examples runner.

https://pyqtgraph.readthedocs.io/en/latest/api_reference/widgets/remotegraphicsview.html

seantronsen commented 6 months ago

The more I read into this subject, the more I realize that we should be taking greater advantage of Qt's signals and slots design specification.

An article for future reference. https://doc.qt.io/qtforpython-6/PySide6/QtCore/Signal.html

seantronsen commented 6 months ago

The docstring comment on this page discusses some Qt behavior with parallelism. Need to find a good reference besides source code that discusses these things. Allegedly, Qt will crash if the GC runs in multiple threads instead of just the GUI thread.

https://github.com/pyqtgraph/pyqtgraph/blob/master/pyqtgraph/util/garbage_collector.py

Update:

I've found some references on this topic. The Qt docs I've been reading seem to be somewhat all over the place. Not to mention, there are subtle differences between PyQt6 and PySide6. So far, I've found browsing the source code and reading the Riverbank Computing version of the documentation to be most helpful.

- source code: and no, there are no typos or mistakes in that link. The repository is named qt5.git despite the most current version being 6.7.*.
- Riverbank docs: Riverbank Computing appears to be the publisher of PyQt6. I found them by following the links from the PyPI page for the PyQt package. These docs mention some of the quirks with parallelism and the GC.

Another project that may be of interest is https://www.maturin.rs/ which would allow for offloading of some features to a Rust code layer. Of particular interest would be to accelerate some of the drawing functions, but process management could fit here as well. Just something else to think about.

seantronsen / pvt

Feature: Parallel rendering for multiple display panes #14