seantronsen / pvt

GNU General Public License v3.0

Feature: Parallel rendering for multiple display panes #14

Open seantrons opened 7 months ago

seantrons commented 7 months ago

Performance in the current version is limited by the serial nature of the implementation. To illustrate the issue, consider 10 identical animated image panes where each requires 20 ms to compute the next frame. Computed in parallel, all ten frames should take around 20 ms. However, the current definition of StatefulWidget updates a list of states serially whenever a control widget is modified, so a full update takes 200 ms or more.
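A back-of-the-envelope model of the numbers above, assuming the frame computations are independent and ignoring all dispatch overhead. Both function names and the `workers` parameter are made up for illustration and are not part of the package:

```python
import math


def serial_update_ms(panes: int, frame_ms: float) -> float:
    """Time to update every pane one after another."""
    return panes * frame_ms


def parallel_update_ms(panes: int, frame_ms: float, workers: int) -> float:
    """Idealized time when panes are spread over `workers` processes.

    Ignores process start-up, pickling, and scheduling overhead, so real
    numbers will be worse than this.
    """
    rounds = math.ceil(panes / workers)  # batches of panes computed at once
    return rounds * frame_ms


print(serial_update_ms(10, 20))        # 200
print(parallel_update_ms(10, 20, 10))  # 20
print(parallel_update_ms(10, 20, 4))   # 60
```

With fewer workers than panes the ideal time grows in steps of one frame time, which is worth keeping in mind when sizing any worker pool.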

Using a list for this sort of thing is fine, but it would be best if the updates were done in parallel. Perhaps we can implement a sort of dispatcher which sends commands to child processes to compute frames in parallel. It may also be worth it to involve a queue which can be configured to:

seantrons commented 7 months ago

After reviewing material on the subject as it relates to both Python and PyQt, this is absolutely doable, but it will require several important design decisions. For instance, would it be better to write our own suite of MP tools in-house using Python's multiprocessing (since multithreading is constrained by the GIL)? Or would it be better to use Qt's built-ins, where we could define custom workers and utilize the existing thread pool?

Regarding the latter option, I have yet to determine whether this would cause incompatibilities between Qt versions. That would be an issue, as I'd like this package to become more agnostic of the Qt version used, since PyQtGraph's implementation allows for this.

seantrons commented 7 months ago

Subset of pages reviewed thus far:

seantronsen commented 7 months ago

Also need to find some way to benchmark the prior vs. post implementation to compare performance. Probably will just make a Frankenstein's monster out of the hypothetical GUI given in the first comment and compare performance timing data.

Could be a good idea to implement #4 first and also add some sort of performance logging mechanism to both the animation class and the viewer class. Filtering can be used at that point to sort out the details as it would generate a ton of logging information.
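A minimal sketch of what such a performance logging mechanism could look like, using only the standard library. The logger name `pvt.perf`, the `timed` helper, and the `time.sleep` stand-in are all assumptions for illustration, not existing package code:

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("pvt.perf")  # hypothetical logger name


@contextmanager
def timed(label: str, samples: list):
    """Record the wall-clock duration of the wrapped block in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        samples.append(elapsed_ms)
        # DEBUG level keeps the output quiet unless explicitly enabled.
        logger.debug("%s took %.3f ms", label, elapsed_ms)


samples = []
for _ in range(5):
    with timed("update_all_panes", samples):
        time.sleep(0.002)  # stand-in for the real frame computation

print(f"{len(samples)} samples, mean {sum(samples) / len(samples):.3f} ms")
```

Because the messages go through a named logger at DEBUG level, the volume can be filtered by logger name or level without touching the timing code. The collected `samples` list makes a before/after comparison straightforward.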

seantrons commented 7 months ago

This issue remains a high priority, but I'm taking my time to ensure that a good implementation is created. So far I've been browsing through some of the Python language literature on the subject to see what features and drawbacks come out of the box.

My biggest pet peeve so far is that there doesn't seem to be a way to get around requiring calling code to have an `if __name__ == "__main__":` block. Thus far I haven't found a good way to circumvent it in a pure Python sense. Not sure if there is a way, but I'll keep looking on occasion.

Future readers: Before you go off and say there's a reason for that, I understand very well this construct effectively functions as a guard against a fork bomb on some systems. Still, other languages that I use like Rust do not require this. It's just another requirement that can be easy to forget.
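For reference, a minimal sketch of the guard in question with the standard `multiprocessing.Pool`. The `compute_frame` function is a hypothetical stand-in for a per-pane frame computation:

```python
import multiprocessing as mp


def compute_frame(pane_index: int) -> int:
    """Hypothetical stand-in for an expensive per-pane frame computation."""
    return sum(i * i for i in range(1000)) + pane_index


def main() -> None:
    # Worker processes may re-import this module (always the case under the
    # "spawn" start method). Without the guard below, that import would reach
    # this Pool creation again inside every child.
    with mp.Pool(processes=4) as pool:
        frames = pool.map(compute_frame, range(10))
    print(len(frames))


if __name__ == "__main__":
    main()
```

Under the "spawn" start method, recent CPython versions raise a `RuntimeError` when the guard is missing rather than spawning processes endlessly, but the requirement on the calling code is the same either way.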

In addition to the regular old multiprocessing library for Python level parallelism, I'm also looking into the following parallel libraries:

My goal is to come to a decision on this within the week and start on the implementation. I'm thinking it may also be a good idea to look into coming up with a solution to #2 to make it easier to determine the possible speedups and, more importantly, the potential speed losses which may occur from overhead. I'm also considering a shared memory implementation as this seems to be made available by some of the libraries, but there are some caveats to this as well:
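To make the shared memory idea concrete, here is a small sketch using the standard library's `multiprocessing.shared_memory` (Python 3.8+). It assumes that module rather than any of the third-party libraries, and both handles live in one process purely to keep the example short. `FRAME_BYTES` is a made-up size:

```python
from multiprocessing import shared_memory

FRAME_BYTES = 16  # hypothetical size of a tiny "frame"

# Producer side: allocate a named block and write pixel data into it.
producer = shared_memory.SharedMemory(create=True, size=FRAME_BYTES)
producer.buf[:FRAME_BYTES] = bytes(range(FRAME_BYTES))

# Consumer side: attach to the same block by name (normally done in
# another process) and read the data without sending it through a pipe.
consumer = shared_memory.SharedMemory(name=producer.name)
frame = bytes(consumer.buf[:FRAME_BYTES])

# Every handle must be closed, and exactly one owner must unlink the block.
consumer.close()
producer.close()
producer.unlink()

print(frame == bytes(range(FRAME_BYTES)))  # True
```

Two caveats are visible even here: cleanup is manual (`close` on every handle, `unlink` once), and the block carries no synchronization of its own, so readers and writers need a separate lock or handshake.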

seantrons commented 6 months ago

Some more information on the topic:

My plan is to start basic and solely employ the Python multiprocessing library. If that goes well and performance scales acceptably, then I'll leave it in until a decision is made in the future. If not, I'm also planning to try out some of the features from Dask and joblib, seeing where it goes and so on. Regardless, I haven't seen much of anything that doesn't require the user to employ the `__main__` name guard. Still fighting against that I guess. It might be possible using Python's or Qt's version of threads if communication is established with a tailored sub-process.
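One possible reading of the "tailored sub-process" idea, sketched with the standard `subprocess` module: the worker runs its own small script, so the calling code is never re-imported and needs no name guard. The JSON-lines protocol, `WORKER_SOURCE`, and the helper names are all assumptions for illustration:

```python
import json
import subprocess
import sys

# Source of a hypothetical worker: reads one JSON request per line on stdin
# and answers with one JSON line on stdout.
WORKER_SOURCE = r"""
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    result = {"pane": request["pane"], "value": request["pane"] ** 2}
    sys.stdout.write(json.dumps(result) + "\n")
    sys.stdout.flush()
"""


def start_worker() -> subprocess.Popen:
    """Launch the worker as a separate interpreter running WORKER_SOURCE."""
    return subprocess.Popen(
        [sys.executable, "-c", WORKER_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )


def request_frame(worker: subprocess.Popen, pane: int) -> dict:
    """Send one request and block until the worker replies."""
    worker.stdin.write(json.dumps({"pane": pane}) + "\n")
    worker.stdin.flush()
    return json.loads(worker.stdout.readline())


worker = start_worker()
results = [request_frame(worker, pane) for pane in range(3)]
worker.stdin.close()   # EOF ends the worker's read loop
worker.wait()
worker.stdout.close()
print(results)
```

The blocking `readline` is where a Python or Qt thread would come in: a background thread could own the pipe and hand results back to the GUI thread. Text pipes and JSON are only practical for small messages, so real frame data would need a different transport.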

seantrons commented 6 months ago

This may also be worth looking into. The documentation isn't great, but there still is source code to browse through that's fairly easy to read and a demo in the examples runner.

https://pyqtgraph.readthedocs.io/en/latest/api_reference/widgets/remotegraphicsview.html

seantronsen commented 6 months ago

The more I read into this subject, the more I realize that we should be taking greater advantage of Qt's signals and slots design specification.

An article for future reference. https://doc.qt.io/qtforpython-6/PySide6/QtCore/Signal.html

seantronsen commented 6 months ago

The docstring comment on this page discusses some Qt behavior with parallelism. Need to find a good reference besides source code that discusses these things. Allegedly, Qt will crash if the GC runs in multiple threads instead of just the GUI thread.

https://github.com/pyqtgraph/pyqtgraph/blob/master/pyqtgraph/util/garbage_collector.py

Update:

I've found some references on this topic. The Qt docs I've been reading seem to be somewhat all over the place. Not to mention, there are subtle differences between PyQt6 and PySide6. So far, I've found browsing the source code and reading the Riverbank Computing version of the documentation to be most helpful.

Another project that may be of interest is https://www.maturin.rs/ which would allow for offloading of some features to a Rust code layer. Of particular interest would be to accelerate some of the drawing functions, but process management could fit here as well. Just something else to think about.