Closed ksofiyuk closed 1 year ago
I found a similar issue that someone had on Stackoverflow 6 years ago: PyQt5 OpenGL swapBuffers very slow
Cool find! I must say I have no idea what autoswap and swapbuffers does, and therefore what the consequences are of turning it off. The description in the vispy canvas doctring doesn't help much... @perlman @brisvag @alisterburt... Any insights here?
@jni I've tested it more and it looks like autoswap
slows down everything in napari. When you drag the canvas using the pan the FPS rises from 15 to 60 as well after setting autoswap
to False
.
And it looks redundant because I haven't seen any rendering problems with autoswap=False
.
Sure, I'm definitely on board with it not being important, but I'd like to know why it defaults to True if that's the case.
What machine do you use? I rarely see 15fps other than in labels painting... So I wonder if there is some OS-specific stuff going on here...
What machine do you use?
I'm using the machine with the following specs:
Processor: AMD Ryzen 5 4600H with Radeon Graphics, 3000 Mhz, 6 Core(s), 12 Logical Processor(s)
Installed RAM: 16.0 GB (15.4 GB usable)
It is actually not a slow machine at all. That's why I was so confused by such low FPS in napari.
I rarely see 15fps other than in labels painting...
Did you measure it? I didn't also think that actually dragging is so slow until I just measured it and saw 15 FPS. For me, not very high FPS in dragging is simply less noticeable and disturbing.
@jni I have another much more high-end hardware with GeForce RTX 3080, and while testing with it I haven't noticed at least clearly visible low FPS (but I didn't measure the FPS on it and how disabling autoswap
exactly affects the FPS). But I would be very surprised if it was laggy on such a machine.
doing the viewer_fps_label.py
example with a 5000x5000 np.random.random, I see 30-90fps on my Macbook Pro... It feels very smooth...
I wonder if it has something to do with GPU memory transfer rather than using shared memory like in the Macbooks...?
Ultimately, I'm not against removing it, but I'd love to understand what it does before we do it. Otherwise it feels like removing a nut from a car without understanding what it's holding... š
I wonder if it has something to do with GPU memory transfer rather than using shared memory like in the Macbooks...?
It doesn't look to be a valid hypothesis as my laptop with Ryzen 5 4600H has integrated graphics, which shares the same RAM with the CPU.
ah, ok, I thought "radeon graphics" meant discrete GPU. Well... š¤·!
Very cool find @ksofiyuk and thanks for the ping @jni
From some Googling it seems "double buffering" is used to prevent flickering and improve the smoothness
An OpenGL canvas maintains two buffers for drawing:
The front buffer is the buffer that is currently visible on the screen. It contains the latest completed frame. The back buffer is the buffer that is not visible on the screen. It is where the application draws the next frame while the front buffer is being displayed.
"swap buffers" swaps the roles of the front and back buffers. The back buffer, which contains the newly drawn frame, becomes the front buffer and is displayed on the screen, while the previous front buffer becomes the back buffer and is now used for drawing the next frame.
It's possible that turning this off will lead to some 'tearing' where partially rendered frames are visible sometimes - I don't have any experience with this so I vote we run the experiment, get a few people on different systems to test and see what happens!
There are some hardware calls which will be pretty dependent on the windowing system/openGL installation... "Swapping buffers is typically performed by calling a function such as SwapBuffers (in Windows) or glXSwapBuffers (in Linux), depending on the platform and the windowing system being used."
I suspect this will be some particular hardware/software interaction causing a slowdown - I also see >60fps most of the time but still think we should push ahead and experiment with turning autoswap off if the potential gains are so significant
doing the
viewer_fps_label.py
example with a 5000x5000 np.random.random, I see 30-90fps on my Macbook Pro... It feels very smooth...
@jni Btw, examples/viewer_fps_label.py
doesn't seem to work for me using the current main. The callback update_fps
is never called...
@ksofiyuk just tested on my machine and working fine, platform specific bug potentially?
napari: 0.5.0a2.dev129+ge8d92f15
Platform: macOS-13.1-arm64-arm-64bit
System: MacOS 13.1
Python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:41:22) [Clang 13.0.1 ]
Qt: 5.15.4
PyQt5: 5.15.7
NumPy: 1.23.2
SciPy: 1.9.0
Dask: 2022.8.1
VisPy: 0.12.1
magicgui: 0.5.1
superqt: unknown
in-n-out: 0.1.5
app-model: 0.1.1
npe2: 0.6.1
OpenGL:
- GL version: 2.1 Metal - 83
- MAX_TEXTURE_SIZE: 16384
Screens:
- screen 1: resolution 1512x982, scale 2.0
Settings path:
- /Users/alisterburt/Library/Application Support/napari/napari_fbb2073d056ea8768c19c829de069e5465be0c67/settings.yaml
Plugins:
- napari: 0.5.0a2.dev129+ge8d92f15 (77 contributions)
- napari-console: 0.0.6 (0 contributions)
- napari-mrcfile-reader: 0.1.3 (2 contributions)
- napari-svg: 0.1.6 (2 contributions)
just tested on my machine and working fine, platform specific bug potentially?
@alisterburt Is your replay related to examples/viewer_fps_label.py
?
@ksofiyuk yep - just asserting that the callback is indeed called and the example runs as expected on my machine on main
@ksofiyuk yep - just asserting that the callback is indeed called and the example runs as expected on my machine on main
Ok, then it looks like it is indeed a platform specific bug.
@jni @alisterburt Sorry, for false reporting, it was my fault, I just accidentally broke my local vispy
while debugging the subject of this topic. Now, it's working for me.
P.S. It is only regarding examples/viewer_fps_label.py
@ksofiyuk Probably not directly related to this, but I note you have PyQt5: 5.12.3
Is there a specific reason you're using this version, which is quite old (it's napari min requirement)? I ask because it's increasingly tricky to support pre 5.13 Qt while trying to improve Qt6 support.
@jni with autoswap=True
even with the default small 5x5x5
volume it shows 16 FPS š
autoswap = True
autoswap = False
@ksofiyuk Probably not directly related to this, but I note you have
PyQt5: 5.12.3
Is there a specific reason you're using this version, which is quite old (it's napari min requirement)? I ask because it's increasingly tricky to support pre 5.13 Qt while trying to improve Qt6 support.
@psobolewskiPhD I've just updated it all to Qt: 5.15.2 PyQt5: 5.15.9
. The performance issue with autoswap=True
remains exactly the same.
There is no any specific reason for PyQt5: 5.12.3
. I just haven't updated it for a long time š .
thanks for reporting back @ksofiyuk !
I ran the same test as you @ksofiyuk and enabling autoswap
makes no discernible difference in FPS on my system.
It does not feel like the double buffering makes a big difference during 2D/3D rendering on my system, there is a slight difference when rotating the 3D camera extremely quickly but nothing perceptible when doing things 'reasonably'. In this context, I think it would be safe to default to autoswap=False
on the canvas to solve this problem
Maybe it is a bug in @ksofiyuk Windows manager. Like in crashing the Windows manager on a big cursor?
As @alisterburt described, the swap buffer is to avoid flickering. It is a common practice used in programming GUI applications (even many games using it).
So if there will be a plan to fix it, then it should be done by env variable or settings. Not globally.
Maybe it is a bug in @ksofiyuk Windows manager. Like in crashing the Windows manager on a big cursor?
@Czaki On the contrary, I don't have any problems with a big cursor on this Windows laptop. I have this issue with another high-end computer with RTX 3080 on which Xubuntu is installed (and where I don't have this issue with autoswap
). š
As @alisterburt described, the swap buffer is to avoid flickering. It is a common practice used in programming GUI applications (even many games using it).
@Czaki @alisterburt I'm pretty sure that swapBuffers
is just not used in a proper way in VisPy causing performance issues on some machines.
It is just like here PyQt5 OpenGL swapBuffers very slow, where the issue was caused by inappropriate use of swapBuffers
.
I could blame my system if all OpenGL applications were laggy, but I don't have problems with other 3D applications, even with other PyQt apps that use OpenGL.
which Xubuntu
I remember multiple problems on Xubuntu, and I finally switched to Gnome Shell. As I read the team who manages XFCE is very limited and they do not have enough resources to fix all problems.
It is just like here PyQt5 OpenGL swapBuffers very slow, where the issue was caused by inappropriate use of
swapBuffers
.
Based on the accepted response to linked StackOverflow. Could you check if #5710 solves your problem?
Could you check if https://github.com/napari/napari/pull/5710 solves your problem?
@Czaki I've checked it a few days ago. Unfortunately, it didn't help at all. As I described in the first message, I did a lot of profiling of napari to figure out the reason of slow down. It turned out that swapBuffers
blocks the OpenGL for 50-60
ms every call, no matter what the size of an image. It has nothing to do with napari.
I'm not an expert on OpenGL, so it is hard for me to say what exactly causes this slow down, but I can imagine that the way the tasks are sent to OpenGL or how often you call swapBuffers
can affect the overall performance.
this is really interesting, I'm learning a lot reading around this - it would be interesting to see whether throttling only the swapBuffers
calls improves performance for you
If you want to play with this @ksofiyuk I think you could throttle the callback in vispy directly with https://psygnal.readthedocs.io/en/stable/API/throttler/
@Czaki @alisterburt One possible explanation that I can think of - the way how swapBuffers
acts (in synchronous or asynchronous mode) clearly depends on your system, even maybe on the driver of your video card. In my case, it is clearly executed in a blocking way waiting for all OpenGL commands to finish before swaping the buffers, which causes the slow down.
Updating something like 5x5
image is definitely cannot take anything near 50 ms even on 20 years old machines, so it means that VisPy on each draw update is likely to send some other much more heavy tasks irrelevant to the image update itself.
On the systems, where swapBuffers
is executed in an asynchronous way, you just cannot notice this irrelevant background tasks, because OpenGL executes in asynchronous by design, it just makes your GPU a little bit more busy.
In order to proper solve this issue, we need to understand what tasks OpenGL executes exactly at each draw update. I think just disabling swapBuffers
or even throttling it is like sweeping problems under the carpet, because there are definitely some other issues that are really the cause of the problem. Very long execution of swapBuffers
is just a symptom.
@ksofiyuk As the problem is on windows. Did you have propertiary driver, and if yes in which version?
As the problem is on windows. Did you have propertiary driver, and if yes in which version?
@Czaki I have the Ryzen 5 4600H CPU with Radeon RX Vega 6 integrated graphics. I was using the driver that Windows installs and updates automatically. It turned out that it was very old (version 22 from 2020), I updated it to the latest version - 31.0.14043.7000 (26-Mar-23)
.
However, unfortunately it didn't resolve the issue. But it gave me some boost. Now I have 18-20 FPS with autoswap=True
instead of previous 15-16 (+5 FPS boost!), and when I use add_labels.py
example with autoswap=False
the FPS increased from 50 to 60. The idea of updating driver was good and gave me some performance boost, but it is not a solution.
looking on this wiki https://www.khronos.org/opengl/wiki/Swap_Interval it looks like selection of synchronized and non synchronized swap buffer should be possible.
I see few references to glFinish in vispy. Maybe this could be a good way to research?
@Czaki @alisterburt The issue looks to be caused by the fact that VisPy does a lot of stuff in a single thread. The correct way of dealing with OpenGL is to keep it in a separate thread from other data processing since it can be blocked in an unintuitive way known only to the GPU driver - for example, it may block the current thread thinking that it does nothing and the resources should be allocated to another application.
I've just done a profiling of OpenGL calls, and and it takes less than 1 ms to process all the VisPy OpenGL tasks for a single frame, the driver thinks that this application is not heavy-loaded and just gives it very low priority allocating time to another processes.
I've found this answer on StackOverflow useful: Avoid waiting on SwapBuffers
When I put SwapBuffers in a separate thread in VisPy, the issue is completely gone - SwapBuffers are executed correctly (judging by a trace log of OpenGL calls) and VisPy and napari are not blocked anymore.
And which mechanism do you use to synchronize threads?
And which mechanism do you use to synchronize threads?
@Czaki I synchronized them using queue.Queue
, the main thread just puts tasks to another thread, a thread is blocked until it gets a new task - very simple stuff. I did it in an ugly and hard-coded way just to test the hypothesis. š
Maybe threading.Lock
could be better mechanism for this.
Should this fix happen in napari or in vispy? If in vispy then call @brisvag for review as he is napari core-dev and has permissions in vispy.
Did this fix should happen in napari or in vispy?
I think it has almost nothing to do with napari as all the OpenGL stuff is happening inside VisPy. VisPy should just have another thread for OpenGL stuff.
The simplest solution is to put only SwapBuffers
in a separate thread, it works at least for me and completely resolves the issue, using something like this (this code is just for illustrative purposes):
# def __init__
self._swap_buffers_worker = threading.Thread(target=self.swap_buffers_thread, daemon=True)
self._swap_buffers_queue = queue.Queue()
self._swap_buffers_worker.start()
# ...
def swap_buffers(self, event=None):
self._swap_buffers_queue.put(1)
def swap_buffers_thread(self):
while True:
self._swap_buffers_queue.get()
self._backend._vispy_swap_buffers()
instead of just
def swap_buffers(self, event=None):
self._backend._vispy_swap_buffers()
I think it should be implemented somewhere there: https://github.com/vispy/vispy/blob/5449e3402b35da4d6a8215d8d72e19dcf05917d7/vispy/app/canvas.py#L415
Maybe
threading.Lock
could be better mechanism for this.
@Czaki Agreed, Lock
seems more suitable here:
def swap_buffers(self, event=None):
self._swap_buffers_lock.release()
def swap_buffers_thread(self):
while self._swap_buffers_lock.acquire():
self._backend._vispy_swap_buffers()
I wrote the previous comment from memory.
I just checked the documentation. And there is such information:
When invoked on an unlocked lock, a RuntimeError is raised.
So it should be something like
def swap_buffers(self, event=None):
if self._swap_buffers_lock.locked()
self._swap_buffers_lock.release()
Or go back to queue.
But I start thinking if we should not use barrier https://docs.python.org/3/library/threading.html#threading.Barrier
The general thing is that swap buffer is used to swap rendered buffer and draw buffer.
So if we call swap buffer in async way it may happen that the "signal" to swap buffer was sent. Then new draw phase starts. And during this draw phase the swap will be started.
If we enforce synchronization using barriers it should reduce the probability of such an event as the main thread will wait until the sync thread is ready.
So if we call swap buffer in async way it may happen that the "signal" to swap buffer was sent. Then new draw phase starts. And during this draw phase the swap will be started.
As far as I understand, that cannot happen in OpenGL. OpenGL uses its own scheduler, when you call its methods, you just put tasks in its internal queue:
OpenGL Rendering Commands are assumed to execute be asynchronous. If you call any of the glDraw* functions to initiate rendering, it is not at all guaranteed that the rendering has finished by the time the call returns. Indeed, it is perfectly legitimate for rendering to not even have started when this function returns. The OpenGL specification allows implementations the freedom to get around to rendering commands whenever it is best for them.
So, OpenGL knows the correct order of its tasks, no matter when you put them unless of course you have separate threads that are fighting for the same OpenGL context, which is not our case.
Or go back to queue.
@Czaki It seems that only using a queue is correct. With lock you can miss one or more 'SwapBuffers' if the main thread tries to send the second command while the swap thread is still blocked by the first one.
Also, semaphores allows to not miss any event. But maybe the queue will be more precise.
Looks like you went already deep down the rabbithole :) I think this should be brought up over at Vispy for sure. @ksofiyuk since you have a good picture of the problem, do you mind opening an issue at vispy (linking here as well)? Otherwise let me know and I can take care of it.
In the meantime, even heavy stuff with fast camera movements looks exactly the same for me with autoswap=False
, so I'm in favor of doing that in napari until we figure it out on the vispy side.
š Bug
I was really puzzled why the brush was not so responsive as it should have even on low-resolution images. I measured FPS and discovered that it never exceeded 10-15 when you use the brush even on very small images (e.g.
examples\add_labels.py
).Initially, I thought that it was because of some very inefficient code in the Label layer, but after long hours with a debugger and profiler I eliminated this hypothesis. There is certainly room for improvement, but there is no such code that can limit FPS to 15 even on 100x100 images.
Finally, I found the culprit of this issue, it was because of using
autoswap=True
in the VisPy Canvas. Settingautoswap
toFalse
in the following code immediately increases the FPS from 13 to 50 inexamples\add_labels.py
on my laptop: https://github.com/napari/napari/blob/cf55aa005230a89b906be2447ce900be2c1f200a/napari/_qt/qt_viewer.py#L431-L436Calling
Canvas.swap_buffers
at each update below blocks the OpenGL engine at least for 50-60 ms on my laptop not depending on the resolution of an image causing a significant drop in the rendering performance: https://github.com/vispy/vispy/blob/5449e3402b35da4d6a8215d8d72e19dcf05917d7/vispy/app/canvas.py#L236-L237 which leads to the execution of the following code: https://github.com/vispy/vispy/blob/5449e3402b35da4d6a8215d8d72e19dcf05917d7/vispy/app/backends/_qt.py#L869I didn't dig any deeper into VisPy as disabling
autoswap
solves the issue without any noticeable bugs so far for me.Environment
I tested it on a laptop with Windows and integrated graphics: