kivy / kivy

Open source UI framework written in Python, running on Windows, Linux, macOS, Android and iOS
https://kivy.org
MIT License

Graphics: Use indexed `GL_TRIANGLES` instead of `GL_TRIANGLE_FAN` to draw `Ellipse` #8662

Closed misl6 closed 1 month ago

misl6 commented 1 month ago


After adding ANGLE support for iOS, I noticed a small performance boost in some areas, but I also started to see a huge performance drop when drawing complex UIs (which I only noticed after the actual merge).

Initially it wasn't clear what the actual issue was, since a lot happens in a complex UI, and none of the simple reproducible examples I created showed such a huge performance drop.

The nice thing is that the whole process made me aware of potential improvements we can make in the future (see #8664).

But we finally found the root cause of this performance drop, and it's related to the use of GL_TRIANGLE_FAN. Kivy graphics use the GL_TRIANGLE_FAN primitive to draw Ellipse and RoundedRectangle objects.

Unfortunately, triangle fans are not natively supported by Metal or DirectX >= 10, so ANGLE has to emulate the missing primitive, which makes it incredibly slow (at least on iOS) (see https://bugs.webkit.org/show_bug.cgi?id=237533).

This PR switches from GL_TRIANGLE_FAN to indexed GL_TRIANGLES for drawing ellipses. The changes are intentionally kept minimal to follow an incremental path, but the code as a whole can be improved to increase efficiency (again, see #8664 for an example).
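To make the switch concrete, here is a minimal sketch of an ellipse described as indexed triangles: one centre vertex, `segments + 1` perimeter vertices, and three indices per triangle (centre, current point, next point), which are the same triangles a fan produces implicitly. The helper `ellipse_indexed_triangles` is hypothetical and is not Kivy's implementation; the angle convention (0° pointing up, `sin` for x and `cos` for y) is an assumption.

```python
import math


def ellipse_indexed_triangles(x, y, w, h, segments=180,
                              angle_start=0.0, angle_end=360.0):
    """Hypothetical helper: vertices and indices for an ellipse as GL_TRIANGLES.

    Vertex 0 is the centre; vertices 1..segments+1 lie on the perimeter.
    (Illustration only, not Kivy's code; angle convention is assumed.)
    """
    rx, ry = w / 2.0, h / 2.0
    cx, cy = x + rx, y + ry
    start = math.radians(angle_start)
    step = math.radians(angle_end - angle_start) / segments

    # Same vertices a GL_TRIANGLE_FAN would use: centre first, then the rim.
    vertices = [(cx, cy)]
    for i in range(segments + 1):
        a = start + i * step
        vertices.append((cx + rx * math.sin(a), cy + ry * math.cos(a)))

    # Each triangle reuses the centre and two neighbouring perimeter vertices,
    # so vertices are shared instead of being duplicated per triangle.
    indices = []
    for i in range(1, segments + 1):
        indices.extend((0, i, i + 1))
    return vertices, indices


if __name__ == "__main__":
    verts, idx = ellipse_indexed_triangles(0, 0, 100, 50, segments=8)
    print(len(verts), len(idx))
```

For example, `ellipse_indexed_triangles(0, 0, 100, 50, segments=8)` yields 10 vertices and 24 indices. Flattened into the default (x, y, u, v) vertex format of `kivy.graphics.Mesh`, such data could be drawn with `mode="triangles"`; texture coordinates would have to be chosen separately.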

A separate PR will take care of RoundedRectangle, even though the performance drop there seems to be less visible.

On the OpenGL side, indexed GL_TRIANGLES are as fast as GL_TRIANGLE_FAN, but the extra memory that has to be allocated for the indices can make them slightly slower on certain platforms. (This is improvable: the indices only need to be reallocated when segments or angle_start / angle_end change.)
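The allocation cost can be limited by rebuilding the index list only when its inputs change. A minimal sketch, assuming a hypothetical layout in which a full 360° ellipse closes by pointing its last triangle back at the first perimeter vertex (so the indices depend on the angles as well as on `segments`); `ellipse_indices` is an illustrative name, not a Kivy function:

```python
from functools import lru_cache


@lru_cache(maxsize=64)
def ellipse_indices(segments, angle_start, angle_end):
    """Hypothetical: index tuple for an ellipse of `segments` triangles.

    Vertex 0 is the centre and vertices 1.. lie on the perimeter. A full
    360-degree ellipse closes by reusing vertex 1; an arc ends on vertex
    segments + 1 instead.
    """
    full_circle = abs(angle_end - angle_start) >= 360
    indices = []
    for s in range(1, segments + 1):
        nxt = 1 if (full_circle and s == segments) else s + 1
        indices.extend((0, s, nxt))
    # A tuple is immutable, so the cached value can be shared safely.
    return tuple(indices)


if __name__ == "__main__":
    a = ellipse_indices(180, 0, 360)
    b = ellipse_indices(180, 0, 360)  # e.g. after only pos/size changed
    print(a is b, len(a), ellipse_indices.cache_info().hits)
```

Moving or resizing an ellipse changes only its vertex positions, so in this sketch calls with unchanged arguments return the cached tuple and no new index list is allocated.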

The following example has been used to stress the Ellipse drawing:

```python
from kivy.app import App
from kivy.uix.widget import Widget
from kivy.graphics import Ellipse
from kivy.clock import Clock


class EllipseDemo(Widget):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.ellipses = []
        self.scroll_y = 0
        self.direction = 1

        # Create 800 full (0-360 degree) ellipses; their positions are
        # assigned on every frame in update_scroll()
        for _ in range(800):
            ellipse = Ellipse(size=(50, 50), angle_start=0, angle_end=360)
            self.ellipses.append(ellipse)
            self.canvas.add(ellipse)

        # Update the scroll every 1/120 s
        Clock.schedule_interval(self.update_scroll, 1 / 120)

    def update_scroll(self, dt):
        # Bounce the vertical offset between 0 and the widget height
        if self.scroll_y >= self.height:
            self.direction = -20
        if self.scroll_y <= 0:
            self.direction = 20

        self.scroll_y += self.direction

        # Move every ellipse: a grid of 20 columns x 40 rows, shifted by scroll_y
        for i, ellipse in enumerate(self.ellipses):
            ellipse.pos = (
                (self.width / 20) * (i % 20),
                (self.height / 20) * (i // 20) + self.scroll_y,
            )

        print(f"FPS: {Clock.get_fps()}")


class EllipseDemoApp(App):
    def build(self):
        return EllipseDemo()


if __name__ == "__main__":
    EllipseDemoApp().run()
```
Test results:

| Platform (backend) | GL_TRIANGLE_FAN | GL_TRIANGLES (indexed) | Device / OS |
|---|---|---|---|
| iOS (ANGLE) | ~10 fps | ~60 fps | iPhone 14 Pro, iOS 17.4.1 |
| macOS (ANGLE) | ~81 fps | ~94 fps | MacBook M1 Pro, macOS 14.4.1 |
| Android (OpenGL ES) | ~34 fps | ~37 fps | Samsung Galaxy S10e, Android 12 |
| Ubuntu (OpenGL) | ~44 fps | ~38 fps | Intel i7-7700HQ + HD Graphics 630, Ubuntu 22.04 |
| Windows (OpenGL) | ~30 fps | ~33 fps | Intel i7-7700HQ + HD Graphics 630, Windows 11 |

As the results show, the change benefits not only platforms backed by ANGLE, but also Android and Windows, which still rely on OpenGL ES / OpenGL.

On Ubuntu, at least with my configuration, GL_TRIANGLE_FAN is slightly faster (~44 fps vs ~38 fps), but I'm quite confident that with the additional optimizations mentioned above (#8664) we can reach even better frame rates.
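The relative changes behind these observations can be computed directly from the approximate values in the results table. This is only arithmetic on the reported `~` figures, not a new measurement:

```python
# Approximate frame rates from the results table:
# (GL_TRIANGLE_FAN fps, indexed GL_TRIANGLES fps)
RESULTS = {
    "iOS (ANGLE)": (10, 60),
    "macOS (ANGLE)": (81, 94),
    "Android (OpenGL ES)": (34, 37),
    "Ubuntu (OpenGL)": (44, 38),
    "Windows (OpenGL)": (30, 33),
}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


for platform, (fan_fps, indexed_fps) in RESULTS.items():
    print(f"{platform}: {percent_change(fan_fps, indexed_fps):+.1f}%")
```

Rounded, this gives about +500% on iOS, +16% on macOS, +9% on Android, +10% on Windows and -14% on Ubuntu.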

Some screenshots and videos:

iOS before:

https://github.com/kivy/kivy/assets/8177736/175ea4f7-b0ec-4fd6-814f-31ad77ba726b

iOS after:

https://github.com/kivy/kivy/assets/8177736/29306cb6-c146-4f83-ab02-b7188278eb73

DexerBR commented 1 month ago

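Consider n → triangles count, for 360 triangles:

  • GL_TRIANGLE_FAN = n + 2 = 360 + 2 = 362 vertices
  • GL_TRIANGLES = 3 × n = 3 × 360 = 1080 vertices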

It is interesting that even though it requires more vertices to compute, you have found a way to optimize the code, by using indexed GL_TRIANGLES.

Do you think that the extra amount of memory allocated could be causing a small drop in performance on some platforms, or could it have to do with something else?

misl6 commented 1 month ago

> Consider n → triangles count, for 360 triangles:
>
>   • GL_TRIANGLE_FAN = n + 2 = 360 + 2 = 362 vertices
>   • GL_TRIANGLES = 3 × n = 3 × 360 = 1080 vertices
>
> It is interesting that even though it requires more vertices to compute, you have found a way to optimize the code, by using indexed GL_TRIANGLES.

With indexed triangles, vertices are reused across triangles, so only the GL_TRIANGLES index buffer has 1080 entries (3 × 360); the vertices themselves are not duplicated per triangle.

> Do you think that the extra amount of memory allocated could be causing a small drop in performance on some platforms, or could it have to do with something else?

We can't be sure, and it should be measured, but allocating more memory certainly does not help. (I will move this discussion to #8664, as we may try some performance tweaks while fixing that issue.)
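The counts discussed in this exchange can be written out as a small calculation. The sketch assumes the indexed layout keeps the same vertices as the fan (a centre plus `n + 1` perimeter points) and, for the byte count, 16-bit indices; both are assumptions for illustration, not measurements of Kivy's buffers, and `ellipse_buffer_sizes` is a hypothetical name:

```python
def ellipse_buffer_sizes(n, index_bytes=2):
    """Hypothetical: vertex/index counts for an ellipse built from n triangles.

    Assumes the indexed layout reuses the fan's vertices (centre plus n + 1
    perimeter points) and, by default, 16-bit (2-byte) indices.
    """
    return {
        "fan_vertices": n + 2,                      # GL_TRIANGLE_FAN
        "triangles_vertices": 3 * n,                # non-indexed GL_TRIANGLES
        "indexed_vertices": n + 2,                  # indexed: shared vertices
        "indexed_indices": 3 * n,                   # one index per triangle corner
        "index_buffer_bytes": 3 * n * index_bytes,  # the extra allocation
    }


if __name__ == "__main__":
    print(ellipse_buffer_sizes(360))
```

Under these assumptions, the extra cost of the indexed version for 360 triangles is the 1080-entry index buffer (2160 bytes with 16-bit indices), while the vertex count matches the fan's 362.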

misl6 commented 1 month ago

Merging, as this should not break anything, but let's keep monitoring performance (after the #8664 changes).