tidbyt / pixlet

Build apps for pixel-based displays ✨
https://tidbyt.com
Apache License 2.0
756 stars 107 forks source link

Improve render performance #350

Open FD- opened 2 years ago

FD- commented 2 years ago

I'm running pixlet on a Raspberry Pi Zero as a makeshift self-hosted server for private apps. Execution of pixlet render currently takes ages on this admittedly slow hardware, so I started to investigate a bit. As part of this investigation, I discovered 2 possible improvements for render performance.

Improvement 1 (speedup for applets using the Image widget)

I started my investigations on the reddit_r_place community app. As-is, pixlet on the Raspberry Pi Zero takes about 37 seconds to render the applet:

$ time ./pixlet render ../community/apps/redditrplace/reddit_r_place.star
real    0m37.799s
user    0m34.017s
sys 0m2.081s

I used runtime/pprof for obtaining detailed runtime statistics. Immediately, it was apparent that something was strange with all these image.(*NRGBA).Opaque calls. In total, the CPU spent 20 seconds running that function!

profile1

Following the pixlet source code, I could see that the relevant portion in pixlet was in transformation.go, where the image specified by the starlark applet's Image element is passed to the gg library's DrawImage function.

https://github.com/tidbyt/pixlet/blob/ecae45a3e2a4e2c84ec3aeefa3cefa7b9623f8f2/render/animation/transformation.go#L238

This function in turn calls DrawImageAnchored, which uses golang.org/x/image/draw for the actual drawing:

image

Note that draw.Over is passed as the Op, which pixlet doesn't even need here. draw.Src would suffice, since we are drawing into an empty image anyway. This now gets interesting when we have a look at the implementation of the called function inside golang.org/x/image/draw:

image

We've found the source of these expensive image.(*NRGBA).Opaque calls, and they are not even needed by pixlet. If we had a way to specify the Op, we would actually prefer draw.Src here. For testing purposes, I changed go's local copy of github.com/fogleman/gg to always use draw.Src, and ran the initial test again:

time ./pixlet render ../community/apps/redditrplace/reddit_r_place.star 

real    0m12.525s
user    0m10.279s
sys 0m1.676s

Quite some improvement, I would say! Now, the problem is: I don't know how this performance issue could sensibly be fixed. The cleanest way would probably be having the github.com/fogleman/gg library export a function that allows specifying the Op parameter, but it seems like that library has been abandoned (no new commits for a year).

Improvement 2 (speedup for deeply nested applet UIs)

Currently, the Paint method of every widget creates a new image, which it returns to the caller, who will then just render that image into another image, ... until the root draws it into an image that has a background. I haven't thought that through (or implemented it), but shouldn't it be possible to pass the render context to every Paint method and have all draw operations on the same context? Given the bad performance of the gg.context.DrawImage function, IMHO it should be called as rarely as possible.

With improvement 1 from above implemented (the ugly way described above), the two gg.context.DrawImage calls per frame are clearly visible (1.3 seconds), and this is for an applet that is not nested at all.

image

I don't know whether performance is something you are concerned about for this implementation, but I still wanted to leave my notes here for others to find as well.

Other than that: I have to say the applet development experience for the Tidbyt platform is great (even if deployment of private applets is a pain ATM)!

betterengineering commented 2 years ago

Wow, thanks so much for the perf analysis @FD- ! We run the same Pixlet runtime in our backend to render all apps for Tidbyt, so we certainly care about performance 😅. To date, we've thrown CPU at the problem but we're quickly running up against the limits of that approach. For number one, let me fork the repo to the Tidbyt organization and we can make the changes there. For number two I'd need to spend a bit more time poking around but it certainly sounds logical.

betterengineering commented 2 years ago

I forked the repo here: https://github.com/tidbyt/gg. More then happy to review a PR or pitch in with your suggested changes.

FD- commented 2 years ago

Wow, I had no idea you were running this code on your backend! IMHO, your current implementation could be optimised quite a bit (easily 2-3x performance across applets) with only a few days of effort.

I might have a look at implementing improvement 1. I also thought a little more about improvement 2, and here are a few points:

  1. The widget interface needs changes along these lines:

    // PaintBounds Returns the bounds of the area that will actually be drawn to when Paint() is called
    PaintBounds(bounds image.Rectangle, frameIdx int) image.Rectangle
    Paint(dc *gg.Context, bounds image.Rectangle, frameIdx int)

    PaintBounds is needed so we can calculate the dimensions of a widget before drawing it. In fact, the current implementation does a full render just to estimate sizes, which is very inefficient AFAICT.

  2. We can push() the context before calling a child's Paint() function and pop() afterwards.

  3. Every Paint() implementation should be responsible for centering its drawing area within the bounds passed as argument. Currently, this is done by the respective parent.

  4. Is clipping really needed when painting children? Surely adds some cost as well.

FD- commented 2 years ago

It's worth noting that improvement 2 will render improvement 1 mostly ineffective (because we're no longer drawing the starlark image into an empty context). Still, improvement 2 should offer a speedup for all applets, while improvement 1 only helps for applets that draw large images.

FD- commented 2 years ago

Alright, I implemented improvement 1 in https://github.com/tidbyt/gg/pull/1.

rohansingh commented 2 years ago

Looking at profiles from production, I know for a fact that improvement 2 would be pretty huge.

FD- commented 2 years ago

I have an implementation for improvement 2. Huge speedup indeed. Will iron out the last bugs and open a PR later today or tomorrow.

FD- commented 2 years ago

Alright, improvement 2 done and tested. See PR #355. I’m seeing 2-3x speedup on average.

Maybe send me a Tidbyt when your server savings exceed the unit’s price :D