Closed ColRad closed 4 years ago
So I roughly PoC'd both approaches. But both seem to be slower than drawing 8k lines. Is there a way to upload a fixed set of GpuTriangles to the GPU (say 50 different discrete amplitude values) and then just arrange/transform them in the draw() call? Something like https://github.com/tiby312/egaku2d is doing but without dropping quicksilver?
Not really. If the bottleneck is actually the upload to the GPU, you're sorta out of luck. However, it doesn't seem like you've profiled what's taking the most time in your code. It's entirely possible that something surprising is slowing you down, and that performance wins can be had without storing all the triangles frame-to-frame.
Depending on your platform, you may want to look into different profilers. I've previously used VerySleepy on Windows and Valgrind on Linux. Once you see the breakdown of where your program is spending time, we can look into ways of optimizing it.
Hey thanks a lot for the feedback! I benchmarked my program with flamegraph. Attached the output SVG.
Unfortunately, the biggest performance consumer is drawing to the screen, at 64%. <Line as Drawable>::draw() takes 32% and window::flush() takes 24%.
I'd be super grateful if you could have a quick look at it. Maybe there is something really obvious that I'm doing wrong here.
https://raw.githubusercontent.com/ColRad/stuff/master/flamegraph.svg
Another strange thing I noticed: when reducing the number of lines to draw to 1/10th (8000 -> 800), the draw share of total time only drops from 64% to 30%.
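A note on reading those percentages: a flamegraph reports each function's share of the total, so the share shrinks more slowly than the absolute time does. The sketch below works through the numbers under the assumption (not confirmed anywhere in this thread) that the time spent outside drawing stayed the same between the two runs. The function name `draw_time_ratio` is made up for illustration.

```rust
/// Returns the absolute draw time after a change, as a fraction of the
/// draw time before it, given draw's share of the total in both runs.
/// Assumption: the non-draw time is identical in both runs.
fn draw_time_ratio(share_before: f64, share_after: f64) -> f64 {
    // Normalize the total time of the first run to 1.0.
    let other = 1.0 - share_before;
    // In the second run, `other` makes up (1 - share_after) of the total.
    let total_after = other / (1.0 - share_after);
    let draw_after = total_after * share_after;
    draw_after / share_before
}
```

With 64% before and 30% after, `draw_time_ratio(0.64, 0.30)` comes out at roughly 0.24, meaning the drawing itself would have become about four times cheaper. That is still short of the tenfold reduction in line count, which hints at a fixed per-frame cost that does not scale with the number of lines.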
Oh, something I forgot to ask before: are you compiling your code with the --release flag / applying some level of optimization? That's often crucial for getting anything close to acceptable perf with Rust.
As for the flamegraph, this is very helpful, thanks! Window::flush() is what I expected to take the most time, because that's where anything OpenGL-related happens. It seems like there's a surprising amount of overhead related to drawing Lines in 0.3; it might be a performance win to simply create 1-pixel-thick Rectangles instead. However, as you mentioned in your original post, directly modifying a Mesh is an option, and it seems like that might help performance considerably.
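To make the "rectangles in a mesh" idea concrete, here is a minimal sketch of turning amplitudes into 1-pixel-wide quads, two triangles each. `Vertex` and `Triangle` are hypothetical stand-ins defined here so the example is self-contained; quicksilver's own vertex and triangle types carry more fields (such as color and z-order), and `push_bars` is an illustrative name, not a library function.

```rust
// Hypothetical stand-ins for a mesh's vertex and triangle types.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Vertex {
    pos: [f32; 2],
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Triangle {
    indices: [u32; 3],
}

/// Appends one 1-pixel-wide bar per amplitude, centered on `mid_y`.
/// Bar `i` starts at x = `origin_x + i`.
fn push_bars(
    vertices: &mut Vec<Vertex>,
    triangles: &mut Vec<Triangle>,
    amplitudes: &[f32],
    origin_x: f32,
    mid_y: f32,
) {
    // Reserve up front to avoid repeated reallocation for thousands of bars.
    vertices.reserve(amplitudes.len() * 4);
    triangles.reserve(amplitudes.len() * 2);
    for (i, &amp) in amplitudes.iter().enumerate() {
        // Index of this bar's first corner within the shared vertex list.
        let base = vertices.len() as u32;
        let x = origin_x + i as f32;
        let (top, bottom) = (mid_y - amp, mid_y + amp);
        // Four corners: top-left, top-right, bottom-right, bottom-left.
        vertices.push(Vertex { pos: [x, top] });
        vertices.push(Vertex { pos: [x + 1.0, top] });
        vertices.push(Vertex { pos: [x + 1.0, bottom] });
        vertices.push(Vertex { pos: [x, bottom] });
        // Two triangles covering the quad.
        triangles.push(Triangle { indices: [base, base + 1, base + 2] });
        triangles.push(Triangle { indices: [base + 2, base + 3, base] });
    }
}
```

For 8000 bars this produces 32000 vertices and 16000 triangles in two flat vectors, with no per-line transform or intermediate allocation. Whether it beats the built-in line drawing in practice would need to be measured.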
Yeah, I'm using the --release flag, and I tried out all the tweaks from https://deterministic.space/high-performance-rust.html. Unfortunately, they have no impact.
So I was able to reduce the load by approximately 10% by directly extending window.mesh().vertices and window.mesh().triangles, and to save another 40% by down-sampling to 1/3rd of my data points.
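How the down-sampling was done is not stated here. One common approach for waveform displays, sketched below as an assumption, is to keep the largest absolute value in each group of samples so that short peaks remain visible. The name `downsample_peaks` is illustrative only.

```rust
/// Reduces `samples` by keeping the largest absolute value in each
/// chunk of `factor` samples. A trailing partial chunk is kept too.
fn downsample_peaks(samples: &[f32], factor: usize) -> Vec<f32> {
    assert!(factor > 0, "factor must be at least 1");
    samples
        .chunks(factor)
        .map(|chunk| chunk.iter().fold(0.0_f32, |peak, s| peak.max(s.abs())))
        .collect()
}
```

Calling `downsample_peaks(&data, 3)` yields one bar for every three samples, which cuts the number of primitives to a third while preserving the envelope of the signal.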
Unfortunately, this is not enough when running on an RPi4 (Legacy mode).
And when enabling FKMS, quicksilver only shows a black screen (I guess the RPi's GLSL 1.5 is not supported).
So I'm gonna try out https://github.com/tiby312/egaku2d; maybe this is more appropriate for my use case. At least in a PoC I could draw 10k dynamic lines with 18% CPU load on the RPi4.
Thanks again for your feedback! Even though I probably can't use it for my current Project, I really like quicksilver!! I appreciate your work!
Hi, first of all thanks for this great lib! I really appreciate it! I'm still using v0.3 as I need Lyon to draw some SVGs. There is one problem I have; maybe you have a suggestion on how to make it perform better. On one screen I draw audio data. Currently I use 1 vertical line for every pixel. The length of the line is basically the amplitude at a certain time. As I draw multiple tracks, I'm drawing over 8000 Lines on the screen. I can't create a fixed asset, as the audio data is dynamic (recording / copy/paste data from/to other tracks). Drawing this screen currently takes most of the processing power. Can you suggest how to optimize this? I thought of keeping the audio data in a Mesh and directly modifying the mesh when changes occur, then simply drawing the Mesh in every draw() call. Would this be faster than drawing 8k Lines? What about creating the tracks as raw image data and then creating and drawing an Image on every draw() call?
Again thanks for your hard work!!