Open ngortheone opened 3 years ago
Attached interactive flamegraph based on perf output.
Unfortunately Github disallows svg attachments, hence zip
Hey, thank you very much for your investigation. There is a wip render port of tiny-skia https://github.com/sandmor/orbtk/tree/tinyskia. Can you check if you have the same issues with it, please?
Unfortunately the tiny-skia port exhibits the same behavior. The debug build is as unusable as on the main branch. The release build feels about 10% faster than main (subjective, I have no real measurements to back this up), but the lag is still so bad that no user would ever tolerate it. If you find it useful, I can create similar perf records and a flamegraph for the skia port.
Is there anything else I can do to help debugging this?
I can create similar perf records and flamegraph for skia port.
Yes sure thank you.
Is there anything else I can do to help debugging this?
Can you check this example? https://gitlab.redox-os.org/redox-os/orbclient/-/blob/master/examples/simple.rs
Maybe the problem is connected to our OrbClient sdl2-based window backend. It would not be the first time we have had CPU issues with it on Linux.
https://gitlab.redox-os.org/redox-os/orbclient/-/blob/master/examples/simple.rs Looks to be OK. There is no CPU load, and I see a stream of events in the console in real time.
At position (0, 553) pixel color is : 0xFFB6B6B6
Key(KeyEvent { character: 'q', scancode: 16, pressed: true })
TextInput(TextInputEvent { character: 'q' })
Key(KeyEvent { character: 'q', scancode: 16, pressed: false })
Key(KeyEvent { character: '\u{0}', scancode: 71, pressed: true })
Key(KeyEvent { character: '\u{0}', scancode: 71, pressed: false })
Focus(FocusEvent { focused: false })
Focus(FocusEvent { focused: true })
ClipboardUpdate(ClipboardUpdateEvent)
Focus(FocusEvent { focused: false })
At position (10, 617) pixel color is : 0xFFBDBDBD
At position (14, 615) pixel color is : 0xFFBDBDBD
...
Although this does not seem to be an interactive application, just an image of some sort.
Interesting fact: the calculator example runs much better. There must be something in the code that makes this bug appear. I will keep cutting down the showcase example to find out what it is.
I didn't get far by removing widgets; this doesn't seem to have any significant impact. What does help is setting a smaller size() for the window. The calculator has a much smaller window size, and that hides the issue.
Window::new()
    .title("OrbTk - showcase example")
    .position((100, 100))
    .size(400, 400) // <--- THAT
My observations
- The smaller the window size(), the better. With a smaller window size, periods of 100% CPU usage are shorter.
So considering all of the above I want to make a few suggestions:
I hope this helps.
Minimal example to demonstrate the problem
use orbtk::prelude::*;

fn main() {
    Application::new()
        .window(|ctx| {
            Window::new()
                .title("OrbTk - showcase example")
                .position((100, 100))
                // .size(2000, 2000) // SLOW
                .size(150, 50) // FAST
                .child(ButtonView::new().build(ctx))
                .build(ctx)
        })
        .run();
}

widget!(ButtonView {});

impl Template for ButtonView {
    fn template(self, _id: Entity, ctx: &mut BuildContext) -> Self {
        let slider = Slider::new().min(0.0).max(1.0).build(ctx);
        self.child(
            Stack::new()
                .spacing(8)
                .child(slider)
                .child(ProgressBar::new().val(slider).build(ctx))
                .build(ctx),
        )
    }
}
Try running with both window sizes and feel the difference.
How to reproduce: left mouse click and hold on slider and make rapid mouse movements left and right many times. Observe CPU load, and how fast slider and progress bar follow the mouse.
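A rough way to see why the window size could matter so much is to compare how much pixel data a full-frame redraw has to touch. The sketch below assumes 4 bytes per pixel (the event log above prints 32-bit colors such as 0xFFB6B6B6) and assumes the whole frame is processed on every redraw. The helper name `frame_bytes` is made up for illustration and is not part of OrbTk.

```rust
/// Bytes in one full frame, assuming 32-bit (4-byte) pixels.
/// `frame_bytes` is a hypothetical helper, not an OrbTk API.
fn frame_bytes(width: usize, height: usize) -> usize {
    width * height * std::mem::size_of::<u32>()
}

fn main() {
    let slow = frame_bytes(2000, 2000); // the "SLOW" window size
    let fast = frame_bytes(150, 50); // the "FAST" window size

    println!("2000x2000: {} bytes per frame", slow);
    println!("150x50:    {} bytes per frame", fast);
    println!("ratio:     about {}x", slow / fast);
}
```

If any per-pixel work runs on every redraw, its cost grows with the window area, which would be consistent with the small window hiding the problem.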
In #392, the perf report says the second function that the CPU spent time in is
First, @ngortheone, thank you very much for helping to track down the performance issue.
What I can see in the perf output is that the render method of the orbclient-based window is the most expensive one.
I checked the code and this little piece of code:
let color_data: Vec<orbclient::Color> = self
    .render_context
    .data()
    .iter()
    .map(|v| orbclient::Color { data: *v })
    .collect();
takes 64ms on my machine. That's a lot. I use it to convert the u8 frame buffer of raqote to a Vec<orbclient::Color>.
I replaced this piece of code and now this part takes 0ms. @ngortheone can you check whether it is now a little better on your machine, please?
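The thread does not show what the replacement looks like. One common way to avoid such a per-frame copy is to reinterpret the pixel buffer in place instead of collecting a new Vec. The sketch below is only an illustration under stated assumptions: `Color` is a local stand-in for `orbclient::Color`, and it is assumed to be a `repr(transparent)` wrapper around a `u32`. The function names are hypothetical.

```rust
use std::slice;

/// Local stand-in for `orbclient::Color` (assumption: a transparent u32 wrapper).
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(transparent)]
struct Color {
    data: u32,
}

/// Copying conversion, same shape as the snippet quoted above:
/// allocates a new Vec and writes every pixel on each call.
fn convert_by_copy(pixels: &[u32]) -> Vec<Color> {
    pixels.iter().map(|v| Color { data: *v }).collect()
}

/// Zero-copy alternative: view the same memory as a slice of `Color`.
fn view_as_colors(pixels: &[u32]) -> &[Color] {
    // SAFETY: `Color` is `repr(transparent)` over `u32`, so both types have
    // identical size and alignment, and the returned slice borrows `pixels`.
    unsafe { slice::from_raw_parts(pixels.as_ptr() as *const Color, pixels.len()) }
}

fn main() {
    let frame = vec![0xFFB6B6B6u32; 4];
    let copied = convert_by_copy(&frame);
    let viewed = view_as_colors(&frame);
    // Both approaches yield the same colors; only the second avoids the copy.
    assert_eq!(copied.as_slice(), viewed);
    println!("{} pixels viewed without copying", viewed.len());
}
```

The view costs the same regardless of frame size, whereas the copying version scales with the number of pixels.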
@FloVanGH thanks!
Overall there is an improvement, but I don't think we are there yet.
What improved:
What didn't improve / other observations:
RUSTFLAGS="-Ctarget-cpu=skylake" cargo run --example showcase --release
cargo run --example showcase
I will do another perf record soon with a flamegraph. I'll also try to record a video of my screen to better show what I experience. Is there anything else I can do to help?
Ok, thank you. There are some other parts that can cause this problem. Layout is not yet optimized: on each iteration, every widget's size and position is recalculated (I'm currently working on an update). And I know that raqote is not the fastest render backend. A new backend like tiny-skia will help.
But I think there is more and we have to solve these pieces step by step, until debug build is usable on big sized windows.
Your perf record will help, thank you. It made it much easier to find the issue in the render method of the window backend.
Awesome. Some time during my day I will record perf data (likely closer to the end of my day, and it is morning here now)
@FloVanGH Attached perf data of today's build from develop branch. perf.data.zip
@FloVanGH have you had a chance to look into this?
@ngortheone unfortunately not. But I hope soon.
I've been tinkering on paper with a new algorithm for doing high-speed grid layouts by folding constants and combining gadget renders via inlining render code. The basic idea is that the "pivot points" of the x and y coordinates can be converted into 2 vec structures and the remaining fixed grids can be combined into a smaller number of grid layouts. Since the grid offsets don't change during a window resize relative to the pivot coordinates, only the coordinates of the pivot points referred to by the vecs would need to change.
It's kind of like vertex shaders in a graphics layout for polygons. The list of vertexes are figured out in a batch but the 3d model only keeps track of which vertexes are used rather than their positions. In 2d we could go a step farther by figuring the x pivots and y pivots separately so that the rows and columns of the grid could be stored as a 2d array of enumerated values for determining which grid cell contains which gadget.
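The pivot-point idea can be sketched roughly as follows. This is only one possible reading of the description, not an existing OrbTk design: all type and method names (`PivotGrid`, `Cell`, `resize`, `bounds`) are hypothetical, and equally sized rows and columns are assumed to keep the example short. Cells store indices into two pivot vectors, so a resize only rewrites the pivots and never touches the cells.

```rust
/// A widget's cell, stored as indices into the pivot vectors
/// rather than as absolute coordinates. (Hypothetical type.)
#[derive(Clone, Copy)]
struct Cell {
    left: usize,
    right: usize,
    top: usize,
    bottom: usize,
}

/// Grid layout with separate x and y pivot vectors. (Hypothetical type.)
struct PivotGrid {
    x_pivots: Vec<f64>,
    y_pivots: Vec<f64>,
    cells: Vec<Cell>,
}

impl PivotGrid {
    /// Create a grid with `cols` columns and `rows` rows (both at least 1).
    fn new(cols: usize, rows: usize) -> Self {
        assert!(cols >= 1 && rows >= 1);
        PivotGrid {
            x_pivots: vec![0.0; cols + 1],
            y_pivots: vec![0.0; rows + 1],
            cells: Vec::new(),
        }
    }

    /// On resize, only the pivot coordinates change; cells stay untouched.
    /// Assumes equally sized columns and rows for simplicity.
    fn resize(&mut self, width: f64, height: f64) {
        let cols = (self.x_pivots.len() - 1) as f64;
        let rows = (self.y_pivots.len() - 1) as f64;
        for (i, x) in self.x_pivots.iter_mut().enumerate() {
            *x = width * i as f64 / cols;
        }
        for (i, y) in self.y_pivots.iter_mut().enumerate() {
            *y = height * i as f64 / rows;
        }
    }

    /// Resolve a cell to (x, y, width, height) by looking up its pivots.
    fn bounds(&self, cell: Cell) -> (f64, f64, f64, f64) {
        let x = self.x_pivots[cell.left];
        let y = self.y_pivots[cell.top];
        (x, y, self.x_pivots[cell.right] - x, self.y_pivots[cell.bottom] - y)
    }
}

fn main() {
    let mut grid = PivotGrid::new(2, 2);
    // Top-right cell of a 2x2 grid.
    grid.cells.push(Cell { left: 1, right: 2, top: 0, bottom: 1 });
    grid.resize(400.0, 200.0);
    println!("{:?}", grid.bounds(grid.cells[0]));
}
```

With this layout, a window resize costs one pass over the pivots (columns plus rows) instead of one pass over every widget.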
Since https://github.com/redox-os/orbtk/issues/394 was prematurely closed and https://github.com/redox-os/orbtk/issues/392 is not exactly the right topic I decided to open a new issue and consolidate all information about this issue here.
Describe the bug
The showcase application runs very slowly, consuming 100% CPU (both debug and release). Input lag on the debug version is >1min. On the release version, input lag is somewhere between 5 and 30 seconds. Not only are mouse clicks and key presses in the application window processed slowly, but Ctrl-C from the terminal that ran the app is also processed with the same lag.
It also looks like input lag and CPU load depend on the number of widgets in the app (even those not visible). If I remove almost all widgets from the showcase application, it lags less severely.
To Reproduce
Desktops:
Hardware Intel CPU with integrated graphics (i915)
Screenshot
This screenshot demonstrates the lag and CPU load. The highlighted "up" arrow was pressed a few minutes ago and only now we see the animation.
(EDITED)
release build with debug symbols built
perf report recorded
perf.data.zip
Please see perf report with the attached report. It shows that there are issues with rendering performance.