pop-os / cosmic-comp

Compositor for the COSMIC desktop environment
GNU General Public License v3.0

calloop event loop question #324

Open LunNova opened 6 months ago

LunNova commented 6 months ago

Not completely confident I've understood this correctly but:

It looks like cosmic-comp creates a single calloop EventLoop and handles all rendering within this. calloop isn't threaded so there's a single OS thread handling all events that's responsible for rendering even when multiple output devices are used.

The APIs in use within cosmic-comp mostly shouldn't block for very long, but they still take some time to run, so something slow happening while compositing for one output can delay rendering for an unrelated output. And if I understand correctly, some copies that happen inside the blit_frame_result call in kms/mod.rs implicitly sync, waiting for the source app to be ready, so if an app does something slow it could hang that thread.

Is that correct?

Drakulix commented 6 months ago

> APIs in use within cosmic-comp mostly shouldn't block for very long but they will still take some time to run, so something slow happening while compositing for one output can delay rendering for an unrelated output.

Yes, this is correct. Long term, rendering needs to be split into its own threads (one per output) so that it is not bottlenecked.
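A rough illustration of what a per-output split could look like, using only the Rust standard library. This is a sketch, not cosmic-comp or smithay code: `RenderMsg`, `OutputThread`, `spawn_output_thread`, and `render_all` are hypothetical names, and the `sleep` stands in for real compositing work.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical message sent from the main event loop to a render thread.
enum RenderMsg {
    /// Ask the thread to render one frame.
    Frame,
    /// Ask the thread to exit.
    Shutdown,
}

/// Hypothetical handle to one output's render thread.
struct OutputThread {
    tx: mpsc::Sender<RenderMsg>,
    handle: thread::JoinHandle<u32>,
}

/// Spawns a thread that "renders" frames for one output.
/// `frame_cost` stands in for the time compositing takes on that output.
fn spawn_output_thread(name: &str, frame_cost: Duration) -> OutputThread {
    let (tx, rx) = mpsc::channel();
    let handle = thread::Builder::new()
        .name(format!("render-{name}"))
        .spawn(move || {
            let mut frames = 0u32;
            // Blocks only this thread; other outputs keep their own pace.
            while let Ok(RenderMsg::Frame) = rx.recv() {
                thread::sleep(frame_cost); // placeholder for real rendering
                frames += 1;
            }
            frames
        })
        .expect("failed to spawn render thread");
    OutputThread { tx, handle }
}

/// Requests `frames` frames on every output, then returns how many each
/// output actually rendered. Sending never blocks the caller, because the
/// channel is unbounded.
fn render_all(outputs: Vec<OutputThread>, frames: u32) -> Vec<u32> {
    for out in &outputs {
        for _ in 0..frames {
            out.tx.send(RenderMsg::Frame).expect("render thread exited early");
        }
        out.tx.send(RenderMsg::Shutdown).expect("render thread exited early");
    }
    outputs
        .into_iter()
        .map(|out| out.handle.join().expect("render thread panicked"))
        .collect()
}
```

With one thread per output, a slow frame on one output only delays that output's own queue. The main loop just enqueues requests and moves on.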

> and if I understand correctly some copies that happen inside the blit_frame_result call in kms/mod.rs implicitly sync waiting for the source app to be ready so if an app does something slow it could hang that thread

Not necessarily.

Firstly, we do not use client buffers that are not ready. We poll the dmabufs and delay commits for window contents that are still being rendered (if the driver supports it; afaik polling dmabufs from the nvidia driver always returns ready, and there is nothing we can do about that). So we should never block on unfinished client work.

Secondly, the wait necessary to perform that copy can be done in the GPU context if the API supports it. A CPU-bound wait is only introduced if the necessary EGL extensions aren't supported (nvidia...). Additionally, the buffer is submitted to KMS before the copy for screen capture happens, so it should never block scan-out.
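The first point (delaying commits until the client buffer is ready, instead of waiting on it) can be sketched roughly as follows. This is an illustration under stated assumptions, not the smithay implementation: `PendingCommit` and `CommitQueue` are hypothetical names, readiness is modelled as an atomic flag rather than a polled dmabuf file descriptor, and keeping commits strictly in order is a choice made for this sketch.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a client commit and its buffer readiness.
/// In a real compositor, readiness would come from polling the dmabuf
/// file descriptor; here it is a flag so the sketch stays self-contained.
struct PendingCommit {
    id: u32,
    ready: Arc<AtomicBool>,
}

/// Hypothetical per-surface queue of commits that have not been applied yet.
#[derive(Default)]
struct CommitQueue {
    pending: VecDeque<PendingCommit>,
}

impl CommitQueue {
    /// Queues a commit together with the flag that reports its readiness.
    fn push(&mut self, id: u32, ready: Arc<AtomicBool>) {
        self.pending.push_back(PendingCommit { id, ready });
    }

    /// Applies commits from the front of the queue while their buffers are
    /// ready, and returns the ids that were applied. Never waits: a commit
    /// that is not ready stays queued, along with everything behind it.
    fn apply_ready(&mut self) -> Vec<u32> {
        let mut applied = Vec::new();
        while let Some(front) = self.pending.front() {
            if !front.ready.load(Ordering::Acquire) {
                break;
            }
            applied.push(front.id);
            self.pending.pop_front();
        }
        applied
    }
}
```

The render path only ever sees commits returned by `apply_ready`, so unfinished client work delays that client's own update rather than the compositor thread.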

(It's a young compositor, so it has its issues - see threading - but it's not a naive one and lots of optimization work has already been done.)

LunNova commented 6 months ago

Glad to hear that improving it is planned already and that it mostly shouldn't cause problems as things are non-blocking where possible.

Specifically for that nvidia case I believe there's a fix involving EGL_ANDROID_native_fence_sync which kwin implements that could help. This was added in nvidia 545.

It's really exciting that this compositor is in rust. This codebase feels a lot more approachable than kwin's does and I'm tempted to start hacking on it and see if I can get it to feel as responsive as compositorless X does.

Drakulix commented 6 months ago

> Specifically for that nvidia case I believe there's a fix involving EGL_ANDROID_native_fence_sync which kwin implements that could help. This was added in nvidia 545.

This is the extension we are querying. Unfortunately, so far it only works for cases where the fence is local to the nvidia GPU. So synchronization across GPUs involving the nvidia driver isn't working (the sync files fail to import), and their KMS API also doesn't support fences yet, so we can't rely on this for scan-out. But all the necessary support code is there, so once the driver advertises the necessary capabilities (or the imports succeed), it should just work, like it does for mesa drivers today.
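For illustration, the "query the extension, fall back to a CPU wait otherwise" decision might look roughly like the sketch below. It is not smithay's code: `WaitStrategy`, `has_extension`, and `choose_wait_strategy` are hypothetical names, and requiring `EGL_KHR_wait_sync` alongside `EGL_ANDROID_native_fence_sync` is an assumption of this sketch. The only EGL fact relied on is that extension strings are space-separated lists of names.

```rust
/// How the compositor waits for a fence before copying a buffer.
#[derive(Debug, PartialEq, Eq)]
enum WaitStrategy {
    /// Queue the wait on the GPU; the CPU thread continues.
    Gpu,
    /// Block the calling thread until the fence signals.
    Cpu,
}

/// Returns true if `name` appears as a whole token in a space-separated
/// EGL extension string. Matching whole tokens avoids false positives
/// from extensions that merely share a prefix.
fn has_extension(extensions: &str, name: &str) -> bool {
    extensions.split_ascii_whitespace().any(|ext| ext == name)
}

/// Picks a GPU-side wait only if every extension this sketch assumes to be
/// required is advertised; otherwise falls back to a CPU-bound wait.
fn choose_wait_strategy(extensions: &str) -> WaitStrategy {
    // Assumed requirement set for this sketch, not taken from the source.
    let required = ["EGL_ANDROID_native_fence_sync", "EGL_KHR_wait_sync"];
    if required.iter().all(|name| has_extension(extensions, name)) {
        WaitStrategy::Gpu
    } else {
        WaitStrategy::Cpu
    }
}
```

As described above, advertising the extension is not sufficient on its own: a real implementation also has to handle sync-file imports that fail at runtime, which this sketch does not model.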

> It's really exciting that this compositor is in rust. This codebase feels a lot more approachable than kwin's does and I'm tempted to start hacking on it and see if I can get it to feel as responsive as compositorless X does.

Glad to hear that! With all the feature development still going on, optimizations are of course still happening, but with a focus on good performance and low-hanging fruit rather than perfect performance. Any contributions on that subject are very welcome, including to the underlying framework smithay, where, for example, work to reduce allocations in the main rendering path is happening right now: https://github.com/Smithay/smithay/pull/1346 .

ryanabx commented 6 months ago

Jumping into this discussion to say: join the smithay matrix server! https://matrix.to/#/#smithay:matrix.org