swaywm / wlr-protocols

Wayland protocols designed for use in wlroots (and other compositors)

screencopy: add support for separate cursor capturing #105

Open · quantum5 opened 3 years ago

quantum5 commented 3 years ago

This is a proposal for version 4 of the screencopy protocol, which will support separate cursor capturing. The intended use case is custom composition of the cursor after capture, which suits applications like wayvnc or Looking Glass, and also implementing the metadata cursor mode for xdg-desktop-portal-wlr.

Judging by existing discussions, the ideal place to add this feature is the screencopy protocol, which is exactly what is done here.

Issues regarding software cursors have been brought up here. I don't think there is a nice way to capture software cursors separately, nor does it make sense to try: an application can always draw something that looks like a cursor on the screen, possibly nowhere near the actual pointer location. Therefore, specifying 0 for overlay_cursor cannot guarantee that no visible cursor appears in the captured image.

Instead, I think the solution is to capture only hardware cursors separately, and if that is not possible, then we simply report that the cursor is invisible and have it be composed onto the screen. This works for all custom composition use cases. This approach is also taken by other capturing APIs, such as DXGI Desktop Duplication on Windows.

Implementation-wise, this should be fairly simple. The existing capture_output and capture_output_region can remain unchanged. capture_cursor can be implemented by reading the hardware cursor, and zwlr_screencopy_pointer_reporter_v1 can be implemented by a few calls in wlr_seat_pointer.c.
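
As a rough illustration of the shape (simplified; the argument lists and event set here are placeholders, not necessarily the exact protocol text):

```xml
<!-- Rough sketch only; names and arguments are placeholders,
     not necessarily the wording used in this patch. -->
<request name="capture_cursor">
  <description summary="capture the hardware cursor of an output">
    Capture the next available image of the hardware cursor on the given
    output. If the cursor is composited in software, the compositor reports
    the cursor as invisible instead.
  </description>
  <arg name="frame" type="new_id" interface="zwlr_screencopy_frame_v1"/>
  <arg name="output" type="object" interface="wl_output"/>
</request>

<interface name="zwlr_screencopy_pointer_reporter_v1" version="1">
  <description summary="report pointer state for separately captured cursors">
    Reports the pointer position and cursor hotspot so that a client can
    compose a separately captured cursor image onto the output image itself.
  </description>
  <event name="position">
    <arg name="output" type="object" interface="wl_output"/>
    <arg name="x" type="int" summary="x position in output coordinates"/>
    <arg name="y" type="int" summary="y position in output coordinates"/>
  </event>
  <event name="hotspot">
    <arg name="x" type="int" summary="hotspot x within the cursor image"/>
    <arg name="y" type="int" summary="hotspot y within the cursor image"/>
  </event>
</interface>
```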

/cc @emersion @any1 @Xyene

any1 commented 3 years ago

Funny. We had a discussion about this on #sway-devel a few days ago. I'll post it here for reference:

16:03:01          any1 | emersion: Did you read my question about cursor image capturing?
16:03:10      emersion | hmmmm
16:03:27      emersion | > any1 | I'm thinking about revisiting the cursor image capturing work
16:03:28      emersion | ah, sorry
16:04:13          any1 | no worries, perhaps it should have been directed at you, rather than just someone... :)
16:04:14      emersion | any1: how should that work when hardware cursors aren't available?
16:04:38      emersion | ie. we need to draw the cursors in the final FB, or else the user won't see it?
16:04:53      emersion | or maybe the user has 2 seats
16:06:06          any1 | Yeah, I'm not entirely sure about that, but assuming that we have hw cursors and a single seat, which should be true for 99% of cases, capturing
                       | the cursor makes sense
16:06:34      emersion | maybe we really need the compositor to call a screencopy function some time before wlr_output_render_cursors()
16:07:15      emersion | note, headless doesn't have hardware cursors
16:07:19      emersion | but maybe it should
16:07:35      emersion | it can just do nothing with the hw cursors
16:08:02          any1 | yeah, we could turn of cursor on headless, optionally
16:08:13          any1 | s/of/off
16:08:56      emersion | columbarius looked into this as well
16:09:01          any1 | but, yeah, rendering the cursors after screencopy could be a good idea
16:09:14      emersion | https://github.com/emersion/xdg-desktop-portal-wlr/issues/133
16:09:22      emersion | ah, btw
16:09:41      emersion | i've been thinking about accessing the buffers after the wlr_output commit event
16:09:43          any1 | but then we wouldn't be able to simplify in the way that I suggested some time ago
16:09:47      emersion | to read them out
16:09:57      emersion | i'm not sure it'll work too well on nouveau
16:10:17          any1 | oh?
16:10:32      emersion | hmmm, actually maybe this is just about the cursor buffer
16:10:42      emersion | but nouveau likes to migrate buffers around in its memory
16:10:51          any1 | yuck
16:11:08      emersion | we had issues when trying to access buffers from OpenGL while they're being displayed by KMS
16:11:38      emersion | but maybe only the cursor buffer triggers this, and it'd be fine for the primary plane. need to test
16:12:08      emersion | so, re texture vs buffer:
16:12:27      emersion | we should always have a wlr_buffer around with the cursor image for hw cursors
16:13:03      emersion | it's stuffed in wlr_output
16:15:37          any1 | Mmm, so we change this https://github.com/swaywm/wlroots/blob/master/include/wlr/types/wlr_output.h#L38 to some variant of wlr_buffer?
16:16:06      emersion | any1: just use this: https://github.com/swaywm/wlroots/blob/0c19a2826632f5f0f9ac615b010d319f4cfa2c77/include/wlr/types/wlr_output.h#L180
16:16:32      emersion | the rest of the cursors will get painted by the compositor with GL
16:19:35          any1 | Ah, ok, so this is the hw cursor. That's convenient
16:19:44      emersion | yup :)
16:19:56      emersion | and it's already rotated/scaled/etc
16:20:55          any1 | But for the sake of future proofness of the protocol, perhaps the "capture_cursor" request should take the "wl_seat" as an argument
16:21:58      emersion | some compositors might have multiple cursors per seat
16:22:31      emersion | (GNOME does this)
16:23:49          any1 | then we use the "wl_pointer". :)
16:24:29          any1 | or maybe we just add a call named "capture_output_cursor" and then we might just add "capture_pointer_cursor" later.
16:25:10      emersion | should be capture be tied to an output at all?
16:25:32      emersion | should the*
16:25:40          any1 | Yeah, if you're capturing the output's cursor front buffer, right?
16:26:14      emersion | the caller doesn't really care whether it's a hw cursor or not
16:27:05          any1 | So you'd rather have no argument to the call?
16:27:19      emersion | hm
16:28:38      emersion | i think we ought to handle properly the case without hw cursors, at least at the protocol level
16:29:07      emersion | implementations can be easily fixed, but protocols are written in stone
16:29:55      emersion | from a protocol pov, i guess none of this really matters
16:30:23      emersion | the caller just wants a big image without a cursor, and a separate small image with just the cursor
16:31:16      emersion | we just need to be careful to not send a big image with a cursor *and* a small image with the cursor
16:32:02      emersion | so yeah, you're right, it should be tied to the output, and ideally tied to the output capture itself
16:32:30          any1 | yeah, it would be good to make it sort of atomic
16:33:18      emersion | so the caller could request a capture of an output with 3 modes: no cursor, cursor painted on big buffer, cursor not painted on buffer but sent as
                       | a separate small buffer
16:33:45          any1 | yep, sounds good
16:34:32      emersion | not sure how that all fits in the screencopy protocol
16:34:51          any1 | maybe we should create a new request called something like "capture_with_options" and have option flags for this, so that we don't end up with
                       | monstrosities like "capture_with_damage_and_cursor_on_the_side" :p
16:34:56      emersion | screencopy has become a bit messy with all of the additions :S
16:35:24      emersion | ahah
16:35:26      emersion | yea
16:35:46      emersion | we should've used flags instead of just a bool for cursor
16:36:10         Lynne | does this apply to both screencopy and screencopy-dmabuf? (it doesn't apply to export-dmabuf, because that has a working flag to enable/disable mouse
                       | cursor capture)
16:36:37      emersion | applies to both i'd assume
16:36:52      emersion | export-dmabuf is a bit complicated yeah
16:37:09      emersion | there are some cases where the user will request no cursor, but we'll still be sending one
16:37:11          any1 | yeah, except, I don't think that it really makes sense to capture the cursor to a dmabuf..
16:38:40          any1 | which brings us to another question, should the user supply a separate buffer for the cursor or just a single buffer that's large enough for both?
16:38:50      emersion | capturing cursor to a dmabuf might be handy
16:39:05         Lynne | that sounds incredibly niche and useless
16:39:22      emersion | doesn't va-api allow some blending operations?
16:40:03      emersion | i'm not very familiar with va-api post proc api
16:40:05         Lynne | no
16:40:12      emersion | i know it allows some stuff
16:40:44      emersion | i'd rather not use a single buffer, any1
16:40:47         Lynne | if it did, intel would've sent a patch to implement it in ffmpeg
16:43:15          any1 | right, I believe that I have what I need. I'll make some changes and send a PR.
16:44:00          any1 | Thanks!
16:45:50      emersion | it can also do rotations and mirroring
16:46:00      emersion | and cropping
16:46:03      emersion | and scaling
16:48:06      emersion | any1: motivation for two buffers: get primary buffer as dmabuf and cursor as shm
16:49:07      emersion | callers can easily create a buffer which refers to part of the primary buffer by messing with offsets and width/height, and keep the stride
16:49:30      emersion | hm, i guess wl_shm is missing an offset. there's a patch to add it
17:29:21         Lynne | emersion: this really makes the screencopy-protocol not suitable for anything but VNC
17:29:45      emersion | Lynne: what do you mean?
17:30:03         Lynne | well, you need to overlay the cursor yourself, don't you?
17:30:21      emersion | if you request to have the cursor as a separate buffer, yes
17:31:19         Lynne | oh, and if you don't, the compositor will do it for you, right?
17:31:25      emersion | yup
17:31:34         Lynne | good
17:31:36      emersion | or you can also request to have no cursor at all

Your proposal is different from what we discussed, but I don't see anything wrong with this approach in a broad sense. Although, maybe the cursor position is a little bit out of place in a screen capturing protocol?

quantum5 commented 3 years ago

Interesting, I didn't know about this discussion.

Regarding the cursor position, it's perfectly within scope. How else will you know where to put the cursor once you receive it? It can't be tied to zwlr_screencopy_frame_v1 because we don't want to keep capturing the cursor when it hasn't changed.

any1 commented 3 years ago

> Regarding the cursor position, it's perfectly within scope. How else will you know where to put the cursor once you receive it? It can't be tied to zwlr_screencopy_frame_v1 because we don't want to keep capturing the cursor when it hasn't changed.

If you have a virtual pointer, you can know its cursor position. However, I'm not sure it's safe to assume that we're always using this with a virtual pointer. I guess that you could argue that the position of the cursor is required to reconstruct the "whole picture" and is therefore fully in scope for this.

quantum5 commented 3 years ago

Requiring a virtual pointer would basically limit this protocol to remote desktop clients. One of the use cases I have in mind is supporting metadata cursor mode in xdg-desktop-portal-wlr for screencasting, and the portal will not be able to use a virtual pointer. As such, I see reporting the cursor position as integral.

emersion commented 3 years ago

I agree sending the cursor position is desirable. The cursor coordinates should be output-buffer-local so that the client can just blit the cursor buffer onto the primary buffer without having to perform any scale/transform/etc guessing work.

I'm not sure how to best translate this into protocol design, but I'd prefer the output capture and the cursor information to be requested atomically by the client. With this patch, the client first requests the cursor info, then captures the output and cursor. All of this relies on global state -- I don't want to end up in a situation where the client requests cursor data but the compositor has already painted the cursor on the output buffer.

quantum5 commented 3 years ago

Regarding using output-buffer-local coordinates for the mouse, I agree it would make composing the cursor much easier. However, it might interact poorly with capture_output_region, which requests the region in output logical coordinates.

I agree the potential race condition is difficult to resolve given the existing protocol design. However, I think the situation you described is only possible when switching between hardware and software cursors. In that case, both the cursor and output will be updated, and the situation should last for no more than one frame, which might be acceptable.

One way I thought of is replacing the position event in zwlr_screencopy_pointer_reporter_v1 with an update event, which reports the cursor position along with two serial numbers: one for the cursor and one for the output. zwlr_screencopy_frame_v1 would then report a serial number. This should enable the client to synchronize all the updates.
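
Roughly, it could look like this (just an illustrative sketch, not final wording):

```xml
<!-- Illustrative sketch of the synchronization idea; names are placeholders. -->
<event name="update">
  <description summary="atomic cursor update">
    Reports the cursor position together with a cursor serial and an output
    serial, so the client can match cursor state against a captured frame.
  </description>
  <arg name="x" type="int" summary="x position in output coordinates"/>
  <arg name="y" type="int" summary="y position in output coordinates"/>
  <arg name="cursor_serial" type="uint"/>
  <arg name="output_serial" type="uint"/>
</event>

<!-- zwlr_screencopy_frame_v1 would gain a matching event: -->
<event name="serial">
  <arg name="output_serial" type="uint" summary="output state this frame was captured from"/>
</event>
```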

any1 commented 3 years ago

Maybe we should consider what a "perfect" protocol for capturing cursors would look like and work from there?

emersion commented 3 years ago

Yes, that sounds like a good idea. I wouldn't be against a screencopy-v2 protocol. With DMA-BUFs and damage tracking, v1 has become quite weird.

any1 commented 3 years ago

But ideally, screencopy-v2 should also include support for capturing toplevels, right?

quantum5 commented 3 years ago

I wrote a draft for what a v2 screencopy might look like. The most notable change is the separation of screenshots and screencasts into separate objects, so that one object doesn't have to handle both, avoiding the current "weirdness" with damage tracking and DMA-BUFs in v1. The screenshot part is effectively the same as version 1 of the v1 protocol, providing screenshots with wl_shm-backed buffers. I don't think a screenshot program would benefit from DMA-BUFs.

The screencast portion is designed as a continuously updated stream: once the client provides a suitable buffer, the compositor keeps sending updates until told otherwise. Race conditions between updates are avoided by making each update an atomic unit in the protocol, delimited by update and update_done events. The stream is paused until the client is ready for a new update, preventing the buffer from being written to while the client is reading from it.
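
To give a rough idea of the shape (a simplified sketch, not the draft's exact wording):

```xml
<!-- Simplified sketch of the screencast object; not the draft's exact wording. -->
<interface name="zwlr_screencast_v1" version="1">
  <request name="begin">
    <description summary="start streaming into the given buffer"/>
    <arg name="buffer" type="object" interface="wl_buffer"/>
  </request>
  <request name="continue">
    <description summary="signal that the client is ready for the next update"/>
  </request>
  <event name="update">
    <description summary="start of an atomic batch of damage/cursor events"/>
  </event>
  <event name="update_done">
    <description summary="end of the atomic batch; the buffer is safe to read"/>
  </event>
</interface>
```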

I am not sure how to implement capturing toplevels (with xdg-foreign? wlr foreign toplevel management?) or if it is desirable, but the protocol should be easily extendable to support that use case.

nowrep commented 3 years ago

It would be nice to be able to capture only the cursor without capturing the screen. That should be possible with the protocol in this PR, but not with the v2 draft, if I understand correctly.

My use case: OBS game capture, where the image is already captured from the game process itself.

quantum5 commented 3 years ago

The v2 draft can be extended to support your use case by adding only_cursor to the capture_mode enumeration and changing the wording of the zwlr_screencast_v1 events and requests slightly. It does make zwlr_screenshot_v1 slightly awkward though, since it currently assumes a buffer exists. It might have to send zero buffer events and allow NULL to be passed to begin.
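
For illustration, the enumeration could end up looking something like this (entry names are placeholders):

```xml
<!-- Placeholder entry names; only_cursor is the proposed addition. -->
<enum name="capture_mode">
  <entry name="no_cursor" value="0" summary="capture the output without any cursor"/>
  <entry name="composited_cursor" value="1" summary="cursor painted into the output buffer"/>
  <entry name="separate_cursor" value="2" summary="cursor delivered as a separate buffer"/>
  <entry name="only_cursor" value="3" summary="capture only the cursor, no output content"/>
</enum>
```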

any1 commented 3 years ago

What's the point of the "continue" request? You need to supply a new buffer to be copied into, so the copy/begin request should suffice.

It seems that you can only supply one buffer to each screencast begin request. I take it that the user should use this interface to capture either the output buffer or the cursor buffer? If that is to be the way things are done, then perhaps it's better to have separate interfaces for output buffer capturing and cursor capturing. Another way of doing this would be to have the begin request take two buffer arguments: one for the output and the other for the cursor; both nullable, of course.
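
As a rough sketch of the two-buffer variant (argument names are made up):

```xml
<!-- Sketch: a begin request taking both buffers, each nullable. -->
<request name="begin">
  <arg name="output_buffer" type="object" interface="wl_buffer" allow-null="true"
       summary="buffer for the output content, or null to skip it"/>
  <arg name="cursor_buffer" type="object" interface="wl_buffer" allow-null="true"
       summary="buffer for the cursor image, or null to skip it"/>
</request>
```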

It's probably better to rename the protocol to zext_screencopy_v1 and move the discussion to wayland-protocols on freedesktop.org.

quantum5 commented 3 years ago

The current v2 draft lets you pass one buffer for the frame that would be reused for every frame. You cause the next frame to be captured by calling continue. This is done so that the compositor could copy only the damaged sections into the buffer instead of copying the full frame all the time.

The cursor is passed to the client as a wl_buffer owned by the compositor. I am not sure if that works in practice, but I did it this way because the cursor size changes frequently and we don't want to interrupt the screencast when it does.

I'll think a bit more about this before trying to upstream.

any1 commented 3 years ago

> The current v2 draft lets you pass one buffer for the frame that would be reused for every frame. You cause the next frame to be captured by calling continue. This is done so that the compositor could copy only the damaged sections into the buffer instead of copying the full frame all the time.

It's good to be able to use double or triple buffering so that you can wait for a new buffer while you're reading from the last one. If we want the compositor to only copy the damaged regions, we could supply something like "buffer age" with each buffer. See https://emersion.fr/blog/2019/intro-to-damage-tracking/
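
Sketching the idea (just illustrative, with an age of 0 meaning the buffer contents are undefined):

```xml
<!-- Sketch: attach a buffer together with its age for damage tracking. -->
<request name="attach_buffer">
  <arg name="buffer" type="object" interface="wl_buffer"/>
  <arg name="age" type="uint"
       summary="frames since this buffer was last filled, or 0 if undefined"/>
</request>
```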

> The cursor is passed to the client as a wl_buffer owned by the compositor. I am not sure if that works in practice, but I did it this way because the cursor size changes frequently and we don't want to interrupt the screencast when it does.

It does not work in practice.

columbarius commented 3 years ago

Sorry, late to the party, but what about changing the design to use a capability/request bitmap instead of a cursor mode? The bitmap would encode what information is requested, like:

  1. Screen content (current shm/dmabuf screencopy, or separated into two flags)
  2. Cursor content
  3. Cursor position
  4. ...

And then have a two-stage process (e.g. on the client side):

  1. Set up the screencast target (current capture)

  2. Query which capabilities are supported

  3. Create a screencast_frame with the desired options

  4. Receive information callbacks (buffer sizes, ...)

  5. Receive the info_done event: process the info, prepare follow-up requests (copying buffers, cursor spy, etc.), and send the prepared requests

  6. Receive callbacks (copy_success/failure, etc.)

  7. Receive the frame_done event: destroy the frame, ...

  8. Go to step 3.

This way we could extend the protocol very easily with additional modes and ensure that all extracted information is from the same point in time.
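
For example, something along these lines (just a rough sketch; names and values are placeholders):

```xml
<!-- Sketch of a request/capability bitmap; entries are placeholders. -->
<enum name="capture_options" bitfield="true">
  <entry name="screen_content" value="1" summary="capture the output contents"/>
  <entry name="cursor_content" value="2" summary="capture the cursor image"/>
  <entry name="cursor_position" value="4" summary="report the cursor position"/>
</enum>

<event name="capabilities">
  <arg name="options" type="uint" enum="capture_options"
       summary="bitmap of the options supported for this capture target"/>
</event>
```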

emersion commented 3 years ago

wlr-protocols has migrated to gitlab.freedesktop.org. This pull request has been moved to:

https://gitlab.freedesktop.org/wlroots/wlr-protocols/-/merge_requests/105