labwc / labwc

A Wayland window-stacking compositor
https://labwc.github.io
GNU General Public License v2.0
1.73k stars 158 forks source link

Rarely, title-bar may not be resized on snapping to bottom/top by drag-and-drop #1194

Open tsujan opened 1 year ago

tsujan commented 1 year ago

It happened here very randomly and rarely when I snapped a window to the bottom of the screen by dragging its title-bar and dropping it to the bottom edge. Fortunately, I took a screenshot:

bottom

I encountered it only once with the top edge and haven't seen it without drag-and-drop. However, I can't be sure that it's really related to drag-and-drop.

Sorry if it's already known or fixed in git; I use 0.6.5 and couldn't find it after a simple search.

johanmalm commented 1 year ago

Thanks for reporting. Have not seen that before!! What's the stuff in the top left corner? Could you send config for that - to help me reproduce (in case it's related)

dragging its title-bar and dropping it to the bottom edge

Are you sure? IIRC we only support dropping at top (for Maximize) and sides (for SnapToEdge)

stefonarch commented 1 year ago

Are you sure? IIRC we only support dropping at top (for Maximize) and sides (for SnapToEdge)

DND to bottom is similar to sides, half screen here.

johanmalm commented 1 year ago

Are you sure? IIRC we only support dropping at top (for Maximize) and sides (for SnapToEdge)

DND to bottom is similar to sides, half screen here.

Yes, you're right. I've just read the manual to remind myself. Ignore my comment.

tsujan commented 1 year ago

Attached my config folder:

labwc.zip

tsujan commented 1 year ago

Have not seen that before

Because it happens very randomly and rarely. However, I might be able to make it happen for the bottom edge if I try hard. So, if you make a fixing patch that can be applied or adapted to 0.6.5, I could test it.

What's the stuff in the top left corner?

Nothing. I sent the bottom part of the screenshot. Waybar is at the top edge of the screen, without any window rule. And PCManFM-Qt draws the desktop.

It seems that the non-resized title-bar always fills the left side (even when the DND is made slightly to the the right of the bottom edge), and that its size is the width of the dragged window.

EDIT: The workspace has no role in it.

EDIT1: I encountered it with two apps with very different codes, one in Qt6 and the other in Qt5, namely, FeatherPad and QTerminal. The reason may be that I only tile the windows of those apps.

Consolatis commented 1 year ago

This may happen when the client doesn't respond fast enough, do you have anything like client (some_client_app_id) did not respond to configure request in 100 ms in your logs?

Edit: You need to start labwc with the -V (or -d) flag for the message to show up.

Edit 2: If you feel like testing you could also manually increase the timeout in src/xdg.c (on top, CONFIGURE_TIMEOUT_MS).

Edit 3: Cross posted from #1195

The system log is clean. ~/.local/share/sddm/wayland-session.log only shows lines like this about LabWC:

00:25:27.484 [ERROR] [../labwc-0.6.5/src/desktop.c:422] Unknown node detected

See above, for that message to appear labwc need to at least run with --verbose (or --debug for even more log spam).

tsujan commented 1 year ago

Yes, this appeared after I added -V and reproduced the issue with QTerminal:

00:01:59.474 [INFO] [../labwc-0.6.5/src/xdg.c:117] client (qterminal) did not respond to configure request in 100 ms

EDIT: I'm going to test with #define CONFIGURE_TIMEOUT_MS 200; will report back.

tsujan commented 1 year ago

I tested with 200 and 250 for CONFIGURE_TIMEOUT_MS, but the issue happened as before.

Also, I found out that it isn't related to how fast I snap by DND; a very slow DND can have the same effect.

The irony is that, as the reporter, I don't tile any window to the bottom region when actually working — only the bottom right or left.

Consolatis commented 1 year ago

Hm.. the existence of that entry in the log itself shouldn't really happen. To figure out if that is just a coincidence or the actual cause of the issue, could you try with CONFIGURE_TIMEOUT_MS 10000 (e.g. 10 seconds) or something ridiculous like that?

Also, I found out that it isn't related to how fast I snap by DND; a very slow DND can have the same effect.

The speed at which the DND happens by the user shouldn't really matter, the timeout refers to the time an application needs to respond to a configure request from labwc.

tsujan commented 1 year ago

As I'm getting "skillful" in reproducing it, it was easy to make it happen after setting CONFIGURE_TIMEOUT_MS to 10000 — I just waited 10 seconds after seeing that nothing happened immediately after DND to the bottom.

johanmalm commented 1 year ago

@tsujan Which application(s) does it happen with? All Qt5/Qt6, featherpad/qterminal only? Does it happen if you start labwc without config (labwc -C /dev/null)

johanmalm commented 1 year ago

I've been trying with featherpad 1.4.1 but still can't reproduce

tsujan commented 1 year ago

Which application(s) does it happen with?

Any Qt5 or Qt6 app I tried: FeatherPad (Qt6), QTerminal (Qt5), PCManFM-Qt (Qt6), SpeedCrunch (Qt5)...

I've been trying with featherpad 1.4.1 but still can't reproduce

You don't have my "skill" :P

tsujan commented 1 year ago

Since you're both so responsive and I don't want to waste your time, I just wanted to say that I'll be away from the computer for a while. Will come back later....

stefonarch commented 1 year ago

You don't have my "skill" :P

After trying maybe 30 times I'm up to your skills. Dragging to the bottom left and re-dragging to center screen again several times triggers it here too. It's hard to trigger. It wasn' easy to make a screenshot using the shortcuts as it turned immediately well when pressing them, had to switch desktop and use sleep.

screen_area_sab_19:37:14_

tsujan commented 1 year ago

After trying maybe 30 times I'm up to your skills.

Good :)

Coming back with four more "discoveries" (the first two may be useless):

@johanmalm, @Consolatis, unless you know that QtWayland can't have a role in this, it can be closed as far as I'm concerned.

stefonarch commented 1 year ago

I tried with thunderbird too, no success.

Flrian commented 1 year ago

Some observations from trying it with calibre on arch:

For me it only happened the first time I dragged it against the bottom edge of the screen. After closing and opening calibre again the bug always happened if it was the first thing I tried. It only happened if I didn't snap it somewhere else or resized it first. When starting calibre with "QT_QPA_PLATFORM=xcb" (xwayland) dragging it against the bottom edge wasn't working at all, against the sides did though. Moving the calibre window so part of it is off screen, moving it back and then dragging it against the screen edge prevented the bug from occuring. I didn't release it after I started moving it.

I wasn't able to reproduce it on gentoo.

tsujan commented 1 year ago

Some observations from trying it with calibre on arch

A Qt app, again.

Consolatis commented 1 year ago

It would be interesting to know if each time this issue occurs there is also the did not respond to configure request log entry. And similarly, if it doesn't occur if there is no log entry. I am still not sure if that is actually related or just a coincidence and we should look elsewhere for a cause.

Another thing that could help to debug this is a WAYLAND_DEBUG=1 log from the application at the point when it happens.

That it only seems to happen with Qt applications doesn't really matter that much for assigning a priority to the issue. My usual stance is "we don't like working around client bugs" (and it isn't even verified that its a client bug at all) but for whole GUI frameworks that a lot of applications depend on we should IMHO try to make it still work. As that is what a user sees when using labwc.

tsujan commented 1 year ago

It would be interesting to know if each time this issue occurs there is also the did not respond to configure request log entry. And similarly, if it doesn't occur if there is no log entry.

Both confirmed with my tests.

Consolatis commented 1 year ago

Hm.. I have a slight feeling that us resetting the pending serial to 0 in handle_configure_timeout() sometimes prevents handle_commit() to set update_required = true as the pending serial doesn't match the received one. But I didn't really think this through. Thoughts @jlindgren90 ?

jlindgren90 commented 1 year ago

I think that the fact that the timeout occurs most likely indicates a client (probably Qt) issue. The mismatch between title bar and client surface is an (expected) side effect of that. Normally we try to resize the title bar on the same frame as the client surface so it looks "smooth" but if the client is lagging, we will go ahead and resize the title bar anyway (after the timeout).

I don't think resetting the serial in the timeout handler makes a difference - the issue is caused by the client surface not resizing and has already occurred at that point.

jlindgren90 commented 1 year ago

Probably it would be necessary to sniff the Wayland connection to determine for sure where the issue lies, i.e. why the resize is not happening in a timely fashion.

jlindgren90 commented 1 year ago

P.S. the issue title says "title-bar may not be resized" but it actually sounds (from later descriptions) like it is the opposite - the title bar does resize but the window contents do not.

johanmalm commented 1 year ago

Although from the scrots it looks like with window has resized (snapped to bottom) but the titlebar has not resized (or have I misunderstood the bug)

tsujan commented 1 year ago

but it actually sounds (from later descriptions) like it is the opposite

In all my tests that resulted in the issue, the Qt windows were resized as expected, while their title-bars didn't. Also, see the screenshot(s).

johanmalm commented 1 year ago

but it actually sounds (from later descriptions) like it is the opposite

In all my tests that resulted in the issue, the Qt windows were resized as expected, while their title-bars didn't. Also, see the screenshot(s).

You’ve just answered my question :)

Consolatis commented 1 year ago

I really don't understand how this can happen, client bug or not. The client surface is resized (stretches the bottom screen) but the titlebar is not.

Even if we run into the timeout handler, a configure ACK that comes later and acknowledges a modified size (and ends up in handle_commit()) gets checked if the newly commited surface size differs from the current one. If yes we set update_required to true. That in turn calls view_impl_apply_geometry() which in turn sets view->current.width to the new surface size and then calls view_moved(). That one then unconditionally calls ssd_update_geometry() which in turn uses view->current.width to update everything.

So even if we overwrite the pending serial in the timeout handler, we should still have a synced titlebar width and client surface width. I must be missing something somewhere.

jlindgren90 commented 1 year ago

I think we need some more debugging output to get a complete picture of what is going on. Working on that, will post a diff shortly.

jlindgren90 commented 1 year ago

@tsujan @stefonarch Can you kindly apply this logging patch and capture new logs when the issue occurs? https://gist.github.com/jlindgren90/dc6c37bd7c05635092e1d8f93744713b

johanmalm commented 1 year ago

What happens after the frame captured in the screenshots? Does the titlebar take on the right size in the next frame? Does the titlebar get stuck on the wrong size? If not, could you describe what happens.

tsujan commented 1 year ago

Does the titlebar get stuck on the wrong size?

The title-bars didn't change at all, until I dragged them and un-tiled the windows. Imagine them as stable title-bars with wrong sizes, as long as their corresponding windows remained tiled (correctly).

Also, the bottom edge may have a role in this. Only once I saw this with the top edge, and I'm not sure whether I dragged the window or not. I can't reproduce it with the top edge.

Sorry, I can't do more tests for now, but I could answer questions about my previous tests.

johanmalm commented 1 year ago

What happens with the border that goes around the window? Does that have the same width as the titlebar or client?

tsujan commented 1 year ago

Oh, this should be interesting to you -- just now, I realized something, to which I hadn't paid attention before:

As soon as the window becomes inactive, the title-bar is corrected.

The reason that I hadn't seen it before was that, in my previous tests, the title-bar didn't change when I changed the workspace. However, there was no other window on the other workspace to make the first window become inactive.

tsujan commented 1 year ago

What happens with the border that goes around the window?

The borders of wrong sized title-bars fit them, as in the case of correct title-bars.

jlindgren90 commented 1 year ago

It almost sounds like the commit event gets stuck in a buffer somewhere and comes through later. What I don't understand is how the surface is visually the right size if the commit didn't come through.

stefonarch commented 1 year ago

As soon as the window becomes inactive, the title-bar is corrected.

That is exactly why I couldn't take a screenshot in a simple way.

tsujan commented 1 year ago

To be as clear as possible, I complete my above sentence:

As soon as the window becomes inactive, the title-bar is corrected and remains corrected afterwards.

tsujan commented 1 year ago

That is exactly why I couldn't take a screenshot in a simple way.

My method of taking simple screenshots doesn't deactivate windows. It's just a shortcut that creates the screenshots inside a folder.

Flrian commented 1 year ago

With the logging patch from @jlindgren90 opened calibre from terminal, dragged it to the bottom edge, released, bug happened dragged it back to middle of screen, released, dragged it to the bottom edge, released, bug didn't happen labwc_bug.log

WAYLAND_DEBUG=1 from calibre dragged calibre to bottom of screen, released, bug happened, quit labwc from root menu calibre.log

Not sure if any of that is helpful.

Consolatis commented 1 year ago

It almost sounds like the commit event gets stuck in a buffer somewhere and comes through later. What I don't understand is how the surface is visually the right size if the commit didn't come through.

Yeah, basically this. And I asked myself the same.

As soon as the window becomes inactive, the title-bar is corrected and remains corrected afterwards.

Hm.. interesting. I thought that would just hit another code path that calls ssd_update_geometry() but it actually doesn't seem to. Its just calling view_set_activated() which in turn calls xdg_toplevel_view_set_activated() (so it sends something to the client). I wonder if that somehow makes the commit event "unstuck".

Edit:

Relevant part of the `calibre.log` protocol log (line numbers in front): ``` [..] 210 [1096366.184] -> wl_compositor@6.create_surface(new id wl_surface@36) 211 [1096370.333] -> xdg_wm_base@3.get_xdg_surface(new id xdg_surface@33, wl_surface@36) 212 [1096370.349] -> xdg_surface@33.get_toplevel(new id xdg_toplevel@37) [..] 938 [1100013.074] xdg_toplevel@37.configure(1898, 498, array[4]) 939 [1100013.098] xdg_surface@33.configure(129) 940 [1100013.126] -> wp_viewport@39.set_destination(1898, 498) 941 [1100013.136] -> wl_compositor@6.create_region(new id wl_region@34) 942 [1100013.144] -> wl_region@34.add(0, 0, 1898, 498) 943 [1100013.152] -> wl_surface@36.set_opaque_region(wl_region@34) 944 [1100013.158] -> wl_region@34.destroy() 945 [1100013.165] -> xdg_surface@33.ack_configure(129) 946 [1100113.339] wl_display@1.delete_id(34) 947 [1100113.354] wl_display@1.delete_id(52) 948 [1100113.359] wl_callback@52.done(182525) 949 [1100113.373] wl_surface@36.enter(wl_output@15) 950 [1100113.461] -> wl_shm_pool@41.destroy() 951 [1100113.473] -> wl_buffer@40.destroy() 952 [1100113.518] -> wl_shm_pool@42.destroy() 953 [1100113.525] -> wl_buffer@51.destroy() 954 [1100113.568] -> wl_shm_pool@43.destroy() 955 [1100113.578] -> wl_buffer@44.destroy() 956 [1100113.616] -> wl_shm@4.create_pool(new id wl_shm_pool@52, fd 42, 3780816) 957 [1100113.623] -> wl_shm_pool@52.create_buffer(new id wl_buffer@34, 0, 1898, 498, 7592, 0) 958 [1100121.561] -> wl_surface@36.frame(new id wl_callback@32) 959 [1100121.572] -> wl_surface@36.attach(wl_buffer@34, 0, 0) 960 [1100121.577] -> wl_surface@36.damage_buffer(0, 0, 1898, 498) 961 [1100121.581] -> wl_surface@36.commit() [..] ``` In general it looks fine to me. Although there is a decent time passed between us sending the configure request (line 938) and the wl_surface commit with the new buffer (line 959 or 961). I am not sure right now what unit the timestamp in front is, hard to imagine that its about 8 seconds between these events.
jlindgren90 commented 1 year ago

I think I understand what's going on here (or at least have a plausible theory). It looks like a Qt bug. And thanks @Flrian, the logs are extremely helpful (especially having both).

xdg-shell has a set_window_geometry call that clients may use to set their logical "window geometry" to a sub-region of their xdg_surface. This is (I believe) intended for CSD windows to let the compositor know what portion of the surface is the "window" and what portion is decorations, drop shadows, etc.

It appears (from the calibre log) that Qt is sending a set_window_geometry with its initial configure response, and at that time its requested window geometry matches the size of the surface (680x498).

[1096428.066] xdg_toplevel@37.configure(0, 0, array[4])
[1096428.070] xdg_surface@33.configure(125)
...
[1096428.273]  -> wp_viewport@39.set_destination(680, 498)
[1096428.277]  -> xdg_surface@33.set_window_geometry(0, 0, 680, 498)
[1096428.282]  -> xdg_surface@33.ack_configure(125)
[1096428.647]  -> wl_surface@36.attach(wl_buffer@40, 0, 0)
[1096428.656]  -> wl_surface@36.damage_buffer(0, 0, 680, 498)
[1096428.661]  -> wl_surface@36.commit()

When labwc later send a configure with a larger size (the tiled size, 1898x498) Qt resizes the actual surface but doesn't send a new set_window_geometry call. So wlroots continues to use the old geometry (680x498) and that's what wlr_xdg_surface_get_geometry() returns (this can be confirmed from the labwc log).

The actual surface is now larger (which you can see visually) but labwc continues to use the "window geometry" (which is now incorrect) to size the title bar and window borders.

[1100013.074] xdg_toplevel@37.configure(1898, 498, array[4])
[1100013.098] xdg_surface@33.configure(129)
[1100013.126]  -> wp_viewport@39.set_destination(1898, 498)
[1100013.136]  -> wl_compositor@6.create_region(new id wl_region@34)
[1100013.144]  -> wl_region@34.add(0, 0, 1898, 498)
[1100013.152]  -> wl_surface@36.set_opaque_region(wl_region@34)
[1100013.158]  -> wl_region@34.destroy()
[1100013.165]  -> xdg_surface@33.ack_configure(129)
...
[1100121.572]  -> wl_surface@36.attach(wl_buffer@34, 0, 0)
[1100121.577]  -> wl_surface@36.damage_buffer(0, 0, 1898, 498)
[1100121.581]  -> wl_surface@36.commit()

I couldn't find any existing Qt bug report for this. If anyone has the time/motivation to create a simple Qt test case that can reproduce the bug, then capture logs and open a Qt bug report, that's probably the best way to get this really solved.

There are workarounds we could look into on the labwc side (e.g. detect when the surface size matches the configure request but the window geometry doesn't), not sure if we want to go that route or not.

Consolatis commented 1 year ago

Nice analysis!

When labwc later send a configure with a larger size (the tiled size, 1898x498) Qt resizes the actual surface but doesn't send a new set_window_geometry call. So wlroots continues to use the old geometry (680x498) and that's what wlr_xdg_surface_get_geometry() returns (this can be confirmed from the labwc log).

Hm. So either

The scene graph implementation for xdg windows seems to handle this case, maybe we could look into that to get some inspiration.

Edit: There is also the question of why we run into the timeout. I am not sure if xdg_surface@xx.ack_configure() is causing a handle_commit() call or if we have to wait for the wl_surface@xx.commit() one instead.

Edit 2: From the xdg-shell protocol:

Once the window geometry of the surface is set, it is not possible to unset it, and it will remain the same until set_window_geometry is called again, even if a new subsurface or buffer is attached.

I am not sure if we are using the right way to get the current (xdg / wl _)surface size (although it seems like the only way when we also want to ignore CSD drop shadows and other CSD margins around the window).

And there is also the question of why sometimes it works correctly and sometimes it doesn't. A WAYLAND_DEBUG=1 log of a correctly sized titlebar after snapping might help here. And a second one for switching the focus somewhere else fixing it.

jlindgren90 commented 1 year ago

I think our code is correct (using the window geometry is the right thing to do, in principle). I had something like this in mind for a workaround: https://github.com/jlindgren90/labwc/commit/7a69cd2ed8a7a5b5787c7061193672e5658d1a47

In @Flrian's log, the commit did happen but was just a hair too late (116ms):

00:00:05.482 [ERROR] [../src/xdg.c:301] view 0x561ca4b66730 (calibre — || Calibre Library ||) current=680x498 pending=680x498 configure=1898x498
00:00:05.482 [ERROR] [../src/xdg.c:308] configure serial=26
00:00:05.482 [ERROR] [../src/xdg.c:149] set pending serial=26
00:00:05.482 [DEBUG] [types/output/cursor.c:432] Falling back to software cursor on output 'HDMI-A-1'
00:00:05.582 [INFO] [../src/xdg.c:127] client (calibre-gui) did not respond to configure request in 100 ms
00:00:05.582 [ERROR] [../src/xdg.c:129] view 0x561ca4b66730 (calibre — || Calibre Library ||) current=680x498 pending=1898x498 at timeout
00:00:05.582 [ERROR] [../src/xdg.c:133] pending serial=26
00:00:05.598 [ERROR] [../src/xdg.c:92] view 0x561ca4b66730 (calibre — || Calibre Library ||) current=680x498 pending=1898x498 commit=680x498
00:00:05.598 [ERROR] [../src/xdg.c:97] pending serial=0 commit=26
Consolatis commented 1 year ago

Your commit could be a solution for now, it looks sensible.

The remaining questions for me are:

jlindgren90 commented 1 year ago

I would love answers to those questions but suspect it would require some deep-dives into Qt source code. :(

Consolatis commented 1 year ago

I think we may get away with just WAYLAND_DEBUG=1 logs of

If we see that Qt then sends a new (correct) set_window_geometry() then it is for sure a Qt bug and the reason why the titlebar is not resized is clear to me. If the logs do not show that, there is still something else going on that can't be explained by the stale window geometry alone.

tsujan commented 1 year ago

I don't want to complicate your analysis by saying this (before I go to sleep), but I can't reproduce the bug when I use Kvantum's window dragging, instead of dragging the window from its title-bar. Kvantum calls QWindow::startSystemMove() by using an event filter installed on QWindow, inside mouseMoveEvent and after appropriate conditions are satisfied in mousePressEvent.

I think it has something to do with the vertical position of the title-bar when the drop happens.

With Kvantum's window dragging, the cursor position is inside the window, and so, below the title-bar. Therefore, the title-bar is fully above the bottom edge of the screen and can even have a considerable distance from it on dropping.

With dragging from title-bar, the title-bar is partly below the bottom edge of the screen on dropping.

jlindgren90 commented 1 year ago

@tsujan @stefonarch @Flrian does the workaround help?