neutrinolabs / xorgxrdp

Xorg drivers for xrdp
Other
432 stars 109 forks source link

Complex dirty region causes overflow of xrdp comms buffer #318

Closed metalefty closed 5 days ago

metalefty commented 2 weeks ago

Version: xrdp v0.10.0 and xorgxrdp v0.10.1

Steps to reproduce:

  1. Use xterm as WM for simplicity
  2. Connect from mstsc.exe with 4K screen (3840x2160)
  3. Run xeyes -geometry 3840x2160 (exactly the same screen size with xrandr reports)
  4. Xorg hangs up

I see the following xorgxrdp log. This hang-up doesn't happen v0.9. So it is related to recent GFX change.

I recalled #64 from the log with the following log line. It looks like a buffer overrun. I also recalled #5 and #62.

[  2341.417] rdpClientConSendMsg: overrun error len, 35029 stream size 32868, client count 2

It looks like a buffer overrun, I tried to increase buffer size to 64k. I don't see any obvious buffer overrun with 64k buffer but Xorg still hangs up.

[  2320.111] rdpClientConProcessMsgClientInput: invalidate x 0 y 0 cx 5759 cy 2163
[  2329.463] rdpRRGetInfo:
[  2329.464] rdpRRCrtcGetGamma: 0x55993aa226f0 0x55993aa3ebf0 0x55993aa3eff0 0x55993aa3edf0
[  2329.464] rdpRRCrtcGetGamma: 0x55993aa226f0 0x55993aa3ebf0 0x55993aa3eff0 0x55993aa3edf0
[  2329.465] rdpRRCrtcGetGamma: 0x55993aa3f200 0x55993aa2ec50 0x55993aa2f050 0x55993aa2ee50
[  2329.465] rdpRRCrtcGetGamma: 0x55993aa3f200 0x55993aa2ec50 0x55993aa2f050 0x55993aa2ee50
[  2341.417] rdpClientConSendMsg: overrun error len, 35029 stream size 32868, client count 2
[  2398.054] rdpClientConProcessMsgClientInput: invalidate x 0 y 0 cx 5759 cy 2163
[  2529.347] rdpClientConProcessMsgClientInput: invalidate x 0 y 0 cx 5759 cy 2163
[  2534.024] rdpClientConRecv: g_sck_recv failed(returned 0)
[  2534.024] rdpClientConRecvMsg: error
[  2534.024] rdpClientConCheck: rdpClientConGotData failed
[  2534.024] rdpClientConDisconnect:
[  2534.024] rdpRemoveClientConFromDev: removing clientCon 0x55993aa3fcf0
[  2534.024] (EE)
[  2534.024] (EE) Backtrace:
[  2534.025] (EE) 0: /usr/libexec/Xorg (xorg_backtrace+0x89) [0x559939442619]
[  2534.025] (EE) 1: /usr/libexec/Xorg (0x55993926c000+0x1df149) [0x55993944b149]
[  2534.025] (EE) 2: /lib64/libc.so.6 (0x7fac2bc00000+0x54db0) [0x7fac2bc54db0]
[  2534.025] (EE) 3: /lib64/libc.so.6 (0x7fac2bc00000+0xa154c) [0x7fac2bca154c]
[  2534.025] (EE) 4: /lib64/libc.so.6 (raise+0x16) [0x7fac2bc54d06]
[  2534.025] (EE) 5: /lib64/libc.so.6 (abort+0xd3) [0x7fac2bc287f3]
[  2534.025] (EE) 6: /lib64/libc.so.6 (0x7fac2bc00000+0x29130) [0x7fac2bc29130]
[  2534.026] (EE) 7: /lib64/libc.so.6 (0x7fac2bc00000+0xab617) [0x7fac2bcab617]
[  2534.026] (EE) 8: /lib64/libc.so.6 (0x7fac2bc00000+0xad30c) [0x7fac2bcad30c]
[  2534.026] (EE) 9: /lib64/libc.so.6 (free+0x55) [0x7fac2bcaf955]
[  2534.026] (EE) 10: /usr/lib64/xorg/modules/libxorgxrdp.so (0x7fac2b24c000+0xc12b) [0x7fac2b25812b]
[  2534.026] (EE) 11: /usr/lib64/xorg/modules/libxorgxrdp.so (rdpClientConCheck+0x13c) [0x7fac2b25a14c]
[  2534.026] (EE) 12: /usr/libexec/Xorg (WakeupHandler+0xba) [0x5599392d01ca]
[  2534.026] (EE) 13: /usr/libexec/Xorg (WaitForSomething+0x1ef) [0x55993944484f]
[  2534.026] (EE) 14: /usr/libexec/Xorg (0x55993926c000+0x4c3dd) [0x5599392b83dd]
[  2534.027] (EE) 15: /lib64/libc.so.6 (0x7fac2bc00000+0x3feb0) [0x7fac2bc3feb0]
[  2534.027] (EE) 16: /lib64/libc.so.6 (__libc_start_main+0x80) [0x7fac2bc3ff60]
[  2534.027] (EE) 17: /usr/libexec/Xorg (_start+0x25) [0x5599392b8ea5]
[  2534.027] (EE)
[  2534.027] (EE)
Fatal server error:
[  2534.027] (EE) Caught signal 6 (Aborted). Server aborting
[  2534.027] (EE)
[  2534.027] (EE)
Please consult the The X.Org Foundation support
         at http://wiki.x.org
 for help.
[  2534.027] (EE) Please also check the log file at ".xorgxrdp.10.log" for additional information.
[  2534.027] (EE)
[  2534.027] rdpmouseControl: what 4
[  2534.027] rdpkeybControl: what 4
[  2534.027] rdpLeaveVT:
metalefty commented 2 weeks ago

@matt335672 @Nexarian @jsorg71 Can you reproduce this and have a look at this?

matt335672 commented 1 week ago

I've just tried to reproduce and I've not managed it so far.

I've got a breakpoint set on rdpClientCon.c:500 where the log message is generated. It's not firing.

xrdp commit : d6fce2f17355ffb3c783ffb2a31c418b3e930b26 xorgxrdp commit : 0e13e5d84e14c2488136da2f2259b41f32a627d2 (tag: v0.10.1)

In startwm.sh I've just got exec xterm -geometry 80x40

I'm definitely negotiating GFX. Client is Windows 10.

I'm probably missing something. Can you share a suitable .ini file?

metalefty commented 1 week ago

.ini has quite simple diff.

An important point, "Use all monitors" and multimon is required. (I can reproduce with a single monitor though).

This is likely to require at least a 4K screen size.

diff --git a/sesman/sesman.ini.in b/sesman/sesman.ini.in
index 685af28b..7606f5d0 100644
--- a/sesman/sesman.ini.in
+++ b/sesman/sesman.ini.in
@@ -140,7 +140,7 @@ EnableSyslog=true
 ; CentOS 8              :  param=/usr/libexec/Xorg
 ; FreeBSD (from 2022Q4) :  param=/usr/local/libexec/Xorg
 ;
-param=Xorg
+param=/usr/libexec/Xorg
 ; Leave the rest parameters as-is unless you understand what will happen.
 param=-config
 param=xrdp/xorg.conf

Here is another Xorg log.

[185866.422] rdpClientConSendMsg: overrun error len, 36725 stream size 32868, client count 2
[185869.812] (EE)
[185869.812] (EE) Backtrace:
[185869.812] (EE) 0: /usr/libexec/Xorg (xorg_backtrace+0x7d) [0x55c58cf3de7d]
[185869.812] (EE) 1: /usr/libexec/Xorg (0x55c58cd83000+0x1bec3d) [0x55c58cf41c3d]
[185869.812] (EE) 2: /lib64/libpthread.so.0 (0x7f2cd6ba8000+0x12d10) [0x7f2cd6bbad10]
[185869.812] (EE) 3: /usr/libexec/Xorg (dixLookupProperty+0x50) [0x55c58cdfed10]
[185869.812] (EE) 4: /usr/libexec/Xorg (0x55c58cd83000+0x7c89e) [0x55c58cdff89e]
[185869.812] (EE) 5: /usr/libexec/Xorg (0x55c58cd83000+0x5c5de) [0x55c58cddf5de]
[185869.812] (EE) 6: /usr/libexec/Xorg (0x55c58cd83000+0x60686) [0x55c58cde3686]
[185869.812] (EE) 7: /lib64/libc.so.6 (__libc_start_main+0xe5) [0x7f2cd681dd85]
[185869.812] (EE) 8: /usr/libexec/Xorg (_start+0x2e) [0x55c58cdcd02e]
[185869.812] (EE)
[185869.812] (EE) Segmentation fault at address 0x0
[185869.812] (EE)
Fatal server error:
[185869.812] (EE) Caught signal 11 (Segmentation fault). Server aborting
matt335672 commented 1 week ago

Thanks. I'll have another go. I can probably set up a VM with two 4K monitors.

matt335672 commented 1 week ago

I'm still not able to reproduce this with a Windows 10 client and two 4K monitors.

In some ways I'm surprised this is happening, as with xterm as a WM and running xterm as an application, everything is in black-and-white. I'd have though that would compress really well.

Could you post your xrdp.log and full Xorg.log for a crash? I can see you're not running with two identical monitors but I can't see the exact geometry. That might have something to do with it. Also, with a full xrdp.log I can check I'm negotiating the same codec.

I've tried with and without --enable-glamor on xorgxrdp.

metalefty commented 1 week ago

@matt335672 Ah, I made a serious typo. xterm at step 3 is actually xeyes. They are completely different software and, naturally, you couldn't reproduce. Could you try again? If you are still not able to reproduce, I will collect log and post.

matt335672 commented 1 week ago

Thanks for the update!

Following your new instructions I've reproduced this on a single 4K monitor. Stack trace is as follows:-

(gdb) where
#0  rdpClientConSendMsg (dev=0x56e76232f6a0, clientCon=0x56e76264e370) at rdpClientCon.c:500
#1  0x00007117547e16e9 in rdpClientConSendPending (dev=0x56e76232f6a0, clientCon=0x56e76264e370) at rdpClientCon.c:533
#2  0x00007117547e7951 in rdpClientConSendPaintRectShmFd (dev=0x56e76232f6a0, clientCon=0x56e76264e370, id=0x7ffd77ef4780, dirtyReg=0x56e762911ee0, copyRects=0x56e762946620, 
    numCopyRects=787) at rdpClientCon.c:2735
#3  0x00007117547e7f26 in rdpCapRect (clientCon=0x56e76264e370, cap_rect=0x7ffd77ef4770, id=0x7ffd77ef4780) at rdpClientCon.c:2857
#4  0x00007117547e8230 in rdpDeferredUpdateCallback (timer=0x56e76245be10, now=2554056, arg=0x56e76264e370) at rdpClientCon.c:2953
#5  0x000056e7610c7fc0 in ?? ()
#6  0x000056e7610c8248 in WaitForSomething ()
#7  0x000056e760f583e7 in ?? ()
#8  0x000056e760f5c6b4 in ?? ()
#9  0x0000711754e29d90 in __libc_start_call_main (main=main@entry=0x56e760f455d0, argc=argc@entry=11, argv=argv@entry=0x7ffd77ef4a18) at ../sysdeps/nptl/libc_start_call_main.h:58
#10 0x0000711754e29e40 in __libc_start_main_impl (main=0x56e760f455d0, argc=11, argv=0x7ffd77ef4a18, init=<optimised out>, fini=<optimised out>, rtld_fini=<optimised out>, 
    stack_end=0x7ffd77ef4a08) at ../csu/libc-start.c:392
#11 0x000056e760f45605 in _start ()

I've worked out where the 'overrun' message is coming from. We call rdpClientConPreCheck() before sending a frame to make sure there's space in the buffer. However, if the size we pass in is bigger than the maximum size of the buffer, we don't resize the buffer.

I've patched that as follows:-

--- a/module/rdpClientCon.c
+++ b/module/rdpClientCon.c
@@ -1748,7 +1748,8 @@ rdpClientConPreCheck(rdpPtr dev, rdpClientCon *clientCon, int in_size)
             rv = 1;
         }
         clientCon->count = 0;
-        init_stream(clientCon->out_s, 0);
+        // Make sure stream is now big enough for the requested size
+        init_stream(clientCon->out_s, in_size + 20);
         s_push_layer(clientCon->out_s, iso_hdr, 8);
     }

@jsorg71 - could you check my logic?

However, like you, I'm still seeing the X server hang up - I just don't get the message. So there seem to be two things going on here. I'll take another look Monday.

jsorg71 commented 1 week ago

rdpClientConPreCheck is to check if the new message can fit, if not, send what's there to make room. This buffer is not meant to grow. Maybe we can change that but I'd rather not. I think it's 32KB. I wonder what is causing the size to get big. I see there are 787 crects, that is ok. I can't see how many drects, maybe that is it. I'll try to reproduce it.

matt335672 commented 1 week ago

Understood - thanks.

matt335672 commented 1 week ago

Here are some variables from a recent fail:-

Variable Value
start_frame_bytes 16
num_rects_d 2499
num_rects_c 1612
wiretosurface2_bytes 32921
end_frame_bytes 12
size 32961

There's a couple of observations here:- 1) num_rects_d + num_rects_c (4111) exceeds RDP_MAX_TILES (4096). Is this significant? 2) If num_rects_d + num_rects_c was 4096, size would still be 32841 which would exceed the buffer available.

RDP_MAX_TILES was last changed by fcc9ab21a6ce06987bd4e6ec0a89ebbe8fe193d4

matt335672 commented 1 week ago

I've made the following logging patch so I can see what the drects look like:-

--- a/module/rdpClientCon.c
+++ b/module/rdpClientCon.c
@@ -2684,6 +2684,12 @@ rdpClientConSendPaintRectShmFd(rdpPtr dev, rdpClientCon *clientCon,
         size += end_frame_bytes;        /* end frame message */
         size += 4;                      /* message 62 data_bytes */

+        if (size > (clientCon->out_s->size - 20))
+        {
+            // Way too big - pre-check will fail
+            LLOGLN(0, ("Size check prob: requested=%d max=%d", size, (clientCon->out_s->size - 20)));
+        }
+
         rdpClientConPreCheck(dev, clientCon, size);
         s = clientCon->out_s;
         out_uint16_le(s, 62);
@@ -2840,6 +2846,16 @@ rdpCapRect(rdpClientCon *clientCon, BoxPtr cap_rect, struct image_data *id)
     cap_dirty_save = rdpRegionCreate(NullBox, 0);
     rdpRegionCopy(cap_dirty_save, cap_dirty);
     num_rects = REGION_NUM_RECTS(cap_dirty);
+    if (num_rects > 2000)
+    {
+        BoxPtr rects_d = REGION_RECTS(cap_dirty);
+        int i;
+        LLOGLN(0, ("Dirty count = %d", num_rects));
+        for (i = 0; i < num_rects; ++i)
+        {
+            LLOGLN(0, ("dirty[%d] = (%d, %d, %d, %d)", i, rects_d[i].x1, rects_d[i].y1, rects_d[i].x2, rects_d[i].y2)); 
+        }
+    }
     if (num_rects > 0)
     {
         rects = 0;

Log file attached xorgxrdp.12.log

matt335672 commented 1 week ago

I can now see the problem, but I don't really understand the detail of this code. @jsorg71 - this really needs your expertise.

When xeyes is run, it's drawing a large number of small damage rectangles around the circular edge of the eyes. With a large monitor, these are considerable. When the copy rectangles are added, we get an overflow of the buffer. xeyes is a great test tool.

I can get xeyes working with this patch. It's NOT production quality:-

--- a/module/rdpClientCon.c
+++ b/module/rdpClientCon.c
@@ -2684,6 +2684,12 @@ rdpClientConSendPaintRectShmFd(rdpPtr dev, rdpClientCon *clientCon,
         size += end_frame_bytes;        /* end frame message */
         size += 4;                      /* message 62 data_bytes */

+        if (size > (clientCon->out_s->size - 20))
+        {
+            // Way too big - pre-check will fail
+            LLOGLN(0, ("Size check prob: requested=%d max=%d", size, (clientCon->out_s->size - 20)));
+        }
+
         rdpClientConPreCheck(dev, clientCon, size);
         s = clientCon->out_s;
         out_uint16_le(s, 62);
@@ -2836,10 +2842,17 @@ rdpCapRect(rdpClientCon *clientCon, BoxPtr cap_rect, struct image_data *id)
     LLOGLN(10, ("rdpCapRect: cap_rect x1 %d y1 %d x2 %d y2 %d",
                cap_rect->x1, cap_rect->y1, cap_rect->x2, cap_rect->y2));
     rdpRegionIntersect(cap_dirty, cap_dirty, clientCon->dirtyRegion);
+    num_rects = REGION_NUM_RECTS(cap_dirty);
+    if (num_rects > 2000) // What should this be??
+    {
+        // Collapse down to the bounding rectangle
+        rdpRegionUnionRect(cap_dirty, rdpRegionExtents(cap_dirty));
+        num_rects = REGION_NUM_RECTS(cap_dirty);
+    }
+
     /* make a copy of cap_dirty because it may get altered */
     cap_dirty_save = rdpRegionCreate(NullBox, 0);
     rdpRegionCopy(cap_dirty_save, cap_dirty);
-    num_rects = REGION_NUM_RECTS(cap_dirty);
     if (num_rects > 0)
     {
         rects = 0;

This also stops the X server crashing.

jsorg71 commented 1 week ago

We can't do much with crects but drect, yea, we can use rdpRegionExtents to simplify it. It's probably easier on the system to encode more pixels than have this complex region. There is a define for this, MAX_CAPTURE_RECTS that we lost with GFX(my fault) that should be put back. There is also another change I want to make in rdpDeferredUpdateCallback where it deals with clientCon->dev->monitorCount < 1 and uses MAX_CAPTURE_PIXELS with bands. Since the RFX encoder can handle chunking out large updates, we don't need that. In fact, this code path won't work with H264. I can create a PR.

jsorg71 commented 1 week ago

I create #319 for this

jsorg71 commented 1 week ago

It looks like I removed the use of MAX_CAPTURE_RECTS in 83e434e

metalefty commented 6 days ago

Reopen until fix in v0.10.

metalefty commented 5 days ago

Merged into v0.10 branch, I will release v0.10.2 in July or mid-August.