Closed metalefty closed 5 days ago
@matt335672 @Nexarian @jsorg71 Can you reproduce this and have a look at this?
I've just tried to reproduce and I've not managed it so far.
I've got a breakpoint set on rdpClientCon.c:500 where the log message is generated. It's not firing.
xrdp commit : d6fce2f17355ffb3c783ffb2a31c418b3e930b26 xorgxrdp commit : 0e13e5d84e14c2488136da2f2259b41f32a627d2 (tag: v0.10.1)
In startwm.sh I've just got exec xterm -geometry 80x40
I'm definitely negotiating GFX. Client is Windows 10.
I'm probably missing something. Can you share a suitable .ini
file?
.ini
has quite simple diff.
An important point, "Use all monitors" and multimon is required. (I can reproduce with a single monitor though).
This is likely to require at least a 4K screen size.
diff --git a/sesman/sesman.ini.in b/sesman/sesman.ini.in
index 685af28b..7606f5d0 100644
--- a/sesman/sesman.ini.in
+++ b/sesman/sesman.ini.in
@@ -140,7 +140,7 @@ EnableSyslog=true
; CentOS 8 : param=/usr/libexec/Xorg
; FreeBSD (from 2022Q4) : param=/usr/local/libexec/Xorg
;
-param=Xorg
+param=/usr/libexec/Xorg
; Leave the rest parameters as-is unless you understand what will happen.
param=-config
param=xrdp/xorg.conf
Here is another Xorg log.
[185866.422] rdpClientConSendMsg: overrun error len, 36725 stream size 32868, client count 2
[185869.812] (EE)
[185869.812] (EE) Backtrace:
[185869.812] (EE) 0: /usr/libexec/Xorg (xorg_backtrace+0x7d) [0x55c58cf3de7d]
[185869.812] (EE) 1: /usr/libexec/Xorg (0x55c58cd83000+0x1bec3d) [0x55c58cf41c3d]
[185869.812] (EE) 2: /lib64/libpthread.so.0 (0x7f2cd6ba8000+0x12d10) [0x7f2cd6bbad10]
[185869.812] (EE) 3: /usr/libexec/Xorg (dixLookupProperty+0x50) [0x55c58cdfed10]
[185869.812] (EE) 4: /usr/libexec/Xorg (0x55c58cd83000+0x7c89e) [0x55c58cdff89e]
[185869.812] (EE) 5: /usr/libexec/Xorg (0x55c58cd83000+0x5c5de) [0x55c58cddf5de]
[185869.812] (EE) 6: /usr/libexec/Xorg (0x55c58cd83000+0x60686) [0x55c58cde3686]
[185869.812] (EE) 7: /lib64/libc.so.6 (__libc_start_main+0xe5) [0x7f2cd681dd85]
[185869.812] (EE) 8: /usr/libexec/Xorg (_start+0x2e) [0x55c58cdcd02e]
[185869.812] (EE)
[185869.812] (EE) Segmentation fault at address 0x0
[185869.812] (EE)
Fatal server error:
[185869.812] (EE) Caught signal 11 (Segmentation fault). Server aborting
Thanks. I'll have another go. I can probably set up a VM with two 4K monitors.
I'm still not able to reproduce this with a Windows 10 client and two 4K monitors.
In some ways I'm surprised this is happening, as with xterm
as a WM and running xterm
as an application, everything is in black-and-white. I'd have though that would compress really well.
Could you post your xrdp.log
and full Xorg.log
for a crash? I can see you're not running with two identical monitors but I can't see the exact geometry. That might have something to do with it. Also, with a full xrdp.log
I can check I'm negotiating the same codec.
I've tried with and without --enable-glamor
on xorgxrdp.
@matt335672 Ah, I made a serious typo. xterm
at step 3 is actually xeyes
. They are completely different software and, naturally, you couldn't reproduce. Could you try again? If you are still not able to reproduce, I will collect log and post.
Thanks for the update!
Following your new instructions I've reproduced this on a single 4K monitor. Stack trace is as follows:-
(gdb) where
#0 rdpClientConSendMsg (dev=0x56e76232f6a0, clientCon=0x56e76264e370) at rdpClientCon.c:500
#1 0x00007117547e16e9 in rdpClientConSendPending (dev=0x56e76232f6a0, clientCon=0x56e76264e370) at rdpClientCon.c:533
#2 0x00007117547e7951 in rdpClientConSendPaintRectShmFd (dev=0x56e76232f6a0, clientCon=0x56e76264e370, id=0x7ffd77ef4780, dirtyReg=0x56e762911ee0, copyRects=0x56e762946620,
numCopyRects=787) at rdpClientCon.c:2735
#3 0x00007117547e7f26 in rdpCapRect (clientCon=0x56e76264e370, cap_rect=0x7ffd77ef4770, id=0x7ffd77ef4780) at rdpClientCon.c:2857
#4 0x00007117547e8230 in rdpDeferredUpdateCallback (timer=0x56e76245be10, now=2554056, arg=0x56e76264e370) at rdpClientCon.c:2953
#5 0x000056e7610c7fc0 in ?? ()
#6 0x000056e7610c8248 in WaitForSomething ()
#7 0x000056e760f583e7 in ?? ()
#8 0x000056e760f5c6b4 in ?? ()
#9 0x0000711754e29d90 in __libc_start_call_main (main=main@entry=0x56e760f455d0, argc=argc@entry=11, argv=argv@entry=0x7ffd77ef4a18) at ../sysdeps/nptl/libc_start_call_main.h:58
#10 0x0000711754e29e40 in __libc_start_main_impl (main=0x56e760f455d0, argc=11, argv=0x7ffd77ef4a18, init=<optimised out>, fini=<optimised out>, rtld_fini=<optimised out>,
stack_end=0x7ffd77ef4a08) at ../csu/libc-start.c:392
#11 0x000056e760f45605 in _start ()
I've worked out where the 'overrun' message is coming from. We call rdpClientConPreCheck()
before sending a frame to make sure there's space in the buffer. However, if the size we pass in is bigger than the maximum size of the buffer, we don't resize the buffer.
I've patched that as follows:-
--- a/module/rdpClientCon.c
+++ b/module/rdpClientCon.c
@@ -1748,7 +1748,8 @@ rdpClientConPreCheck(rdpPtr dev, rdpClientCon *clientCon, int in_size)
rv = 1;
}
clientCon->count = 0;
- init_stream(clientCon->out_s, 0);
+ // Make sure stream is now big enough for the requested size
+ init_stream(clientCon->out_s, in_size + 20);
s_push_layer(clientCon->out_s, iso_hdr, 8);
}
@jsorg71 - could you check my logic?
However, like you, I'm still seeing the X server hang up - I just don't get the message. So there seem to be two things going on here. I'll take another look Monday.
rdpClientConPreCheck is to check if the new message can fit, if not, send what's there to make room. This buffer is not meant to grow. Maybe we can change that but I'd rather not. I think it's 32KB. I wonder what is causing the size to get big. I see there are 787 crects, that is ok. I can't see how many drects, maybe that is it. I'll try to reproduce it.
Understood - thanks.
Here are some variables from a recent fail:-
Variable | Value |
---|---|
start_frame_bytes | 16 |
num_rects_d | 2499 |
num_rects_c | 1612 |
wiretosurface2_bytes | 32921 |
end_frame_bytes | 12 |
size | 32961 |
There's a couple of observations here:-
1) num_rects_d + num_rects_c
(4111) exceeds RDP_MAX_TILES
(4096). Is this significant?
2) If num_rects_d + num_rects_c
was 4096, size
would still be 32841 which would exceed the buffer available.
RDP_MAX_TILES
was last changed by fcc9ab21a6ce06987bd4e6ec0a89ebbe8fe193d4
I've made the following logging patch so I can see what the drects look like:-
--- a/module/rdpClientCon.c
+++ b/module/rdpClientCon.c
@@ -2684,6 +2684,12 @@ rdpClientConSendPaintRectShmFd(rdpPtr dev, rdpClientCon *clientCon,
size += end_frame_bytes; /* end frame message */
size += 4; /* message 62 data_bytes */
+ if (size > (clientCon->out_s->size - 20))
+ {
+ // Way too big - pre-check will fail
+ LLOGLN(0, ("Size check prob: requested=%d max=%d", size, (clientCon->out_s->size - 20)));
+ }
+
rdpClientConPreCheck(dev, clientCon, size);
s = clientCon->out_s;
out_uint16_le(s, 62);
@@ -2840,6 +2846,16 @@ rdpCapRect(rdpClientCon *clientCon, BoxPtr cap_rect, struct image_data *id)
cap_dirty_save = rdpRegionCreate(NullBox, 0);
rdpRegionCopy(cap_dirty_save, cap_dirty);
num_rects = REGION_NUM_RECTS(cap_dirty);
+ if (num_rects > 2000)
+ {
+ BoxPtr rects_d = REGION_RECTS(cap_dirty);
+ int i;
+ LLOGLN(0, ("Dirty count = %d", num_rects));
+ for (i = 0; i < num_rects; ++i)
+ {
+ LLOGLN(0, ("dirty[%d] = (%d, %d, %d, %d)", i, rects_d[i].x1, rects_d[i].y1, rects_d[i].x2, rects_d[i].y2));
+ }
+ }
if (num_rects > 0)
{
rects = 0;
Log file attached xorgxrdp.12.log
I can now see the problem, but I don't really understand the detail of this code. @jsorg71 - this really needs your expertise.
When xeyes
is run, it's drawing a large number of small damage rectangles around the circular edge of the eyes. With a large monitor, these are considerable. When the copy rectangles are added, we get an overflow of the buffer. xeyes
is a great test tool.
I can get xeyes
working with this patch. It's NOT production quality:-
--- a/module/rdpClientCon.c
+++ b/module/rdpClientCon.c
@@ -2684,6 +2684,12 @@ rdpClientConSendPaintRectShmFd(rdpPtr dev, rdpClientCon *clientCon,
size += end_frame_bytes; /* end frame message */
size += 4; /* message 62 data_bytes */
+ if (size > (clientCon->out_s->size - 20))
+ {
+ // Way too big - pre-check will fail
+ LLOGLN(0, ("Size check prob: requested=%d max=%d", size, (clientCon->out_s->size - 20)));
+ }
+
rdpClientConPreCheck(dev, clientCon, size);
s = clientCon->out_s;
out_uint16_le(s, 62);
@@ -2836,10 +2842,17 @@ rdpCapRect(rdpClientCon *clientCon, BoxPtr cap_rect, struct image_data *id)
LLOGLN(10, ("rdpCapRect: cap_rect x1 %d y1 %d x2 %d y2 %d",
cap_rect->x1, cap_rect->y1, cap_rect->x2, cap_rect->y2));
rdpRegionIntersect(cap_dirty, cap_dirty, clientCon->dirtyRegion);
+ num_rects = REGION_NUM_RECTS(cap_dirty);
+ if (num_rects > 2000) // What should this be??
+ {
+ // Collapse down to the bounding rectangle
+ rdpRegionUnionRect(cap_dirty, rdpRegionExtents(cap_dirty));
+ num_rects = REGION_NUM_RECTS(cap_dirty);
+ }
+
/* make a copy of cap_dirty because it may get altered */
cap_dirty_save = rdpRegionCreate(NullBox, 0);
rdpRegionCopy(cap_dirty_save, cap_dirty);
- num_rects = REGION_NUM_RECTS(cap_dirty);
if (num_rects > 0)
{
rects = 0;
This also stops the X server crashing.
We can't do much with crects but drect, yea, we can use rdpRegionExtents to simplify it. It's probably easier on the system to encode more pixels than have this complex region. There is a define for this, MAX_CAPTURE_RECTS that we lost with GFX(my fault) that should be put back. There is also another change I want to make in rdpDeferredUpdateCallback where it deals with clientCon->dev->monitorCount < 1 and uses MAX_CAPTURE_PIXELS with bands. Since the RFX encoder can handle chunking out large updates, we don't need that. In fact, this code path won't work with H264. I can create a PR.
I create #319 for this
It looks like I removed the use of MAX_CAPTURE_RECTS in 83e434e
Reopen until fix in v0.10.
Merged into v0.10 branch, I will release v0.10.2 in July or mid-August.
Version: xrdp v0.10.0 and xorgxrdp v0.10.1
Steps to reproduce:
xterm
as WM for simplicityxeyes -geometry 3840x2160
(exactly the same screen size with xrandr reports)I see the following xorgxrdp log. This hang-up doesn't happen v0.9. So it is related to recent GFX change.
I recalled #64 from the log with the following log line. It looks like a buffer overrun. I also recalled #5 and #62.
It looks like a buffer overrun, I tried to increase buffer size to 64k. I don't see any obvious buffer overrun with 64k buffer but Xorg still hangs up.