neutrinolabs / xrdp

xrdp: an open source RDP server
http://www.xrdp.org/
Apache License 2.0

Segmentation fault when writing lots of glyphs with RemoteFX codec #2068

Closed RolKau closed 6 months ago

RolKau commented 2 years ago

xrdp version: 0.9.17 (git sha 5808832)
xorgxrdp version: 0.2.17 (git sha b943f0e)
Client: FreeRDP 2.2.0+dfsg1-0ubuntu0.20.04.2

First I compile xrdp with:

./configure --enable-fuse --enable-jpeg --enable-pixman --enable-devel-debug --enable-devel-streamcheck

I connect to the server with these options:

xfreerdp /v:hostname /sec:rdp /gfx:rfx /rfx

This crashes immediately upon connect, with backtrace (from coredumpctl debug):

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f838fe5f859 in __GI_abort () at abort.c:79
#2  0x00007f8390162b67 in parser_stream_overflow_check (s=0x7ffcae525800, n=2, is_out=1, file=0x7f839014528c "libxrdp.c", line=1678) at parse.c:52
#3  0x00007f83900fe8b9 in libxrdp_fastpath_send_surface (session=0x560ca6dda1e0, data_pad=0x7f8384000b60 "", pad_bytes=256, data_bytes=101759, destLeft=0, destTop=0, destRight=3840, destBottom=2160, bpp=32, codecID=3, width=3840, height=2160) at libxrdp.c:1678
#4  0x0000560ca5f4990a in xrdp_mm_process_enc_done (self=0x560ca6ddf7c0) at xrdp_mm.c:2805
#5  0x0000560ca5f49bba in xrdp_mm_check_wait_objs (self=0x560ca6ddf7c0) at xrdp_mm.c:2903
#6  0x0000560ca5f55c7a in xrdp_wm_check_wait_objs (self=0x560ca6e13a70) at xrdp_wm.c:2210
#7  0x0000560ca5f4f394 in xrdp_process_main_loop (self=0x560ca6dda090) at xrdp_process.c:287
#8  0x0000560ca5f3bf88 in xrdp_process_run (in_val=0x0) at xrdp_listen.c:152
#9  0x0000560ca5f3d3f8 in xrdp_listen_fork (self=0x560ca6dd44f0, server_trans=0x560ca6dd8380) at xrdp_listen.c:802
#10 0x0000560ca5f3d871 in xrdp_listen_main_loop (self=0x560ca6dd44f0) at xrdp_listen.c:954
#11 0x0000560ca5f31cd0 in main (argc=2, argv=0x7ffcae526408) at xrdp.c:705

and the line in /var/log/xrdp.log:

libxrdp.c:1678 Stream output buffer overflow. Size=0, pos=7, requested=2
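To illustrate what this log line reports, here is a minimal sketch of a stream bounds check. The struct and function names are hypothetical and are not taken from xrdp's parse.c; the only assumption is that the check compares the current position plus the requested byte count against the allocated size. With Size=0 and pos=7, any request fails.

```c
#include <stddef.h>

/* Hypothetical stand-in for a stream; not xrdp's struct stream. */
struct demo_stream
{
    size_t size; /* bytes allocated for the buffer */
    size_t pos;  /* current write offset */
};

/* Returns 1 if n more bytes fit, 0 if the write would overflow.
 * Written to avoid unsigned wrap-around when pos is already past size. */
static int
demo_can_write(const struct demo_stream *s, size_t n)
{
    return s->pos <= s->size && n <= s->size - s->pos;
}
```

A stream whose size is still 0 while its position is 7 suggests that the write happened before the buffer was initialised, or after it was reset, rather than that the buffer was merely a little too small.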

However, if I now recompile without the range check:

./configure --enable-fuse --enable-jpeg --enable-pixman --enable-devel-debug

I can log on, even with more caching options enabled:

xfreerdp /v:hostname /sec:rdp /gfx:rfx /rfx /codec-cache:rfx +bitmap-cache +offscreen-cache +glyph-cache +clipboard /f

I can now start Firefox and browse a bunch of webpages without any problem. But, if I put two terminal windows sized to 1920x2160 side-by-side and run the command od -c /dev/random in each of them, the server coredumps rather quickly with this backtrace:

#0  0x0000561d8ac3f5e6 in loop1f ()
#1  0x0000561d8ac3f7f4 in rfxcodec_encode_dwt_shift_amd64_sse41 ()
#2  0x0000561d0f324928 in ?? ()
#3  0x00007fcb7af9b300 in ?? ()
#4  0x0000561d8c214460 in ?? ()
#5  0x0000561d8c216460 in ?? ()
#6  0x0000561d8c210410 in ?? ()
#7  0x0000561d8ac3fb27 in rfx_encode_component_rlgr3_amd64_sse41 (enc=0x561d8c210410, qtable=<optimized out>, data=<optimized out>, buffer=0x7fcb2460aad9 "", buffer_size=-39721, size=0x7fcb2bcb38cc) at rfxencode_tile_amd64.c:99
#8  0x0000561d8ac3972e in rfx_encode_yuv (enc=enc@entry=0x561d8c210410, yuv_data=0x7fcb7af9b300 <error: Cannot access memory at address 0x7fcb7af9b300>, width=<optimized out>, height=<optimized out>, stride_bytes=stride_bytes@entry=15360, y_quants=<optimized out>, u_quants=0x561d471df50f <error: Cannot access memory at address 0x561d471df50f>, v_quants=0x561d8dbf62b7 "\025\300\215&05\222\340d_Ix\006\253\b\200\300GA\264~Q\001dV\377\345\374\302GĤ{\260 \006*\001\346\222\374\377&\002\241\376\f\037ܸ\207|", data_out=0x7fcb2bcb3980, y_size=0x7fcb2bcb38cc, u_size=0x7fcb2bcb38d0, v_size=0x7fcb2bcb38d4) at rfxencode_tile.c:221
#9  0x0000561d8ac39343 in rfx_compose_message_tile_yuv (yIdx=<optimized out>, xIdx=<optimized out>, quantIdxCr=<optimized out>, quantIdxCb=<optimized out>, quantIdxY=<optimized out>, quantVals=0x561d8ac44e9c <g_rfx_default_quantization_values> "ffw\210\230"<error: Cannot access memory at address 0x561d8ac44ea1>, stride_bytes=15360, tile_height=<optimized out>, tile_width=<optimized out>, tile_data=<optimized out>, s=0x7fcb2bcb3980, enc=0x561d8c210410) at rfxencode_compose.c:232
#10 rfx_compose_message_tileset (width=-1966846308, height=2176, flags=0, num_quants=0, quants=0x0, num_tiles=1394, tiles=0x7fcb24600fb0, stride_bytes=15360, buf=0x7fcb294d4000 "", s=0x7fcb2bcb3980, enc=0x561d8c210410) at rfxencode_compose.c:488
#11 rfx_compose_message_data (enc=enc@entry=0x561d8c210410, s=s@entry=0x7fcb2bcb3980, regions=<optimized out>, num_regions=num_regions@entry=12, buf=buf@entry=0x7fcb294d4000 "", width=width@entry=3840, height=2176, stride_bytes=15360, tiles=0x7fcb24600fb0, num_tiles=1394, quants=0x0, num_quants=0, flags=0) at rfxencode_compose.c:589
#12 0x0000561d8ac385ba in rfxcodec_encode_ex (handle=0x561d8c210410, cdata=<optimized out>, cdata_bytes=0x7fcb2bcb3a8c, buf=0x7fcb294d4000 "", width=3840, height=2176, stride_bytes=15360, regions=0x7fcb2460a828, num_regions=12, tiles=0x7fcb24600fb0, num_tiles=1394, quants=0x0, num_quants=0, flags=0) at rfxencode.c:333
#13 0x0000561d8ac3865f in rfxcodec_encode (handle=<optimized out>, cdata=<optimized out>, cdata_bytes=<optimized out>, buf=<optimized out>, width=<optimized out>, height=<optimized out>, stride_bytes=15360, regions=0x7fcb2460a828, num_regions=12, tiles=0x7fcb24600fb0, num_tiles=1394, quants=0x0, num_quants=0) at rfxencode.c:352
#14 0x0000561d8ac1eccf in process_enc_rfx (self=0x561d8c20b870, enc=0x561d8c1cf640) at xrdp_encoder.c:384
#15 0x0000561d8ac1effa in proc_enc_msg (arg=0x561d8c20b870) at xrdp_encoder.c:507
#16 0x00007fcb2ef76609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#17 0x00007fcb2f0b2293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

The buffer_size parameter in frame #7 (buffer_size=-39721) looks suspicious, since a buffer size should never be negative. Unfortunately, if I build with CFLAGS="-O0", the backtrace gets cut off after frame #6 instead of showing more information, for some unknown reason.
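One way a value like buffer_size=-39721 can arise is when the remaining space is computed as a signed difference after the write cursor has already passed the end of the output buffer. The sketch below shows only that arithmetic and a guard against it. All names are hypothetical, and this is an assumption about the failure mode, not the librfxcodec code.

```c
#include <stddef.h>

/* Hypothetical helper: space left in an output buffer, as a signed int.
 * The result goes negative once used exceeds capacity. */
static int
demo_remaining(int capacity, int used)
{
    return capacity - used;
}

/* Guarded encode step: returns the new used count, or -1 when the
 * requested bytes do not fit (including the already-overrun case). */
static int
demo_encode_guarded(int capacity, int used, int needed)
{
    int left = demo_remaining(capacity, used);

    if (left < 0 || needed > left)
    {
        return -1;
    }
    return used + needed;
}
```

If the real encoder passes such a value on without a guard like the one above, a routine that treats the size as a loop bound or as an unsigned quantity could write well outside the buffer.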

It seems to me that this bug is triggered whenever a lot of text is written as characters to the display; programs that draw the text themselves as graphics don't elicit this behaviour.

matt335672 commented 2 years ago

@RolKau

I'm unable to reproduce this using the commands above and this RDP client:-

$ xfreerdp --version
This is FreeRDP version 2.2.0 (n/a)

I don't get a chance to close the client window myself incidentally - the command completes and the session exits pretty swiftly. Am I missing something there?

Also, what version of xorgxrdp are you using when the crash happens? This might also have a bearing on the problem.

RolKau commented 2 years ago

So I'd think this is either invalid at this point, or it gets written over while it's sitting in the fifo

I don't think that the enc->data array itself becomes invalid, because it is accessed in earlier iterations in the loop, and I don't see how it could be freed there (unless there is another thread doing it). I suspect that it is an overrun, but I don't know where the size of the array is determined.

what version of xorgxrdp are you using when the crash happens?

I was using 0.2.17, but you may want to schedule another release of that one as well: If I upgrade to the devel branch I am unable to reproduce the problem over at least ten attempts.

The only real change seems to have been the full screen refresh; maybe this straightens out some internal state or changes some timings.

matt335672 commented 2 years ago

@RolKau - please accept my apologies for the auto-close. That was not my intention. We'll be working on the v0.9.18 release for a bit. I'd like to pick this up again afterwards.

Just as a quick note for you - enc->data could possibly become invalid.

There's a memory area shared between xorgxrdp and the xrdp process. xorgxrdp writes to the memory area and then tells xrdp what the changes are, rather than sending the data directly.

Along with the change info, xorgxrdp sends the ID of the shared memory area (see shmop(2)). The ID is normally stable, but there are situations like a resize where the ID changes. If this happens, xrdp detaches from the old area and attaches to the new area. The code for this is in the xrdp xup module xrdp/xup.c.
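The detach and attach step described above can be sketched with the System V calls from shmop(2). This is a minimal illustration under stated assumptions: the struct and function names are hypothetical and this is not the code in xrdp/xup.c. Unlike the description, the sketch attaches to the new area first and only then detaches from the old one, so that a failed attach leaves the old mapping in place.

```c
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical bookkeeping for the attached area; not xrdp's own struct. */
struct demo_shm
{
    int id;     /* ID of the currently attached segment, -1 if none */
    void *addr; /* address returned by shmat(), NULL if none */
};

/* Switch to new_id if it differs from the current segment.
 * Returns 0 on success (or if nothing changed), -1 on failure. */
static int
demo_shm_update(struct demo_shm *m, int new_id)
{
    void *addr;

    if (m->addr != NULL && m->id == new_id)
    {
        return 0; /* ID is stable, keep the existing mapping */
    }
    addr = shmat(new_id, NULL, SHM_RDONLY);
    if (addr == (void *) -1)
    {
        return -1; /* attach failed, old mapping is left untouched */
    }
    if (m->addr != NULL)
    {
        shmdt(m->addr); /* pointers into the old area are now dangling */
    }
    m->id = new_id;
    m->addr = addr;
    return 0;
}
```

The comment on shmdt() is the relevant point for this issue: any pointer into the old area that another thread still holds becomes invalid at that moment.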

I'm not at all clear whether enc->data points to the shared memory area, but a bit of logging should resolve this. Also, this is only a supposition on my part. I'm still learning about this area, and I only found the shared memory area when investigating a FreeBSD issue recently. Since there are threads involved here it just seems like a possibility to me.

I hope this is useful information.

RolKau commented 2 years ago

please accept my apologies for the auto-close

On the contrary, I myself regarded this issue as solved. I reckon that the second trap I discovered is really another problem, and in particular we probably need a better repro before we can make progress on it.

There's a memory area shared between xorgxrdp and the xrdp process

Just a thought: Who closes this memory area, and how is the other process notified about this? Can it be that the xorgxrdp process closes the area upon shutdown, but that the xrdp process is not quite done writing out all of the bitmaps, so it disappears halfway through the loop?

This would make it somewhat timing-dependent, which would explain why we have problems reproducing it outside my particular setup (I am on a rather crappy DOCSIS 3 residential broadband), and also why it only happens on shutdown.

matt335672 commented 2 years ago

The lifecycle of the shared area is controlled by xorgxrdp as xorgxrdp's lifetime is the session lifetime, and xrdp may have a shorter lifetime.

xorgxrdp sends over an ID, and if the ID has changed, xrdp disconnects from the old area and connects to the new one. However, I can't find any sync between the threads in this area, so it could be related to this.
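As an illustration of the kind of synchronisation that appears to be missing, here is a minimal sketch in which a mutex protects both the pointer swap and every read through that pointer. The names are hypothetical and this is not how xrdp is structured; it only shows the pattern, assuming one thread swaps the area while another reads from it.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of the shared area as seen by the encoder thread. */
struct demo_view
{
    pthread_mutex_t lock;
    const unsigned char *base; /* NULL while no area is attached */
    size_t size;               /* size of the attached area in bytes */
};

/* Called by the thread that detaches and attaches areas. */
static void
demo_view_swap(struct demo_view *v, const unsigned char *base, size_t size)
{
    pthread_mutex_lock(&v->lock);
    v->base = base;
    v->size = size;
    pthread_mutex_unlock(&v->lock);
}

/* Called by the reader: copies len bytes out while holding the lock,
 * so the area cannot be swapped away mid-copy. Returns 0 or -1. */
static int
demo_view_copy(struct demo_view *v, size_t off, size_t len, unsigned char *dst)
{
    int rv = -1;

    pthread_mutex_lock(&v->lock);
    if (v->base != NULL && off <= v->size && len <= v->size - off)
    {
        memcpy(dst, v->base + off, len);
        rv = 0;
    }
    pthread_mutex_unlock(&v->lock);
    return rv;
}
```

Copying under the lock is only one possible design. Holding the lock for the whole encode, or reference-counting the area so it is detached only after the last reader finishes, would avoid the extra copy at the cost of more complexity.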

On the other hand, I may have this completely wrong. I'll get some logging in when I can get to this, and at least determine whether enc->data is pointing to this area or not.

I'd like to leave this open for now. If we find a better way to reproduce we can open another fault, but I'd like to do at least a bit more exploration on this one.

matt335672 commented 6 months ago

I'm closing this one now, as we've completely reworked the shared memory support for the GFX merge (will ship in v0.10). That affects the RFX codepaths too.