metalefty opened 2 months ago
I haven't tested this, but the fix will be something like the following. Maybe the same change should be applied in other places.
@matt335672 What do you think?
```diff
diff --git a/common/os_calls.c b/common/os_calls.c
index afd5a958..1952d425 100644
--- a/common/os_calls.c
+++ b/common/os_calls.c
@@ -1567,7 +1567,7 @@ g_sck_can_send(int sck, int millis)
     pollfd.revents = 0;
     if (poll(&pollfd, 1, millis) > 0)
     {
-        if ((pollfd.revents & POLLOUT) != 0)
+        if ((pollfd.revents & (POLLOUT | POLLHUP | POLLERR)) != 0)
         {
             rv = 1;
         }
diff --git a/common/ssl_calls.c b/common/ssl_calls.c
index 70d2d7c8..63636770 100644
--- a/common/ssl_calls.c
+++ b/common/ssl_calls.c
@@ -1362,11 +1362,11 @@ ssl_tls_write(struct ssl_tls *tls, const char *data, int length)
              * SSL_ERROR_WANT_WRITE
              */
             case SSL_ERROR_WANT_READ:
-                g_sck_can_recv(tls->trans->sck, SSL_WANT_READ_WRITE_TIMEOUT);
-                continue;
+                break_flag = g_sck_can_recv(tls->trans->sck, SSL_WANT_READ_WRITE_TIMEOUT);
+                break;
             case SSL_ERROR_WANT_WRITE:
-                g_sck_can_send(tls->trans->sck, SSL_WANT_READ_WRITE_TIMEOUT);
-                continue;
+                break_flag = g_sck_can_send(tls->trans->sck, SSL_WANT_READ_WRITE_TIMEOUT);
+                break;
             /* socket closed */
             case SSL_ERROR_ZERO_RETURN:
```
I've had a look at the Linux/FreeBSD issue. The language used in both is more complicated than it needs to be. The POSIX manpage says:-

POLLERR: An error has occurred on the device or stream. This flag is only valid in the revents bitmask; it shall be ignored in the events member.

POLLHUP: A device has been disconnected, or a pipe or FIFO has been closed by the last process that had it open for writing. Once set, the hangup state of a FIFO shall persist until some process opens the FIFO for writing or until all read-only file descriptors for the FIFO are closed. This event and POLLOUT are mutually-exclusive; a stream can never be writable if a hangup has occurred. However, this event and POLLIN, POLLRDNORM, POLLRDBAND, or POLLPRI are not mutually-exclusive. This flag is only valid in the revents bitmask; it shall be ignored in the events member.

I think these are all saying the same thing, which is that we don't need to set POLLERR or POLLHUP in the events member, but they are checked (and returned) automatically.
As for the existing code, this looks OK to me. The poll() will return regardless of which flag is set. In ssl_calls.c this means we'll reattempt the write, and then we'll get the error, which we can log. That's where these two errors are coming from:-
[2024-08-28T17:48:35.369-0400] [ERROR] SSL_read: I/O error
...
[2024-08-28T17:49:23.596-0400] [ERROR] SSL_write: Failure in SSL library (protocol error?)
So I don't think this needs changing.
What does need changing is that something isn't checking the status of a read or write. It's not obvious looking at the code where this would be.
A couple of questions:-
1) Can the user reproduce this? It looks like it's resize-related.
2) Can the user take debug builds from us?
Thanks.
I think I saw something like this once during GFX testing.
My guess is we've sent the client something on the GFX channel that it doesn't like. The client is closing the GFX channel (but not the main channel), and then we're logging errors for everything else we send on that channel. We do do a lot of error checking in those code paths.
If that's right, we need to figure out what the client is objecting to.
@matt335672
Thanks for the info. I've been busy for a while. I haven't asked the end-user if the issue is GFX related but they've taken a stack trace.
The stack trace taken whilst the xrdp process is still running shows
#0 0x00007f87bae11a35 in write () from /lib64/libpthread.so.0
#1 0x00007f87bb5766a0 in g_file_write () from /usr/lib64/xrdp/libcommon.so.0
#2 0x00007f87bb57409a in internal_log_message () from /usr/lib64/xrdp/libcommon.so.0
#3 0x00007f87bb57445a in log_message () from /usr/lib64/xrdp/libcommon.so.0
#4 0x00007f87bb54bafb in xrdp_sec_send () from /usr/lib64/xrdp/libxrdp.so.0
#5 0x00007f87bb5486d6 in xrdp_rdp_send_data_from_channel () from /usr/lib64/xrdp/libxrdp.so.0
#6 0x00007f87bb52fa03 in libxrdp_send_bitmap () from /usr/lib64/xrdp/libxrdp.so.0
#7 0x00005645f41b1384 in xrdp_painter_end_update ()
#8 0x00005645f419fab7 in xrdp_bitmap_invalidate.part ()
#9 0x00005645f41a0f91 in xrdp_bitmap_invalidate ()
#10 0x00005645f41b968e in xrdp_wm_init ()
#11 0x00005645f41b9eb5 in xrdp_wm_check_wait_objs ()
#12 0x00005645f41b3724 in xrdp_process_main_loop ()
#13 0x00005645f41a6cfb in xrdp_listen_main_loop ()
#14 0x00005645f419e29c in main ()
strace output shows evidence that the socket is disconnected (POLLHUP):

poll([{fd=14, events=POLLOUT}], 1, 0) = 1 ([{fd=14, revents=POLLOUT|POLLERR|POLLHUP}])
That's useful - thanks.
That's the code that sends GFX updates when the backend isn't running in GFX mode.
My suspicion is one of these lines. Neither has error checking on it:-
If you're stuck for time at the moment, let me know if you'd like a patch.
@matt335672 Could you work on a patch? I'm still stuck for time for a while.
Will do. Should get to it Monday at the latest.
I think I can (sort of) reproduce this now with this patch:-
```diff
--- a/xrdp/xrdp_wm.c
+++ b/xrdp/xrdp_wm.c
@@ -833,6 +833,13 @@ xrdp_wm_init(struct xrdp_wm *self)
     {
         LOG(LOG_LEVEL_DEBUG, " xrdp_wm_init: no autologin / auto run detected, draw login window");
         xrdp_login_wnd_create(self);
+        static int first_time = 1;
+        if (first_time)
+        {
+            LOG(LOG_LEVEL_INFO, "Sleeping....");
+            g_sleep(10 * 1000);
+            LOG(LOG_LEVEL_INFO, "Done sleeping....");
+        }
         /* clear screen */
         xrdp_bitmap_invalidate(self->screen, 0);
         xrdp_wm_set_focused(self, self->login_window);
```
I then connect and kill the connection before the sleep finishes.
I don't get an infinite loop, but I get a lot of logging along these lines:-
[2024-09-16T11:19:53.833+0100] [ERROR] [xrdp_iso_send(xrdp_iso.c:639)] xrdp_iso_send: trans_write_copy_s failed
[2024-09-16T11:19:53.833+0100] [ERROR] [xrdp_mcs_send(xrdp_mcs.c:1475)] xrdp_mcs_send: xrdp_iso_send failed
[2024-09-16T11:19:53.833+0100] [ERROR] [xrdp_sec_send(xrdp_sec.c:1328)] xrdp_sec_send: xrdp_mcs_send failed
[2024-09-16T11:19:53.833+0100] [ERROR] [xrdp_rdp_send_data_from_channel(xrdp_rdp.c:680)] xrdp_rdp_send_data_from_channel: xrdp_sec_send failed
Despite my earlier comments, it's nothing to do with GFX - in fact in GFX mode things seem to terminate more quickly.
I'll step through this and figure out where the best place to detect the connection has failed is. Adding a check to each potential write is a lot of work and probably unnecessary. There's probably something that can be done at the top level.
After spending quite a lot of time looking into this, I think I'm chasing my own tail. I'm sure I'm missing something.
The very first comment contains this line:-

[2024-08-28T08:38:02.147-0400] [INFO ] Received memory_allocation_complete command. width: 7680, height: 4320

That has to be generated by xup.c. In turn that means the client has authenticated and is trying to connect to a session.
The stacktrace is related to the login screen being drawn. I didn't think that was possible after we've authenticated and are trying to connect to a session.
This looks like two separate problems. To stop even more flailing about on my part, I'm going to ignore the stacktrace for now.
Going back to the logging above, we've got this message:-

[2024-08-28T17:48:35.370-0400] [ERROR] Sending [ITU T.125] DisconnectProviderUltimatum failed

This is generated here in xrdp_mcs_disconnect():-
Following the call stack up:-

1) xrdp_mcs_disconnect() is called from one place only, in xrdp_sec_disconnect()
2) xrdp_sec_disconnect() is called from one place only, in xrdp_rdp_disconnect()
3) xrdp_rdp_disconnect() is called from one place only, in libxrdp_disconnect()
4) libxrdp_disconnect() is called from two places. These are both in xrdp_process_main_loop(), and could be replaced with a single call.

After libxrdp_disconnect() returns, I can't see how a separate xrdp process would fail to exit.
@metalefty - can you check a couple of things with the user:-

1) Are they running with fork=true in xrdp.ini?
2) Can you ask them to set EnableProcessId=true in the [Logging] section? That should let us tie up the logging with the actual xrdp processes being used. I'd like to know if these messages are coming from one xrdp process, or many xrdp processes.
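For reference, the relevant xrdp.ini fragments would look something like this (section placement for fork assumed from stock xrdp.ini; all other keys omitted):

```ini
; xrdp.ini (fragments)
[Globals]
fork=true

[Logging]
EnableProcessId=true
```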
> Are they running with fork=true in xrdp.ini?

Yes.

> Can you ask them to set EnableProcessId=true in the [Logging] section? That should let us tie up the logging with the actual xrdp processes being used. I'd like to know if these messages are coming from one xrdp process, or many xrdp processes.

Yes, trying to reproduce with EnableProcessId=true now.
An enterprise user reports that they're seeing a problem where xrdp 0.10.1 is stuck in a logging loop. I haven't reproduced the issue, but I guess there are 2 issues:

1) POLLERR/POLLHUP is not checked
2) g_sck_can_send/recv() is not checked

Regarding issue 1, POLLERR/POLLHUP is always checked on FreeBSD but not on GNU/Linux.

Linux: https://man7.org/linux/man-pages/man2/poll.2.html
FreeBSD: https://man.freebsd.org/cgi/man.cgi?query=poll&apropos=0&sektion=0&manpath=FreeBSD+14.1-RELEASE+and+Ports&arch=default&format=html