rsyslog / librelp

OFFICIAL librelp repository on github
https://www.rsyslog.com/librelp/
GNU General Public License v3.0

Segmentation fault in gnutls_record_get_direction () libgnutls #240

Closed StrongestNumber9 closed 1 year ago

StrongestNumber9 commented 2 years ago

We are getting segmentation faults related to RELP handling.

Program terminated with signal 11, Segmentation fault.
#0  0x00007fca33189a10 in gnutls_record_get_direction () from /usr/lib64/libgnutls.so.28

Backtrace from the dump:

(gdb) bt
#0  0x00007fca33189a10 in gnutls_record_get_direction () from /usr/lib64/libgnutls.so.28
#1  0x00007fca33480f4c in relpTcpGetRtryDirection_gtls (pThis=<optimized out>) at tcp.c:3668
#2  relpTcpGetRtryDirection (pThis=<optimized out>) at tcp.c:3758
#3  0x00007fca334790bc in engineEventLoopRun (pThis=pThis@entry=0x562866564000) at relp.c:820
#4  0x00007fca33479a27 in relpEngineRun (pThis=0x562866564000) at relp.c:1029
#5  0x00007fca2f1f36e8 in runInput (pThrd=0x562866571360) at imrelp.c:868
#6  0x0000562864654d62 in thrdStarter (arg=0x562866571360) at ../threads.c:243
#7  0x00007fca33a9cea5 in start_thread () from /usr/lib64/libpthread.so.0
#8  0x00007fca31d54b0d in clone () from /usr/lib64/libc.so.6

We are getting this many times per day, and we are not using TLS here. We have coredumps available if needed. We also see messages from https://github.com/rsyslog/librelp/blob/v1.10.0/src/relpframe.c#L189-L195 before the segfault. We use a maxMessageSize of 256k in the rsyslog configuration.

rsyslogd: imrelp: librelp error 'frame too long, size 262394, configured max 262144 -frame will be truncated, but session continues', object  'librelp' - input may not work as intended [v8.2102.0 try https://www.rsyslog.com/e/2291 ]
rsyslogd: imrelp: librelp error 'frame too long, size 262394, configured max 262144 -frame will be truncated, but session continues', object  'librelp' - input may not work as intended [v8.2102.0 try https://www.rsyslog.com/e/2291 ]

rsyslog version: 8.2102
librelp version: v1.10.0
OS: CentOS Linux release 7.9.2009 (Core)
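
For illustration, the arithmetic behind the "frame too long" message above can be sketched in C. This is a minimal sketch, not the code in relpframe.c: the type `frame_check_t` and the function `check_frame_len` are hypothetical names, and it assumes that both numbers in the log line are byte counts that are compared directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for this sketch; not part of librelp. */
typedef struct {
    size_t accepted_len; /* bytes that fit within the configured maximum */
    size_t dropped_len;  /* bytes beyond the configured maximum */
    int truncated;       /* 1 if the frame exceeded the maximum, else 0 */
} frame_check_t;

/* Compare an announced frame length against a configured maximum.
 * Assumption: an oversized frame is cut down to exactly max_len bytes;
 * the real librelp logic may differ. */
static frame_check_t check_frame_len(size_t frame_len, size_t max_len)
{
    frame_check_t r;

    assert(max_len > 0); /* a zero maximum would make every frame oversized */

    if (frame_len > max_len) {
        r.accepted_len = max_len;
        r.dropped_len = frame_len - max_len;
        r.truncated = 1;
    } else {
        r.accepted_len = frame_len;
        r.dropped_len = 0;
        r.truncated = 0;
    }
    return r;
}
```

With the values from the log, `check_frame_len(262394, 262144)` reports a truncated frame that is 250 bytes over the 256k (262144-byte) limit.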

rgerhards commented 2 years ago

You are not using TLS? That's more than strange. Can you send me the config and a debug log at least covering the complete startup?

StrongestNumber9 commented 2 years ago

TLS is not used in imrelp or omrelp, but it is enabled for imtcp. I added rsyslog.conf and the startup debug log to the support ticket.
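
To illustrate the class of problem the backtrace points at (a retry-direction query reaching the TLS library on a connection that does not use TLS), here is a minimal C sketch. It is not librelp's code and not the change in PR #241, and the thread does not establish that this is the actual cause. All types and functions in it (`mock_tls_session_t`, `mock_conn_t`, `mock_tls_get_direction`, `conn_get_retry_direction`) are hypothetical stand-ins. The only real-world fact it relies on is that `gnutls_record_get_direction()` returns 0 when the interrupted operation was a read and 1 when it was a write.

```c
#include <stddef.h>

/* Hypothetical stand-in for a TLS session; not a GnuTLS type. */
typedef struct {
    int direction; /* 0 = was reading, 1 = was writing */
} mock_tls_session_t;

/* Hypothetical connection object; not librelp's relpTcp_t. */
typedef struct {
    int tls_enabled;             /* 1 if this connection uses TLS */
    mock_tls_session_t *session; /* NULL for plain TCP connections */
} mock_conn_t;

/* Stand-in for a TLS-library call that dereferences its argument.
 * Passing NULL or a dangling pointer here would crash, which is the
 * kind of failure a segfault inside the TLS library suggests. */
static int mock_tls_get_direction(const mock_tls_session_t *s)
{
    return s->direction;
}

/* Ask the TLS layer for the retry direction only when the connection
 * really has a TLS session; otherwise fall back to "read" (0). */
static int conn_get_retry_direction(const mock_conn_t *c)
{
    if (c == NULL || !c->tls_enabled || c->session == NULL)
        return 0;
    return mock_tls_get_direction(c->session);
}
```

In this sketch a plain TCP connection (`tls_enabled == 0`, `session == NULL`) never reaches the TLS-layer call, so an uninitialized session cannot be dereferenced.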

rgerhards commented 2 years ago

Would it be possible for you to run rsyslog under valgrind and provide us with the resulting valgrind reports? If we have bad luck, the problem does not occur. But if it does, valgrind often provides very useful information.

rgerhards commented 2 years ago

Well, I looked once again at the code that I am suspecting to be not OK. Question: did you compile librelp with or without TLS?

rgerhards commented 2 years ago

@StrongestNumber9 I have crafted a draft PR (#241), but I am not really convinced it will address your issue. If you have the time, it would be good if you could give it a try.

StrongestNumber9 commented 2 years ago

> Well, I looked once again at the code that I am suspecting to be not OK. Question: did you compile librelp with or without TLS?

We didn't explicitly set any options other than --prefix=/our/path and --disable-static:

14:57:03  [INFO] librelp will be compiled with the following settings:
14:57:03  [INFO] 
14:57:03  [INFO] run valgrind in testbench:       yes
14:57:03  [INFO] Debug mode enabled:              no
14:57:03  [INFO] GNUTLS enabled:                  yes
14:57:03  [INFO] GNUTLS authentication supported: yes
14:57:03  [INFO] OPENSSL enabled:                 yes
14:57:03  [INFO] generic TLS tests enabled:       yes

rgerhards commented 2 years ago

> 14:57:03  [INFO] Debug mode enabled:              no
> 14:57:03  [INFO] GNUTLS enabled:                  yes
> 14:57:03  [INFO] GNUTLS authentication supported: yes
> 14:57:03  [INFO] OPENSSL enabled:                 yes

Looks like it is enabled. I admit that I am puzzled by the issue you see. Could you still try out the patch I posted? Frankly, I don't think it cures your problem, but knowing that would really help (at least me).

StrongestNumber9 commented 2 years ago

We compiled rsyslog with --enable-openssl --disable-gnutls, and librelp with --disable-tls --enable-tls-openssl, and that seems to fix the problem, or at least allows us to stay afloat. We haven't been able to test the patch yet, as this is kind of a do-or-die situation in the client's production environment, so we are keeping disruptions to a minimum right now.

rgerhards commented 2 years ago

@StrongestNumber9 Thx for the feedback, and glad to hear you have a work-around. Given that, I plan to look more in-depth next week together with @alorbach. I am pretty sure that there is a bug, but I do not clearly see the bug area. My guess is that a full analysis will take me until the beginning of next week as well, so, quite frankly, there is not much benefit in doing it right now.