Closed myelsukov closed 4 years ago
Correction: 1.20-2 fails too. Downgrade to 1.20-1 fixed the problem.
This is the most common use case, which suggests you must be running on a rather obscure configuration (otherwise it'd have failed for many others). Also, there is only one 1.20 release, so I'm tempted to ask what distribution are you running on and what's the difference between 1.20-1 and 1.20-2?
All that being said:
gdb
:
$ gdb ./a.out
(gdb) run
<segfault happens here>
(gdb) backtrace
<offending code is shown here>
(more info here https://stackoverflow.com/questions/2876357/determine-the-line-of-code-that-causes-a-segmentation-fault if curious).
I have the exact same problem on a fresh archlinuxarm install: sslh-fork coredumps on ssl connections, but ssh connections work fine. It's not just @myelsukov
Ok, I still need a back trace :-)
OK I'll try to do that. Not sure if this is useful, but here's the journalctl error:
Process 42258 (sslh) of user 63671 dumped core.
Stack trace of thread 42258:
#0 0x0000aaaadcbc80d0 n/a (sslh-fork + 0xb0d0)
#1 0x0000aaaadcbc733c n/a (sslh-fork + 0xa33c)
#2 0x0000aaaadcbc7b10 n/a (sslh-fork + 0xab10)
#3 0x0000aaaadcbc1e98 n/a (sslh-fork + 0x4e98)
#4 0x0000aaaadcbc21f8 n/a (sslh-fork + 0x51f8)
#5 0x0000aaaadcbc16c8 n/a (sslh-fork + 0x46c8)
#6 0x0000ffff992f9d90 __libc_start_main (libc.so.6 + 0x23d90)
#7 0x0000aaaadcbc19bc n/a (sslh-fork + 0x49bc)
#8 0x0000aaaadcbc19bc n/a (sslh-fork + 0x49bc)
Hmmm it looks like the binary was stripped. It needs to be compiled with -g which is the distribution default. Can you try with a freshly cloned head from GitHub?
As I mentioned in the ticket I am running vanilla Arch Linux 5.4.51-1-ARCH on ARMV7 (Raspberry Pi)
I don't know if it is important or not but I am running sslh
as transparent proxy.
All versions prior to 1.20 worked like a clockwork for years so it is something that was introduced recently.
I will try to find time to experiment with a locally built sslh
if my grandkids allow that :)
Well, that was easy :grin:
PID: 8660 (sslh-fork)
UID: 979 (sslh)
GID: 979 (sslh)
Signal: 11 (SEGV)
Timestamp: Sat 2020-07-25 10:22:21 PDT (19s ago)
Command Line: /usr/bin/sslh-fork --config /etc/sslh.cfg --foreground
Executable: /usr/bin/sslh-fork
Control Group: /system.slice/sslh-fork.service
Unit: sslh-fork.service
Slice: system.slice
Boot ID: eea67b257c5b43938cda7da51ecfb96f
Machine ID: aa8463431c294aa9b55ffb978fc87d69
Hostname: pi
Storage: /var/lib/systemd/coredump/core.sslh-fork.979.eea67b257c5b43938cda7da51ecfb96f.8660.1595697741000000000000.lz4
Message: Process 8660 (sslh-fork) of user 979 dumped core.
Stack trace of thread 8660:
#0 0x0000000000486a44 parse_tls_header (sslh-fork + 0xaa44)
#1 0x0000000000485cf0 is_tls_protocol (sslh-fork + 0x9cf0)
#2 0x00000000004863a0 probe_client_protocol (sslh-fork + 0xa3a0)
#3 0x000000000047e8f8 start_shoveler (sslh-fork + 0x28f8)
#4 0x000000000047ee9c main_loop (sslh-fork + 0x2e9c)
#5 0x00000000004854fc main (sslh-fork + 0x94fc)
#6 0x0000000076e09bb8 __libc_start_main (libc.so.6 + 0x1abb8)
This is odd, that's all code that hasn't changed in a while. It should help a lot if that's deterministic.
Could you run with --verbose 2 and report the trace? Also, do you know if your target is big endian or little endian?
It is 100% reproducible. Unfortunately adding --verbose 2
to the command line did not produce any additional output.
The CPU is running in the little endian mode (according to uname -a
it is armv7l
Version 1.19c which is running perfectly well produces this output:
Jul 25 14:21:44 pi sslh-fork[12329]: accepted fd 4
Jul 25 14:21:44 pi sslh-fork[12375]: **** writing deferred on fd -1
Jul 25 14:21:44 pi sslh-fork[12375]: probing for ssh
Jul 25 14:21:44 pi sslh-fork[12375]: probing for ssl
Jul 25 14:21:44 pi sslh-fork[12375]: connecting to xxxxxxxxxxxx.lan:https family 2 len 16
Jul 25 14:21:44 pi sslh-fork[12375]: ssl:connection from xxxxxxx.lan:54172 to myself:https forwarded from xxxxxxx.lan:54172 to xxxxxxxxxxxx.lan:https
Jul 25 14:21:44 pi sslh-fork[12375]: flushing deferred data to fd 3
I like reproducible:-)
It was mentioned earlier two versions of 1.20, one of which worked: could you try 1.20 from GitHub, and 1.21b from GitHub as well? (Trying to reduce to the change that breaks it).
The function parse_tls_header doesn't look wrong at a first glance, but it's not totally trivial either. And it does a lot of pointer arithmetics...
v1.20 works fine:
Jul 26 09:30:39 pi sslh-fork[15126]: Using /etc/sslh.cfg
Jul 26 09:30:39 pi sslh-fork[15126]: ssh addr: myself:ethernet-ip-s. libwrap service: (null) log_level: 1 family 2 2 [] []
Jul 26 09:30:39 pi sslh-fork[15126]: tls addr: xxxxxxxxxxxx.lan:https. libwrap service: (null) log_level: 1 family 2 2 [] []
Jul 26 09:30:39 pi sslh-fork[15126]: listening on:
Jul 26 09:30:39 pi sslh-fork[15126]: 0.0.0.0:https []
Jul 26 09:30:39 pi sslh-fork[15126]: timeout: 2
Jul 26 09:30:39 pi sslh-fork[15126]: on-timeout: ssh
Jul 26 09:30:39 pi sslh-fork[15126]: listening to 1 addresses
Jul 26 09:30:39 pi sslh-fork[15126]: sslh-fork v1.20 started
Jul 26 09:30:43 pi sslh-fork[15135]: accepted fd 4
Jul 26 09:30:43 pi sslh-fork[15135]: accepted fd 4
Jul 26 09:30:43 pi sslh-fork[15169]: **** writing deferred on fd -1
Jul 26 09:30:43 pi sslh-fork[15169]: probing for ssh: PROBE_NEXT
Jul 26 09:30:43 pi sslh-fork[15169]: probing for tls: PROBE_MATCH
Jul 26 09:30:43 pi sslh-fork[15169]: connecting to xxxxxxxxxxxx.lan:https family 2 len 16
Jul 26 09:30:43 pi sslh-fork[15169]: tls:connection from xxxxxxx.lan:60913 to myself:https forwarded from xxxxxxx.lan:60913 to xxxxxxxxxxxx.lan:https
Jul 26 09:30:43 pi sslh-fork[15169]: flushing deferred data to fd 3
Jul 26 09:30:45 pi sslh-fork[15168]: timed out, connect to ssh
Jul 26 09:30:45 pi sslh-fork[15168]: connecting to myself:ethernet-ip-s family 2 len 16
Jul 26 09:30:45 pi sslh-fork[15168]: ssh:connection from xxxxxxx.lan:60912 to myself:https forwarded from xxxxxxx.lan:60912 to myself:ethernet-ip-s
Jul 26 09:30:45 pi sslh-fork[15168]: flushing deferred data to fd 3
Jul 26 09:30:45 pi sslh-fork[15169]: client socket closed
Jul 26 09:30:45 pi sslh-fork[15169]: connection closed down
I just realised that my server is ARM as well -- so that's not it.
I can't see a major difference between 1.20 and 1.21 that would explain this. I'll dig a bit deeper.
Ok, valgrind (which checks validity of memory accesses) seems happy on my system.
Can you:
valgrind --leak-check=full ./sslh-fork -v 2 -f -F yourconfig.cfg
Running under valgrind does not fail. I am getting this trace:
sudo valgrind --leak-check=full ./sslh-fork -v 2 -f -F /etc/sslh.cfg
==20037== Memcheck, a memory error detector
==20037== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==20037== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==20037== Command: ./sslh-fork -v 2 -f -F /etc/sslh.cfg
==20037==
sslh-fork v1.21b-1-g2c93a01-dirty started
--20037-- WARNING: unhandled arm-linux syscall: 403
--20037-- You may be able to write your own handler.
--20037-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
--20037-- Nevertheless we consider this a bug. Please report
--20037-- it at http://valgrind.org/support/bug_reports.html.
==20040== Conditional jump or move depends on uninitialised value(s)
==20040== at 0x112A3C: parse_tls_header (tls.c:162)
==20040== by 0x111CEF: is_tls_protocol (probe.c:214)
==20040== by 0x11239F: probe_client_protocol (probe.c:366)
==20040== by 0x10A8F7: start_shoveler (sslh-fork.c:98)
==20040== by 0x10AE9B: main_loop (sslh-fork.c:200)
==20040== by 0x1114FB: main (sslh-main.c:322)
==20040==
ssl:connection from xxxxxxx.lan:49110 to myself:https forwarded from xxxxxxx.lan:49110 to xxxxxxxxxxxx.lan:https
And the build log is:
[14:45 pi@pi sslh (master *=) ] > make
Makefile:74: warning: ignoring prerequisites on suffix rule definition
./genver.sh >version.h
cc -Wall -g -O0 -DENABLE_REGEX -DLIBPCRE -DLIBCONFIG -c sslh-conf.c
cc -Wall -g -O0 -DENABLE_REGEX -DLIBPCRE -DLIBCONFIG -c common.c
cc -Wall -g -O0 -DENABLE_REGEX -DLIBPCRE -DLIBCONFIG -c sslh-main.c
cc -Wall -g -O0 -DENABLE_REGEX -DLIBPCRE -DLIBCONFIG -c probe.c
cc -Wall -g -O0 -DENABLE_REGEX -DLIBPCRE -DLIBCONFIG -c tls.c
cc -Wall -g -O0 -DENABLE_REGEX -DLIBPCRE -DLIBCONFIG -c argtable3.c
cc -Wall -g -O0 -DENABLE_REGEX -DLIBPCRE -DLIBCONFIG -c sslh-fork.c
cc -Wall -g -O0 -o sslh-fork sslh-fork.o sslh-conf.o common.o sslh-main.o probe.o tls.o argtable3.o -lpcreposix -lconfig
#strip sslh-fork
cc -Wall -g -O0 -DENABLE_REGEX -DLIBPCRE -DLIBCONFIG -c sslh-select.c
cc -Wall -g -O0 -o sslh-select sslh-select.o sslh-conf.o common.o sslh-main.o probe.o tls.o argtable3.o -lpcreposix -lconfig
#strip sslh-select
pod2man --section=8 --release=v1.21b-1-g2c93a01-dirty --center=" " sslh.pod | gzip -9 - > sslh.8.gz
cc -Wall -g -O0 -DENABLE_REGEX -DLIBPCRE -DLIBCONFIG -c echosrv.c
cc -Wall -g -O0 -o echosrv echosrv.o probe.o common.o tls.o -lpcreposix -lconfig
==20040== Conditional jump or move depends on uninitialised value(s)
==20040== at 0x112A3C: parse_tls_header (tls.c:162)
Well that's interesting. I can't think of a way tls_data would be uninitialised, and I can't think of a way that would segfault, but I don't see this behaviour on my system and dammit tls_data should be initialised. By code that has changed between v1.20 and v1.21. Can you post your config file? (feel free to e-mail it if you don't want to publish it, which I would understand)
Getting this myself and it's trivial to reproduce, so I will attempt to bisect it. ~Hah well no actually it isn't. I cannot reproduce it without running it in systemd.~
Here's my config
# Default Arch configuration
# You can find more examples in /usr/share/doc/sslh
verbose: false;
foreground: true;
inetd: false;
numeric: true;
transparent: true;
timeout: 2
#pidfile: "/run/sslh.pid";
listen:
(
{ host: "a.b.c.d"; port: "443"; },
{ host: "10.8.0.1"; port: "443"; },
{ host: "10.9.0.1"; port: "443"; }
);
protocols:
(
{ name: "ssh"; service: "ssh"; host: "a.b.c.d"; port: "223"; probe: "builtin"; },
{ name: "openvpn"; host: "a.b.c.d"; port: "1194"; probe: "builtin"; },
{ name: "ssl"; host: "a.b.c.d"; port: "4443"; probe: "builtin"; },
);
# vim:set ts=4 sw=4 et:
The most interesting thing I do is use transparency for ssh logging. a.b.c.d
was just a sed of my real IP address.
~Edit: @yrutschle I tried to bisect but I'll need some more help beacuse I got a bogus commit, ed3cd40. Undoubtedly because I struggled to bisect with conf2struct.~
I just learned you can bisect by path and I got e7ce929
I'm getting a crash here:
Backtrace: https://0x0.st/ixcQ.txt
158 return TLS_EPROTOCOL;
159
160 /* By now we know it's TLS. if SNI or ALPN is set, parse extensions to see if
161 * they match. Otherwise, it's a match already */
162 if (tls_data &&
163 (tls_data->match_mode.tls_match_alpn || tls_data->match_mode.tls_match_sni)) { // <- LINE 163
164 return parse_extensions(tls_data, data + pos, len);
165 } else {
166 return TLS_MATCH;
167 }
It seems that tls_data
is just bogus.
Can you post your config file? (feel free to e-mail it if you don't want to publish it, which I would understand)
timeout: 2;
verbose: true;
foreground: true;
transparent: true;
listen:
(
{ host: "0.0.0.0"; port: "llll"; }
);
protocols:
(
{ name: "ssh"; host: "xxxxxx"; port: "xxxx"; probe: "builtin"; },
{ name: "ssl"; host: "xxxxxxxxxxxx"; port: "yyyy"; probe: "builtin"; }
);
For what it worth, I am also starting sslh as a systemd service.
Tls. Can you change 'ssl' to 'tls' as name for the protocol?
This fixes the issue for me, but why? I don't see any indication in the man page that ssl is deprecated, in fact it looks as if tls is just an alias for it. I see https://github.com/yrutschle/sslh/commit/9ed44ee09b97632139eaa54f8a7b0db7c1dcedd1 but I did not get any deprecation warning when starting sslh.
Switching to tls
fixed the problem for me too. Now, that's a mystery. I hope you can reproduce and debug on your side.
To summarize my setup:
ssl
instead of tls
transparent: true
Ah. Glad we got to the root cause :-)
So there is in fact a number of issues:
ssl
is in fact no longer supported as an option; there was a message for this in 1.20, but I didn't write is clearly enough in the Changelog... I'll fix that right away (The default, or example, sslh.cfg in Arch Linux may need to be updated accordingly)ssl
last results in the TLS probe being called with an unitialised pointer. I'm not quite sure how that segfaults in the end, but there is clearly a problem there. I'll see about fixing that quickly.ssl
, or any probe that doesn't exist really, should fire an error. I'm not sure whether this belongs in conf2struct
or in sslh
, I'll think about it.Thanks to everyone who contributed to this investigation. Too often we developers are left with a 'it doesn't work! I don't know what is make!'
Started by systemd didn't actually matter for me in the end. What did matter was transparent though. According to probe.c
, ssl
should still be valid. My bisect basically goes to the point where you switched everything to the config struct. I can't follow what's going on but it looks like ssl should be the exact same probe tls. Is there somewhere that assumptions breaks down?
No no: sslh-main.c:config_protocols() treats 'tls' specifically to initialise the protocol.data field. Plus, conf2struct didn't initialise so-called runtime data. So when set to 'ssl', you end up with protocol.data not initialised, so the test in tls.c:162 breaks (if (tls_data [...]
assumes tls_data is either zero or valid, but here it's just garbage).
I've checked in the various changes I intended:
ssl
so you actually get an error instead of crashingSwitching to "tls" also fixes the issue for me. Thanks!
@yrutschle ok that makes sense, thank you for the explanation, I can see by your latest commits how that could have happened, and how the commit I found was the result of the bisect.
I just tested the latest git master with both ssl
and tls
and I can confirm this has fixed the issue. ssl
is now does not work and produces an error, and tls
works as before.
sslh-fork coredumps on accepting ssl connection. ssh connections are coming through just fine.
After this it was still OK:
Got broken after this: