kvic-z / pixelserv-tls

A tiny bespoke HTTP/1.1 server for adblock and accelerating web browsing.
GNU Lesser General Public License v3.0

segfault at 18 error 4 in libpthread-2.30.so #37

Closed: emeidi closed this issue 3 years ago

emeidi commented 4 years ago

Today I moved my pixelserv-tls 2.2.1 instance from one Debian server to another (Linux MYSERVER 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux). Shortly after the switch, syslog started filling up with segfaults like the one below, occurring every few minutes:

[ 8314.171749] pixelserv-tls[30124]: segfault at 18 ip 00007fb9b65840a0 sp 00007fb9b691be28 error 4 in libpthread-2.30.so[7fb9b657d000+f000]
[ 8314.171757] Code: 87 28 fe ff ff 4c 89 e0 48 d3 e0 a9 81 08 00 00 0f 84 17 fe ff ff e9 61 ff ff ff 8b 07 83 c8 02 83 f8 03 74 f6 e9 87 fd ff ff <8b> 57 18 64 8b 04 25 d0 02 00 00 39 c2 0f 84 7d 00 00 00 41 57 41
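The kernel message above packs several facts into one line. "error 4" is the x86 page-fault error code (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode), and "ip" minus the start address in brackets gives the offset of the faulting instruction within that memory mapping of libpthread-2.30.so. A minimal Python sketch that pulls those fields apart follows. decode_segfault is a hypothetical helper written for this thread, not part of pixelserv-tls or any tool:

```python
import re

# x86 page-fault error-code bits, as printed after "error" in the kernel message
PF_PROT = 1 << 0   # 1: protection violation, 0: page not present
PF_WRITE = 1 << 1  # 1: write access, 0: read access
PF_USER = 1 << 2   # 1: fault raised in user mode, 0: kernel mode
PF_INSTR = 1 << 4  # 1: fault was an instruction fetch

SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Split a kernel 'segfault at ...' line into its fields (hypothetical helper)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    err = int(m.group("err"), 16)  # the kernel prints the error code in hex
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    return {
        "fault_addr": int(m.group("addr"), 16),
        "object": m.group("obj"),
        # offset within the reported mapping, not necessarily within the file
        "offset": ip - base,
        "access": "write" if err & PF_WRITE else "read",
        "mode": "user" if err & PF_USER else "kernel",
        "cause": "protection violation" if err & PF_PROT else "page not present",
        "instruction_fetch": bool(err & PF_INSTR),
    }


if __name__ == "__main__":
    line = ("pixelserv-tls[30124]: segfault at 18 ip 00007fb9b65840a0 sp 00007fb9b691be28 "
            "error 4 in libpthread-2.30.so[7fb9b657d000+f000]")
    info = decode_segfault(line)
    print(hex(info["fault_addr"]), info["object"], hex(info["offset"]),
          info["mode"], info["access"], info["cause"])
```

For the line above this reports a user-mode read of the unmapped address 0x18 at offset 0x70a0 into that mapping. A fault address that small usually means a field at offset 0x18 was read through a NULL pointer, which agrees with the bytes marked <8b> in the Code: line: 8b 57 18 decodes to mov edx, [rdi+0x18], so rdi was probably NULL at that point. To resolve the offset to a function with addr2line or gdb (with glibc debug symbols installed), the mapping's file offset, taken from /proc/<pid>/maps or the library's program headers, may need to be added first.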

I first updated to 2.3.1 to make sure this bug hadn't already been fixed in a newer version, but the crashes continued.

I use monit to detect pixelserv-tls crashes, so it was restarted automatically every time this happened. I set this up as a precaution because less mature versions of pixelserv-tls used to crash a lot, or would stop serving requests even though the process was still running.

I then attached strace to the running process to see what it did right before the crash:

# strace -p 2566
strace: Process 2566 attached
select(8, [4 6 7], NULL, NULL, NULL)    = 1 (in [6])
accept(6, {sa_family=AF_INET, sin_port=htons(48042), sin_addr=inet_addr("10.1.2.3")}, [128->16]) = 9
fcntl(9, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(9, F_SETFL, O_RDWR)               = 0
setsockopt(9, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(9, SOL_SOCKET, SO_RCVTIMEO, "\0\0\0\0\0\0\0\0\360I\2\0\0\0\0\0", 16) = 0
getsockname(9, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("10.1.2.3")}, [128->16]) = 0
brk(0x55f90db97000)                     = 0x55f90db97000
read(9, "\26\3\1\1t", 5)                = 5
read(9, "\1\0\1p\3\3\4\374:\312\221D\0370F\300\3011\212\273\323\266=S\217\372g2\251\20gx"..., 372) = 372
stat("/usr/local/bin/pixelserv/certs/10.1.2.3", 0x7ffd3176c350) = -1 ENOENT (No such file or directory)
getpid()                                = 2566
sendto(3, "<28>Apr 12 15:34:28 pixelserv-tl"..., 68, MSG_NOSIGNAL, NULL, 0) = 68
openat(AT_FDCWD, "/tmp/pixelcerts", O_WRONLY) = 10
write(10, "10.1.2.3:", 10)             = 10
close(10)                               = 0
write(9, "\25\3\3\0\2\2P", 7)           = 7
getpid()                                = 2566
sendto(3, "<28>Apr 12 15:34:28 pixelserv-tl"..., 129, MSG_NOSIGNAL, NULL, 0) = 129
shutdown(9, SHUT_RDWR)                  = 0
close(9)                                = 0
select(8, [4 6 7], NULL, NULL, NULL)    = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=2614, si_uid=0} ---
rt_sigaction(SIGTERM, {sa_handler=SIG_IGN, sa_mask=[TERM], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f8396faa7e0}, {sa_handler=0x55f90d07c9b0, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f8397156110}, 8) = 0
madvise(0x55f90db77000, 77824, MADV_DONTNEED) = 0
brk(0x55f90db8d000)                     = 0x55f90db8d000
getpid()                                = 2566
sendto(3, "<26>Apr 12 15:34:28 pixelserv-tl"..., 401, MSG_NOSIGNAL, NULL, 0) = 401
openat(AT_FDCWD, "/usr/local/bin/pixelserv/certs/prefetch", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 9
fstat(9, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
write(9, "_.adnxs.com\t0\n_.appsflyer.com\t0\n"..., 322) = 322
close(9)                                = 0
+++ killed by SIGSEGV +++
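The TLS part of this trace can be read byte by byte. strace prints non-printable bytes as octal escapes, so \26\3\1\1t is the 5-byte TLS record header 0x16 0x03 0x01 0x01 0x74: a handshake record of 0x0174 = 372 bytes, matching the following read of 372 bytes, whose first bytes \1\0\1p are a ClientHello (handshake type 1) of 0x000170 = 368 bytes. After the failed stat() of the certificate file for 10.1.2.3, pixelserv-tls writes the 7-byte record \25\3\3\0\2\2P, which is a fatal TLS alert with description 80 (internal_error), and then closes the connection. The Python sketch below performs that decoding. unescape_strace and decode_record are hypothetical helpers written for this thread; they handle only the escape forms strace uses here plus \x hex escapes, and name only a handful of alert codes from RFC 5246:

```python
# TLS record content types and alert fields (RFC 5246, sections 6.2.1 and 7.2)
CONTENT_TYPES = {20: "change_cipher_spec", 21: "alert", 22: "handshake", 23: "application_data"}
ALERT_LEVELS = {1: "warning", 2: "fatal"}
ALERT_DESCRIPTIONS = {0: "close_notify", 10: "unexpected_message", 40: "handshake_failure",
                      70: "protocol_version", 80: "internal_error", 112: "unrecognized_name"}

# Single-character escapes strace uses inside quoted strings
SIMPLE_ESCAPES = {"t": 9, "n": 10, "v": 11, "f": 12, "r": 13, '"': 34, "\\": 92}
OCTAL_DIGITS = "01234567"


def unescape_strace(text):
    """Convert the body of a strace-quoted string (quotes removed) back into bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ord(ch))
            i += 1
            continue
        nxt = text[i + 1]
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 2
        elif nxt in OCTAL_DIGITS:
            # strace uses 1 to 3 octal digits, padding to 3 when an octal digit follows
            j = i + 1
            while j < len(text) and j < i + 4 and text[j] in OCTAL_DIGITS:
                j += 1
            out.append(int(text[i + 1:j], 8))
            i = j
        elif nxt == "x":  # hex escapes, as printed with strace -x
            out.append(int(text[i + 2:i + 4], 16))
            i += 4
        else:
            raise ValueError(f"unknown escape \\{nxt}")
    return bytes(out)


def decode_record(data):
    """Decode a TLS record header and, for alerts, the alert level and description."""
    if len(data) < 5:
        raise ValueError("a TLS record header is 5 bytes")
    ctype = data[0]
    record = {
        "type": CONTENT_TYPES.get(ctype, ctype),
        "version": (data[1], data[2]),
        "length": int.from_bytes(data[3:5], "big"),
    }
    if ctype == 21 and len(data) >= 7:  # alert body: level byte, description byte
        record["level"] = ALERT_LEVELS.get(data[5], data[5])
        record["description"] = ALERT_DESCRIPTIONS.get(data[6], data[6])
    return record


if __name__ == "__main__":
    print(decode_record(unescape_strace(r"\26\3\1\1t")))     # header read from fd 9
    print(decode_record(unescape_strace(r"\25\3\3\0\2\2P")))  # reply written to fd 9
```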

I also raised the log level to 5, which produced:

Apr 12 15:41:30 ELK pixelserv-tls[3075]: 10.1.2.3 10.1.2.3 missing
Apr 12 15:41:30 ELK pixelserv-tls[3075]: handshake failed: client 10.1.2.3:49404 server 10.1.2.3. Lib(20) Func(521) Reason(234)
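The "Lib(20) Func(521) Reason(234)" part appears to be the three fields of an OpenSSL error code printed separately. Assuming an OpenSSL 1.1.x build, OpenSSL packs them into one 32-bit code as 8 bits of library, 12 bits of function and 12 bits of reason (the ERR_PACK layout), and library 20 is ERR_LIB_SSL. Function and reason numbers differ between OpenSSL versions, so rather than guessing their names, this small Python sketch (err_pack and err_unpack are hypothetical helpers mirroring that layout) repacks the code so that openssl errstr on the affected machine can translate it:

```python
ERR_LIB_SSL = 20  # OpenSSL library code for "SSL routines"


def err_pack(lib, func, reason):
    """Pack library/function/reason like OpenSSL 1.1's ERR_PACK macro."""
    return ((lib & 0xFF) << 24) | ((func & 0xFFF) << 12) | (reason & 0xFFF)


def err_unpack(code):
    """Split a packed OpenSSL 1.1 error code back into (lib, func, reason)."""
    return (code >> 24) & 0xFF, (code >> 12) & 0xFFF, code & 0xFFF


if __name__ == "__main__":
    code = err_pack(ERR_LIB_SSL, 521, 234)  # Lib(20) Func(521) Reason(234)
    print(f"openssl errstr {code:08X}")
```

This prints openssl errstr 142090EA. Running that command against the OpenSSL version pixelserv-tls is linked with gives the human-readable function and reason. OpenSSL 3.x no longer packs a function code, so this layout applies only to 1.1.x.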

Why is pixelserv-tls receiving requests from its own host? This is when it dawned on me that my monit HTTPS check, running on the same machine, might actually be causing the issue:

check host pixelserv-tls with address 10.1.2.3
    start program = "/usr/local/bin/pixelserv/start.sh"
    stop program = "/usr/bin/killall pixelserv-tls"
    alert alert@domain.tld on {timeout,connection}
    if failed port 443 protocol https status 200 for 3 cycles then restart

(I am running monit 1:5.26.0-4).

Preliminary conclusion: the way monit forms its HTTPS requests makes pixelserv-tls segfault.

I have now changed the monit configuration to the following, and no segfaults have happened so far:

check host pixelserv-tls with address 10.1.2.3
    start program = "/usr/local/bin/pixelserv/start.sh"
    stop program = "/usr/bin/killall pixelserv-tls"
    alert alert@domain.tld on {timeout,connection}
    if failed port 80 protocol http request "/servstats" with content == 'unknown reason' for 3 cycles then restart
    if failed port 443 protocol https with http headers [Host: www.any-known-sinkholed-domain.tld] status 200 for 3 cycles then restart

kvic-z commented 3 years ago

Going by your theory, if Monit's HTTPS requests are causing the crash, you should be able to capture the details of those requests with tcpdump.
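A capture along the lines of tcpdump -i any -w monit.pcap tcp port 443, taken while monit runs its check, would show exactly what monit sends. One detail worth checking is whether monit's ClientHello carries a server_name (SNI) extension at all, since the strace and debug output show pixelserv-tls looking for a certificate named 10.1.2.3. The Python sketch below, a hypothetical helper extract_sni, walks a raw TLS ClientHello record (layout per RFC 5246/8446, SNI format per RFC 6066) and returns the host name, or None if no SNI was sent. It assumes the whole ClientHello fits in the first TLS record, which is typical but not guaranteed, and does only minimal bounds checking:

```python
def extract_sni(record):
    """Return the SNI host name from a raw TLS ClientHello record, or None if absent."""
    if len(record) < 9 or record[0] != 22 or record[5] != 1:
        raise ValueError("not a TLS handshake record carrying a ClientHello")
    pos = 9                                                # 5-byte record + 4-byte handshake header
    pos += 2 + 32                                          # client_version + random
    pos += 1 + record[pos]                                 # session_id (1-byte length)
    pos += 2 + int.from_bytes(record[pos:pos + 2], "big")  # cipher_suites (2-byte length)
    pos += 1 + record[pos]                                 # compression_methods (1-byte length)
    if pos + 2 > len(record):
        return None                                        # no extensions block at all
    ext_end = pos + 2 + int.from_bytes(record[pos:pos + 2], "big")
    pos += 2
    while pos + 4 <= ext_end:
        ext_type = int.from_bytes(record[pos:pos + 2], "big")
        ext_len = int.from_bytes(record[pos + 2:pos + 4], "big")
        body = record[pos + 4:pos + 4 + ext_len]
        if ext_type == 0:  # server_name extension
            # ServerNameList: 2-byte list length, then entries of type(1) + length(2) + name
            i = 2
            while i + 3 <= len(body):
                name_type = body[i]
                name_len = int.from_bytes(body[i + 1:i + 3], "big")
                if name_type == 0:  # host_name
                    return body[i + 3:i + 3 + name_len].decode("ascii")
                i += 3 + name_len
        pos += 4 + ext_len
    return None
```

It can be fed the client's first TLS record copied out of the capture, or the bytes of a read() shown by strace with a larger -s string limit.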

In my tests, pixelserv-tls is very robust at log level 2 or below. Once you raise the log level high enough to capture full URLs, robustness decreases, because today's advertisers and trackers upload enormous amounts of data to their servers. At those log levels, a few crashes per week is not uncommon.

I'm an Arch user and a little out of the loop with Debian. If systemd is available, its 'Restart=always' setting is a very good keepalive feature.