antonsmyk closed this issue 1 year ago.
Thanks for reporting. Investigating...
Bug filed internally.
This bug also seems to affect Splunk Universal Forwarder (UF) installations, since Splunk internally uses the same mechanism. Below is a partial stack trace from Splunk UF version 9.0.5, along with the OEL kernel and glibc versions. The bug also prevents Splunk UF from starting.
glibc Version
glibc-2.17-326.0.5.el7_9.x86_64
glibc-common-2.17-326.0.5.el7_9.x86_64
Kernel Version
kernel-3.10.0-1160.92.1.0.1.el7.x86_64
StackTrace
[pid 9461] open("/sys/devices/system/cpu/online", O_RDONLY|O_CLOEXEC) = 3
> /usr/lib64/libc-2.17.so(open+0x10) [0xef900]
> /usr/lib64/libc-2.17.so(get_nprocs+0xbb) [0xfd01b]
> /usr/lib64/libc-2.17.so(__sysconf+0x53b) [0xc7d1b]
> /opt/splunkforwarder/lib/libjemalloc.so.2() [0x6299]
> /opt/splunkforwarder/lib/libjemalloc.so.2(malloc+0x4b4) [0x8f64]
> /usr/lib64/libc-2.17.so(__strdup+0x19) [0x8cb89]
> /usr/lib64/libpthread-2.17.so(__nptl_tunables_init+0x40) [0x6ec0]
> /usr/lib64/libpthread-2.17.so(__pthread_initialize_minimal+0x316) [0x6d76]
> /usr/lib64/libpthread-2.17.so(_init+0x8) [0x54f0]
> No DWARF information found
[pid 9461] read(3, "0-3\n", 8192) = 4
> /usr/lib64/libc-2.17.so(__read+0x10) [0xefb40]
> /usr/lib64/libc-2.17.so(next_line+0xaa) [0xfcd9a]
> /usr/lib64/libc-2.17.so(get_nprocs+0xe5) [0xfd045]
> /usr/lib64/libc-2.17.so(__sysconf+0x53b) [0xc7d1b]
> /opt/splunkforwarder/lib/libjemalloc.so.2() [0x6299]
> /opt/splunkforwarder/lib/libjemalloc.so.2(malloc+0x4b4) [0x8f64]
> /usr/lib64/libc-2.17.so(__strdup+0x19) [0x8cb89]
> /usr/lib64/libpthread-2.17.so(__nptl_tunables_init+0x40) [0x6ec0]
> /usr/lib64/libpthread-2.17.so(__pthread_initialize_minimal+0x316) [0x6d76]
> /usr/lib64/libpthread-2.17.so(_init+0x8) [0x54f0]
> No DWARF information found
[pid 9461] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x15} ---
[pid 9461] +++ killed by SIGSEGV +++
<... wait4 resumed>[{WIFSIGNALED(s) && WTERMSIG(s) == SIGSEGV}], 0, NULL) = 9461
LD_PRELOAD
-bash-4.2$ LD_PRELOAD=/opt/splunkforwarder/lib/libjemalloc.so /usr/bin/true
Segmentation fault
This seems to be related to what antonsmyk reported: the stack trace shows the same function calls, and /usr/bin/true also crashes when the splunkforwarder libjemalloc.so library is preloaded.
Fix going through release process. Shouldn't be long now.
Hi again, @totalamateurhour! Thanks for promptly reviewing my report!
I noticed today that a newer version, glibc-2.17-326.0.7.el7_9, is already available in the public Oracle Yum repository, and it appears to fix the reported problem with preloading jemalloc in this simple case:
# LD_PRELOAD=/usr/lib64/libjemalloc.so.1 /usr/bin/true
However, I see a problem in the source code: at line 91 there is a free() call on a pointer that was allocated on the stack with strdupa():
Indeed, this blows up when I have jemalloc preloaded and GLIBC_TUNABLES set:
# GLIBC_TUNABLES=glibc.malloc.trim_threshold=128:glibc.malloc.check=3 LD_PRELOAD=/usr/lib64/libjemalloc.so.1 /usr/bin/true
Segmentation fault (core dumped)
and the stack trace:
(gdb) bt
#0 0x00007f1546362c6c in je_extent_tree_ad_search () from /usr/lib64/libjemalloc.so.1
#1 0x00007f1546363f28 in je_huge_salloc () from /usr/lib64/libjemalloc.so.1
#2 0x00007f154634a5f5 in free () from /usr/lib64/libjemalloc.so.1
#3 0x00007f1545d62f6d in __nptl_tunables_init () from /lib64/libpthread.so.0
#4 0x00007f1545d62d27 in __pthread_initialize_minimal_internal () from /lib64/libpthread.so.0
#5 0x00007f1545d614b9 in _init () from /lib64/libpthread.so.0
#6 0x0000000000000000 in ?? ()
Modern versions of GCC warn about this free() attempt, and ASan also catches the misuse:
# cat hello.c
// set _GNU_SOURCE for strdupa() availability
#define _GNU_SOURCE
#include <alloca.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void hello(const char *str)
{
char *p = strdupa(str);
puts(p);
free(p);
}
int main(void)
{
hello("Hello\n");
}
# /opt/rh/devtoolset-12/root/usr/bin/gcc -fsanitize=address hello.c -o hello
hello.c: In function 'hello':
hello.c:12:5: warning: 'free' called on pointer to an unallocated object [-Wfree-nonheap-object]
12 | free(p);
| ^~~~~~~
cc1: note: returned from '__builtin_alloca_with_align'
# ./hello
Hello
=================================================================
==135==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x7ffc0ddae5e0 in thread T0
#0 0x7fc7cc5c40a0 in __interceptor_free.part.0 (/lib64/libasan.so.8+0xbe0a0)
#1 0x401285 in hello (/hello+0x401285)
#2 0x4012a6 in main (/hello+0x4012a6)
#3 0x7fc7cc15a554 in __libc_start_main (/lib64/libc.so.6+0x22554)
#4 0x401118 (/hello+0x401118)
Address 0x7ffc0ddae5e0 is located in stack of thread T0
SUMMARY: AddressSanitizer: bad-free (/lib64/libasan.so.8+0xbe0a0) in __interceptor_free.part.0
==135==ABORTING
Thanks for the follow up, and apologies. This will be addressed shortly.
Thanks, @totalamateurhour! The new version seems fine. Closing the case.
We have noticed a problem with the new glibc package version 2.17-326.0.5.el7_9, recently published in the Oracle Linux Yum repository: the package timestamp is June 13, 2023 on the https://yum.oracle.com/repo/OracleLinux/OL7/latest/x86_64/index.html page.
When the jemalloc.so library is preloaded, the process crashes in glibc's get_nprocs function. This is reproducible with jemalloc version 3.6.0-1.el7, available from the ol7_developer_EPEL repository, as well as with a custom build of jemalloc version 4.2.0 that we use in our environment.
This is easy to reproduce with a basic Oracle Linux 7 container image.
First, prepare a recent Oracle Linux 7 container image with the latest updates and jemalloc installed:
Then run a shell in the container and execute any command with the jemalloc library preloaded. It crashes immediately:
Stack trace of the crash:
When the glibc package is downgraded to the latest known working version, 2.17-326.0.3.el7_9, the crash is no longer reproducible.
We believe it has to do with this recent update, which introduces a call to __nptl_tunables_init in the nptl/nptl-init.c source file. This change is mentioned in the glibc.spec file from the source RPM provided by Oracle at https://oss.oracle.com/ol7/SRPMS-updates/ :
There is no such problem with Red Hat Enterprise Linux 7 or CentOS 7. Oracle Linux 8 does not seem to be affected either.
We believe that preloading jemalloc should not cause a crash in the glibc library, and we do not want to stop using jemalloc.
It also seems likely that other Oracle Linux 7 users will hit the same problem as soon as they apply the latest updates.