grobian / carbon-c-relay

Enhanced C implementation of Carbon relay, aggregator and rewriter
Apache License 2.0

Segmentation fault when supervisorctl start ccrelay #448

Closed: pbaranovsky closed this issue 2 years ago

pbaranovsky commented 2 years ago

I installed the latest version of ccrelay:

    $ relay -v
    carbon-c-relay v3.7.4 (d22cec-dirty)
    enabled support for: gzip ssl
    regular expressions library: PCRE

It is running on Debian buster:

    $ cat /etc/debian_version
    10.12

Running supervisorctl start ccrelay results in a segmentation fault.

/var/log/messages shows:

    kernel: [2756009.582926] relay[2969]: segfault at 0 ip 00007fe44ef1f11e sp 00007ffd79c87bf8 error 4 in libc-2.28.so[7fe44edea000+147000]
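For anyone decoding that kernel line: "segfault at 0" gives the faulting address (here NULL), and "error 4" is the x86 page-fault error code. A minimal sketch, assuming the standard x86 bit layout (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode, bit 4: instruction fetch); describe_pf_error is a hypothetical helper, not part of carbon-c-relay:

```c
#include <stdio.h>

/* Hypothetical helper: decode the low bits of an x86 page-fault error code,
 * as printed by the Linux kernel in "segfault at ... error N" lines. */
static void describe_pf_error(unsigned long code, char *buf, size_t len)
{
    snprintf(buf, len, "%s, %s, %s%s",
             (code & 0x1) ? "protection violation" : "page not present",
             (code & 0x2) ? "write" : "read",
             (code & 0x4) ? "user mode" : "kernel mode",
             (code & 0x10) ? ", instruction fetch" : "");
}
```

For error 4 this yields "page not present, read, user mode". Combined with "segfault at 0", that points to a user-space read through a NULL pointer inside libc, which fits the strcmp frame gdb reports further down the thread.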

I'm attempting to run this command:

    /opt/ccrelay/bin/relay -f /opt/ccrelay/etc/ccrelay.cfg [-S 30 -b 50000 -w 18 -q 15000000 -p -H -d

which prints:

    [2022-08-16 16:18:10] starting carbon-c-relay v3.7.4 (d22cec-dirty), pid=4523
    configuration:
        relay hostname =
        workers = 18
        send batch size = 50000
        server queue size = 15000000
        server max stalls = 4
        listen backlog = 32
        server connection IO timeout = 600ms
        idle connections disconnect timeout = 10m
        debug = true
        configuration = /opt/ccrelay/etc/ccrelay.cfg

Would appreciate any pointers on how to look into this further, or what the issue might be.

grobian commented 2 years ago

Is this running in a container or some env that is memory constrained?

If you could get a backtrace somehow, that would help. I'm afraid that for that you would have to build from source if no debug symbols are available.

grobian commented 2 years ago

To be precise, you never see "parsed configuration follows:" in the output, right? That narrows it down somewhat. Does it also crash with a very minimal config or with default options (basically just -f conf)?

pbaranovsky commented 2 years ago

No, @grobian, this is running on physical hardware. I've tried the latest version and also 3.4, both compiled myself from source. There seems to be plenty of memory available on the server:

    $ free -m
                  total        used        free      shared  buff/cache   available
    Mem:         128709        1855      122334        1009        4519      124902
    Swap:          7812           0        7812

The configuration does get parsed:

    relay -f /path/to/ccrelay.cfg [-b <> -w <> -q <> -p -H
    [date] starting carbon-c-relay v3.7.4 (d22cec-dirty), pid=37304
    configuration:
        relay hostname =
        workers =
        send batch size = <..>
        server queue size = <>
        server max stalls = <>
        listen backlog = <>
        server connection IO timeout = 600ms
        idle connections disconnect timeout = 10m
        configuration = /opt/ccrelay/etc/ccrelay.cfg

    Segmentation fault

A minimal configuration (just -f) fails with a segmentation fault as well.

grobian commented 2 years ago

Then, if you could, please build with ./configure && make, run the relay with your config under gdb --args ./relay, type run at the (gdb) prompt, and type bt when it stops on the segmentation fault. Paste the stack trace you get here, if possible. Thanks!

pbaranovsky commented 2 years ago

    $ gdb --args ./relay -f /opt/ccrelay/etc/ccrelay.cfg
    GNU gdb (Debian 8.2.1-2+b3) 8.2.1
    ...
    Reading symbols from ./relay...done.
    (gdb) run
    Starting program: ...relay -f /opt/ccrelay/etc/ccrelay.cfg
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    [2022-08-16 19:59:38] starting carbon-c-relay v3.7.4 (d22cec-dirty), pid=39581
    configuration:
        relay hostname =
        workers = 40
        send batch size = 2500
        server queue size = 25000
        server max stalls = 4
        listen backlog = 32
        server connection IO timeout = 600ms
        idle connections disconnect timeout = 10m
        configuration = /opt/ccrelay/etc/ccrelay.cfg

    Program received signal SIGSEGV, Segmentation fault.
    __strcmp_avx2 () at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:102
    102     ../sysdeps/x86_64/multiarch/strcmp-avx2.S: No such file or directory.
    (gdb) bt
    #0  __strcmp_avx2 () at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:102
    #1  0x000055555556e70f in server_cmp (s=, saddr=saddr@entry=0x5555555a1280, ip=ip@entry=0x0)
        at server.c:1271
    #2  0x0000555555565e7d in router_add_server (ret=ret@entry=0x5555555a31a0, ip=0x0, port=2003, inst=0x0,
        type=T_LINEMODE, transport=W_PLAIN, mtlspemcert=0x0, mtlspemkey=0x0, proto=CON_TCP, saddrs=0x5555555a1280,
        hint=0x0, useall=0 '\000', cl=0x7ffff7800060) at router.c:630
    #3  0x00005555555617c2 in router_yyparse (yyscanner=0x5555555a4960, rtr=rtr@entry=0x5555555a31a0,
        ralloc=0x5555555a2580, palloc=palloc@entry=0x55555559f360) at conffile.y:244
    #4  0x000055555556a2af in router_readconfig (orig=0x0, path=0x7fffffffe841 "/opt/ccrelay/etc/ccrelay.cfg",
        workercnt=<optimized out>, queuesize=<optimized out>, batchsize=<optimized out>, maxstalls=<optimized out>,
        iotimeout=600, sockbufsize=0, listenport=<port>) at router.c:1340
    #5  0x000055555555937c in main (argc=3, argv=) at relay.c:885
    (gdb)
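Frames #0 to #2 show router_add_server passing ip=0x0 into server_cmp, which hands it to strcmp. Passing NULL to strcmp is undefined behavior and in practice reads address 0, matching the kernel's "segfault at 0". A minimal sketch of a NULL-tolerant comparison, assuming the guard sits where two optional host strings are compared; streq_nullable is a hypothetical name, and this is not the change the project actually made:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: compare two host strings that may be NULL
 * (e.g. a cluster member parsed from ":2003" has no host part).
 * Two NULLs compare equal; NULL never equals a non-NULL string,
 * and strcmp is only reached when both pointers are valid. */
static bool streq_nullable(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return a == b;
    return strcmp(a, b) == 0;
}
```

Treating two NULL hosts as equal is a design choice for this sketch; rejecting an empty host at parse time is the other obvious option.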

pbaranovsky commented 2 years ago

@grobian, this was due to a malformed line in ccrelay.cfg, as frame #4 (0x000055555556a2af in router_readconfig (orig=0x0, path=0x7fffffffe841 "/opt/ccrelay/etc/ccrelay.cfg", ...) suggests.

pbaranovsky commented 2 years ago

...and thank you very much for showing me how to use gdb to narrow down the source!

grobian commented 2 years ago

Can you share the cluster definitions from your config? The crash should be fixed, but I'm trying to understand what you're doing :)

pbaranovsky commented 2 years ago

One of the variables in ccrelay.cfg was not properly resolved, which left this cluster definition:

    cluster cluster_name any_of

        :2003 ;

grobian commented 2 years ago

Aha, nice! And you wanted this to mean ADDR_ANY or something?

pbaranovsky commented 2 years ago

A variable containing the FQDNs of the nodes was supposed to be inserted there.
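Since the crash came from a template variable that expanded to nothing and left a bare :2003 member, a deployment pipeline could reject such lines before starting the relay. A minimal sketch, assuming cluster members are written as host:port tokens; member_has_host is a hypothetical helper, not part of carbon-c-relay, and it only checks that some host text precedes the last colon (it does not validate the host, e.g. bracketed IPv6 addresses):

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical pre-flight check: return false when a "host:port" token
 * has an empty host part, as happens when a template variable expands
 * to nothing and leaves ":2003" in the cluster definition. */
static bool member_has_host(const char *token)
{
    const char *colon = strrchr(token, ':');
    size_t hostlen = colon ? (size_t)(colon - token) : strlen(token);
    return hostlen > 0;
}
```

Running such a check over the rendered config turns a relay segfault into an early, explicit deployment error.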