Closed by agclark27 5 years ago
Those segfaults sound like memory allocation/deallocation issues.
What is the server environment (distro, kernel, libc, glib) ?
Can you try running Janus with libasan?
We're running on Amazon Linux 2 (Linux 4.14.77-81.59.amzn2.x86_64 x86_64), which is similar to CentOS 7. Libc is 2.26 (2.26-28.amzn2.0.1) and glib is 2.54 (2.54.2-2.amzn2). We're compiling libnice, libsrtp, and usrsctp from master, but the problem does seem to be malloc/dealloc at a higher level.
I just recompiled with libasan and will post a pastebin once we can get it to segfault again.
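A note for anyone repeating this test: on 64-bit builds ASan disables core dumps by default and exits rather than aborting, so a sanitized Janus can die with a report but leave no core file. The sketch below builds an `ASAN_OPTIONS` value that asks ASan to abort and to allow core dumps. The binary path and log path are assumptions for illustration, not values from this thread.

```python
import os
import subprocess


def asan_options(log_path="/tmp/janus-asan"):
    """Build an ASAN_OPTIONS string. log_path is a hypothetical location."""
    opts = {
        "abort_on_error": "1",    # call abort() on an ASan error instead of exiting
        "disable_coredump": "0",  # allow core dumps (disabled by default on 64-bit)
        "log_path": log_path,     # ASan writes its report to <log_path>.<pid>
    }
    return ":".join(f"{key}={value}" for key, value in opts.items())


def launch(binary="/opt/janus/bin/janus"):
    """Start the ASan-instrumented binary. The path is an assumption."""
    env = dict(os.environ, ASAN_OPTIONS=asan_options())
    return subprocess.Popen([binary], env=env)
```

When Janus runs as a service, the same string could instead be set through an `Environment=` line in the unit file.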
Just for reference, I'm leaving here an almost identical crash reported on the group. The OS is the same (CentOS).
Any update with libasan?
PS: the traces seem to mention JSON stuff, so you may want to make sure libjansson is up to date as well.
We've been running with libasan for the past couple of weeks and haven't yet experienced another crash. We'll try to see if we can get it to overload tomorrow and produce a core dump. We've also updated to the latest master and bumped libjansson from 2.10 to 2.11 per your suggestion. I'll let you know if we can get a core file with more detail.
I'm assuming it hasn't crashed in the overload test either? Can we close this?
Closing as I assume it's fixed now. Please let us know if it's still an issue.
We've been trying to identify the source of some segfaults that occur most frequently when using the videoroom and textroom plugins with a larger number of participants. Sometimes the segfaults occur at 150 participants, and other times they might not occur until past 500. Sometimes they occur for smaller groups. We've tried increasing the available RAM and CPU, but in the examples cited, the CPU never went past 15% and there was always ample memory available. We're running the Janus process as a service with ulimit values like LimitNOFILE=1048576 and LimitNPROC=infinity.
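To confirm that the service limits above actually apply to the running process, one option is to read `/proc/<pid>/limits` on Linux. This is a minimal sketch, not something taken from our setup: the function names are made up for illustration, and it assumes the usual column layout of that file (fields separated by two or more spaces).

```python
import re


def parse_limits(text):
    """Parse the contents of /proc/<pid>/limits into {name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) >= 3:
            limits[parts[0]] = (parts[1], parts[2])
    return limits


def read_limits(pid):
    """Read and parse the limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as handle:
        return parse_limits(handle.read())
```

For example, `read_limits(pid)["Max open files"]` should report 1048576 for both values if LimitNOFILE took effect, and a soft "Max core file size" of 0 would mean no core file gets written.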
In these segfaults, we were running 0.5.0 as of the 65c36f5 commit on 2018-11-15, but these have been occurring for us for some time. All 3 of these segfaults occurred today, 2018-11-20.
Here is the output from gdb for the 3 core files: https://pastebin.com/5asSEFXT
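For reference, this kind of output can be collected from several core files with gdb in batch mode. The sketch below is only an illustration: the binary and core file paths are placeholders, and `thread apply all bt full` is one reasonable choice of command, not necessarily the exact one used for the pastebin.

```python
import subprocess


def gdb_backtrace_cmd(binary, core):
    """Build a gdb command line that prints full backtraces for all threads."""
    return ["gdb", "-batch", "-ex", "thread apply all bt full", binary, core]


def collect_backtraces(binary, cores):
    """Run gdb once per core file and return {core_path: gdb_output}."""
    traces = {}
    for core in cores:
        result = subprocess.run(
            gdb_backtrace_cmd(binary, core),
            capture_output=True,
            text=True,
        )
        traces[core] = result.stdout
    return traces
```

A call such as `collect_backtraces("/opt/janus/bin/janus", ["core.1", "core.2"])` (hypothetical paths) gives one trace per core file. The traces are far more readable when Janus and its libraries are built with debug symbols.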