meetecho / janus-gateway

Janus WebRTC Server
https://janus.conf.meetecho.com
GNU General Public License v3.0

Segfaults in 0.5.0 as of 65c36f5 #1442

Closed agclark27 closed 5 years ago

agclark27 commented 5 years ago

We've been trying to identify the source of some segfaults that occur most frequently when the videoroom and textroom plugins are used by larger numbers of participants. Sometimes a segfault occurs at 150 participants; other times it doesn't occur until past 500. Occasionally they occur with smaller groups too. We've tried increasing the available RAM and CPU, but in the cases cited the CPU never exceeded 15% and there was always ample free memory. We run the Janus process as a service with resource limits such as LimitNOFILE=1048576 and LimitNPROC=infinity.
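Since the service relies on LimitNOFILE/LimitNPROC settings (presumably from a systemd unit), it can be worth confirming that those limits actually reach the process. A minimal sketch, assuming a Linux host: the hypothetical helpers get_limits(), format_limit() and report_limits() below are not part of Janus; they simply read the soft and hard values with getrlimit(2).

```c
/*
 * Hypothetical diagnostic (not part of Janus): report the soft and hard
 * RLIMIT_NOFILE / RLIMIT_NPROC limits the current process actually got,
 * e.g. to check that LimitNOFILE= and LimitNPROC= were applied.
 */
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Fill *soft and *hard for the given resource; returns 0 on success, -1 on error. */
static int get_limits(int resource, rlim_t *soft, rlim_t *hard) {
    struct rlimit rl;
    if (getrlimit(resource, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}

/* Render a limit value into buf, using "unlimited" for RLIM_INFINITY. */
static void format_limit(rlim_t value, char *buf, size_t len) {
    assert(buf != NULL && len > 0);
    if (value == RLIM_INFINITY)
        snprintf(buf, len, "unlimited");
    else
        snprintf(buf, len, "%llu", (unsigned long long)value);
}

/* Print one line per resource, e.g. "RLIMIT_NOFILE soft=... hard=...". */
static void report_limits(void) {
    const struct { int resource; const char *name; } items[] = {
        { RLIMIT_NOFILE, "RLIMIT_NOFILE" },
        { RLIMIT_NPROC,  "RLIMIT_NPROC"  },
    };
    for (size_t i = 0; i < sizeof items / sizeof items[0]; i++) {
        rlim_t soft, hard;
        char s[32], h[32];
        if (get_limits(items[i].resource, &soft, &hard) != 0) {
            perror(items[i].name); /* getrlimit failed; errno explains why */
            continue;
        }
        format_limit(soft, s, sizeof s);
        format_limit(hard, h, sizeof h);
        printf("%s soft=%s hard=%s\n", items[i].name, s, h);
    }
}
```

Calling report_limits() from a small main() started under the same service settings shows whether the configured values took effect; on Linux, reading /proc/<pid>/limits for the running Janus process gives the same information without extra code.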

When these segfaults occurred, we were running 0.5.0 as of commit 65c36f5 (2018-11-15), but we have been seeing them for some time. All three of these segfaults occurred today, 2018-11-20.

Here is the output from gdb for the 3 core files: https://pastebin.com/5asSEFXT

atoppi commented 5 years ago

Those segfaults look like memory allocation/deallocation issues. What is the server environment (distro, kernel, libc, glib)? Can you try running Janus with libasan?
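Part of the environment information asked for here can also be collected programmatically. A minimal sketch, assuming Linux and, for the libc part, glibc: kernel_string() and libc_version() are hypothetical helpers (not Janus APIs) built on uname(2) and glibc's gnu_get_libc_version(). The glib version is left out to keep the sketch free of extra dependencies.

```c
/*
 * Hypothetical helpers (not part of Janus): report the kernel and the
 * runtime C library version, as requested when reporting a crash.
 */
#include <assert.h>
#include <stdio.h>          /* also pulls in <features.h>, which defines __GLIBC__ on glibc */
#include <sys/utsname.h>
#ifdef __GLIBC__
#include <gnu/libc-version.h>
#endif

/* Write "<sysname> <release> <machine>" into buf; returns 0 on success, -1 on error. */
static int kernel_string(char *buf, size_t len) {
    struct utsname u;
    assert(buf != NULL && len > 0);
    if (uname(&u) != 0)
        return -1;
    snprintf(buf, len, "%s %s %s", u.sysname, u.release, u.machine);
    return 0;
}

/* Return the glibc version the process is running against, or NULL on non-glibc systems. */
static const char *libc_version(void) {
#ifdef __GLIBC__
    return gnu_get_libc_version();
#else
    return NULL;
#endif
}
```

The kernel line is formatted the same way as the "Linux 4.14.77-81.59.amzn2.x86_64 x86_64" string quoted in the reply, and libc_version() reports the runtime glibc version rather than the headers used at build time.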

agclark27 commented 5 years ago

We're running Amazon Linux 2 (Linux 4.14.77-81.59.amzn2.x86_64 x86_64), which is similar to CentOS 7. Libc is 2.26 (2.26-28.amzn2.0.1) and glib is 2.54 (2.54.2-2.amzn2). We compile libnice, libsrtp, and usrsctp from master, but the problem does seem to be malloc/dealloc at a higher level.

I just recompiled with libasan and will post a pastebin once we can get it to segfault again.

atoppi commented 5 years ago

Just for reference, I'm leaving here an almost identical crash reported on the group. The OS is the same (CentOS).

lminiero commented 5 years ago

Any update with libasan?

lminiero commented 5 years ago

PS: the traces seem to mention JSON stuff, so you may want to make sure libjansson is up to date as well.

agclark27 commented 5 years ago

We've been running with libasan for the past couple of weeks and haven't experienced another crash yet. Tomorrow we'll try to overload it and see if we can produce a core dump. We've also updated to the latest master and, per your suggestion, bumped libjansson from 2.10 to 2.11. I'll let you know if we get a core file with more detail.

lminiero commented 5 years ago

I'm assuming it hasn't crashed in the overload test either? Can we close this?

lminiero commented 5 years ago

Closing as I assume it's fixed now. Please let us know if it's still an issue.