azimot closed this issue 1 year ago.
The cause of the crash is not clear, since you did not share any backtrace from gdb or report from ASan.
hloop threads are the threads that Janus uses internally to handle media packets and PeerConnection-related events. They are not created by libnice, although they are needed to run the libnice event loop.
For such a small number of participants (16), the number of hloop threads should be way lower.
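To check how many hloop threads are actually alive, a minimal sketch like the following counts a process's threads by name from /proc. It assumes Linux and that the Janus event loop threads show up with names starting with hloop (as in the process listing reported in this issue); the function names are hypothetical helpers, not part of Janus.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: count the threads of a process from /proc (Linux only).

# Print how many threads of process $1 have a name starting with $2.
count_threads_by_prefix() {
  local pid="$1" prefix="$2" count=0 name comm
  for comm in /proc/"$pid"/task/*/comm; do
    # A thread may exit between the glob and the read: skip it.
    name=$(cat "$comm" 2>/dev/null) || continue
    case "$name" in
      "$prefix"*) count=$((count + 1)) ;;
    esac
  done
  echo "$count"
}

# Print the total number of threads of process $1.
count_threads() {
  local count=0 task
  for task in /proc/"$1"/task/*; do
    [ -d "$task" ] && count=$((count + 1))
  done
  echo "$count"
}

# Example, assuming a single running janus process:
#   pid=$(pidof janus)
#   echo "hloop: $(count_threads_by_prefix "$pid" hloop) of $(count_threads "$pid") threads"
```

Sampling this periodically while participants join and leave shows whether hloop threads go away when their PeerConnections are closed.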
A living hloop thread might indicate that the hloop is not being closed.
We definitely need further data:
1) Your current janus.jcfg (to rule out a misconfiguration in the event loops)
2) A gdb backtrace or ASan report (that will clarify the reason for the crash)
3) A Janus Admin API dump, in particular a list_sessions request and, for each session returned, a list_handles request, fetched before the crash (you can poll the API every X seconds; the goal is to understand whether you are leaking resources on the server)
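A hedged sketch of such a poller, using curl and jq against the Admin API's HTTP interface. It assumes the Admin API is enabled on the HTTP transport at http://localhost:7088/admin and that admin_secret has the sample value janusoverlord; adjust both to match your janus.transport.http.jcfg and janus.jcfg. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical Janus Admin API poller (requires curl and jq). Assumptions:
# the Admin API is enabled on the HTTP transport at the URL below and
# admin_secret has the sample value; override both via the environment.
ADMIN_URL="${ADMIN_URL:-http://localhost:7088/admin}"
ADMIN_SECRET="${ADMIN_SECRET:-janusoverlord}"

# Build a request body: $1 = request name, $2 = transaction id.
admin_body() {
  printf '{"janus":"%s","transaction":"%s","admin_secret":"%s"}' \
    "$1" "$2" "$ADMIN_SECRET"
}

# POST a request: $1 = path suffix ("" or "/<session_id>"), $2 = request name.
admin_call() {
  curl -s -X POST -H 'Content-Type: application/json' \
    -d "$(admin_body "$2" "tx$RANDOM")" "$ADMIN_URL$1"
}

# Print one line: timestamp, number of sessions, total number of handles.
snapshot() {
  local sessions s n total=0
  sessions=$(admin_call "" list_sessions | jq -r '.sessions[]?')
  for s in $sessions; do
    # A session may disappear between the two requests; count it as 0 handles.
    n=$(admin_call "/$s" list_handles | jq '.handles | length')
    total=$((total + ${n:-0}))
  done
  echo "$(date -Is) sessions=$(printf '%s' "$sessions" | grep -c .) handles=$total"
}

# Usage: poll every 5 seconds until the crash, keeping a log.
#   while true; do snapshot >> janus-admin.log; sleep 5; done
```

One line per poll makes it easy to see whether the handle count keeps growing while participants come and go, which would point to a leak rather than to the expected load.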
Also make sure you didn't forget to increase the ulimit: https://janus.conf.meetecho.com/docs/FAQ.html#ulimit
I was assuming you are using the VideoRoom in multistream mode. If that is not the case (e.g. you are using Janus 1.x with 0.x syntax), then that number of hloop threads might not be significant, since in a VideoRoom with 16 participants you might end up having up to 16^2 = 256 active handles and hloop threads.
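For the non-multistream case, the 256 figure follows from counting handles, assuming every one of the n participants both publishes and subscribes to all the others: one publisher handle each, plus one subscriber handle per remote publisher.

$$H(n) = n + n(n-1) = n^2, \qquad H(16) = 16 + 16 \cdot 15 = 256$$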
In that case, as Lorenzo mentioned, you are probably hitting a kernel limit.
@atoppi @lminiero Exactly, Lorenzo: the reason is related to ulimit!
This is how I solved the problem:
ulimit -a
shows the user limits currently in effect for the shell (the Linux defaults, unless changed).
cat /proc/$(pidof janus)/limits
shows the limits that apply to the running Janus process.
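To see how close Janus actually gets to the open-files limit, a sketch along these lines compares the number of descriptors listed in /proc/<pid>/fd with the soft "Max open files" value from /proc/<pid>/limits. It assumes Linux; reading another user's /proc/<pid>/fd generally requires root, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare a process's open descriptors with its
# "Max open files" soft limit, both read from /proc (Linux only).

# Print the number of open file descriptors of process $1.
open_fds() {
  local count=0 fd
  for fd in /proc/"$1"/fd/*; do
    # Every entry in /proc/<pid>/fd is a symlink to the open object.
    [ -L "$fd" ] && count=$((count + 1))
  done
  echo "$count"
}

# Print the soft "Max open files" limit of process $1. The line looks like:
# Max open files            1024                 1048576              files
soft_nofile() {
  awk '/^Max open files/ {print $4}' /proc/"$1"/limits
}

# Example: report_fds "$(pidof janus)"
report_fds() {
  echo "pid $1: $(open_fds "$1") open fds, soft limit $(soft_nofile "$1")"
}
```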
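Temporarily increase the limits to check whether the problem is solved (these commands only affect the current shell and the processes started from it, so Janus has to be started from that same shell):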
ulimit -n 60000
ulimit -Hn 600000
ulimit -Sn 60000
Note: ulimit -n alone is not enough; ulimit -Sn and ulimit -Hn are the two switches that must be used. I assume that to set them permanently you have to edit the limits.conf file:
nano /etc/security/limits.conf
#ulimit -Hn 1048576
root hard nofile 1048576
#ulimit -Sn 60000
root soft nofile 60000
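root is the user Janus is running as. Most tutorials use * (star) to match all users, but this does not work here: you must use the exact user you want the new settings to apply to (the limits.conf(5) man page notes that wildcard entries are not applied to the root user).
After any reboot, we can check with cat /proc/$(pidof janus)/limits that the limits are correct.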
Lorenzo, isn't this a common problem? 15-16 participants is regular usage for the VideoRoom plugin, so these settings should be included in the main Janus installation tutorial on the website and in README.md too.
The problem is solved, but I'm pretty confused!
The problem was solved by ulimit -Sn, but that is the open-files limit (disk I/O, read/write). From what I know of the Janus architecture, since we don't record streams to disk, we shouldn't have a huge number of open files.
Why did that happen? Why does Janus (without stream recording) need so many open files?
Thank You
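On Linux, the open-files limit (RLIMIT_NOFILE) counts every file descriptor, not only files on disk: sockets (such as the UDP sockets used for ICE candidates and the WebSocket TCP connections), pipes, eventfds and epoll instances all count against it. A sketch that groups a process's descriptors by type, using the /proc/<pid>/fd symlink targets, can show where they go. It is a hypothetical diagnostic helper, assuming Linux and root access when inspecting another user's process.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: group a process's file descriptors by type using the
# /proc/<pid>/fd symlink targets (Linux only).

# Map one symlink target to a coarse type, e.g.
#   socket:[12345]        -> socket
#   pipe:[67890]          -> pipe
#   anon_inode:[eventfd]  -> anon_inode:[eventfd] (kept as-is)
#   /tmp/example.log      -> path (regular file or device)
classify_fd_target() {
  case "$1" in
    socket:*)     echo socket ;;
    pipe:*)       echo pipe ;;
    anon_inode:*) echo "$1" ;;
    /*)           echo path ;;
    *)            echo other ;;
  esac
}

# Print "<count> <type>" lines for process $1, most frequent first.
fd_types() {
  local fd target
  for fd in /proc/"$1"/fd/*; do
    # The descriptor may be closed between the glob and readlink: skip it.
    target=$(readlink "$fd" 2>/dev/null) || continue
    classify_fd_target "$target"
  done | sort | uniq -c | sort -rn
}

# Example: fd_types "$(pidof janus)"
```

If sockets dominate the output, the limit is being consumed by network activity rather than by disk files.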
OS: Ubuntu 20.04 (updated)
CPU: 10 cores, Xeon(R) CPU E7-4850 v4 @ 2.10GHz
RAM: 16GB
Janus version: 1.1.1
libnice version: 0.1.17
libsrtp version: 2.2.0
usrsctp version: 0.9.5.0
libwebsockets version: 4.3.2
Transport used: WebSocket
plugins: { disable = "libjanus_voicemail.so,libjanus_recordplay.so,libjanus_echotest.so,libjanus_nosip.so,libjanus_streaming.so,libjanus_videocall.so,libjanus_textroom.so" }
transports: { disable = "libjanus_rabbitmq.so,libjanus_pfunix.so" }
I'm using the VideoRoom plugin with 1 publisher and 15 subscribers, and everything was fine while I had just 15 subscribers. When the 16th joined, Janus crashed and was killed by the OS. I checked the processes and there are many hloop processes: more than 325 threads!
Logs and crash report are attached:
[Screenshot: top with apport]
As you can see, there are more than 320 hloop threads before the apport process eats up the CPU. At first I thought apport was buggy and that this was why Linux killed low-priority processes, Janus included, so I disabled apport and ran the test again.
[Screenshot: process log without apport]
I checked the thread limits on Linux and they look fine.
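A sketch for checking the limits that cap thread creation, read from /proc. It assumes Linux. Note that, per setrlimit(2), RLIMIT_NPROC ("Max processes", which also counts threads) is not enforced for processes running as root, so for a Janus instance running as root the kernel-wide values matter more. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: print the limits that cap thread creation (Linux only).
# Per setrlimit(2), RLIMIT_NPROC ("Max processes", which also counts threads)
# is not enforced for processes running as root.

# Soft "Max processes" limit of process $1. The line looks like:
# Max processes             63452                63452                processes
soft_nproc() {
  awk '/^Max processes/ {print $3}' /proc/"$1"/limits
}

# Print the kernel-wide thread/PID ceilings and the per-process soft limit.
thread_limits() {
  echo "kernel.threads-max: $(cat /proc/sys/kernel/threads-max)"
  echo "kernel.pid_max: $(cat /proc/sys/kernel/pid_max)"
  echo "Max processes (soft) for pid $1: $(soft_nproc "$1")"
}

# Example: thread_limits "$(pidof janus)"
```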
_opt_janus_bin_janus.0.crash
Is hloop related to libnice? Should I use a newer libnice version?
As far as I can see, there are no zombie processes. Under normal usage, Linux killed the Janus process. Any ideas on how to resolve this, or similar experiences?