Closed Eltiech closed 10 years ago
Hi,
thanks for the info! Are you using a recent version of Janus? Some of the segfaults in those scenarios have been fixed in the meantime.
About Firefox, it may be crashing more often as I believe it still doesn't support the bundling of media, which means there are more connections involved instead of just one. I'll try and replicate the issue ASAP.
Yes, for the logs above, I cloned janus about 8-9 hours ago.
Right, so, as I'd hoped, I was able to get some logs. My C and gdb skills are rather weak at the moment, but hopefully these logs+backtraces will be of some use to you.
I've noticed that the crashes generally tend to come when clients disconnect.
[40847866] Got an RTCP packet (bundled stream)!
[3617442126] Got an RTCP packet (bundled stream)!
[643882682] Got an RTCP packet (bundled stream)!
[552522943] Got an RTCP packet (bundled stream)!
[1856993979] Got an RTCP packet (bundled stream)!
Cleaning up session 2754766975...
Destroying session 2754766975
Detaching handle from JANUS VideoRoom plugin
Removing Video Room session...
No WebRTC media anymore
[983312870] ICE thread ended!
Notifying participant 1146988291 (MalFirefox)
[4118614720] Adding event to queue of messages...
>> 0 (Success)
Notifying participant 2162449123 (Q0)
>> -2 (Unknown error)
Notifying participant 1442471551 (Mal3)
[926862253] Adding event to queue of messages...
>> 0 (Success)
Notifying participant 2529362087 (Rav3nShad0w)
[1575043553] Adding event to queue of messages...
>> 0 (Success)
Notifying participant 3499307538 (Thrawn089)
[668603940] Adding event to queue of messages...
>> 0 (Success)
Notifying participant 1966074859 (SalientBlue)
[706725718] Adding event to queue of messages...
>> 0 (Success)
Notifying participant 3847423536 (Mal2)
[3536523017] Adding event to queue of messages...
>> 0 (Success)
Notifying participant 332554258 (Mal)
[1028227297] Adding event to queue of messages...
>> 0 (Success)
Notifying participant 3712919776 (cyzon)
[2564630410] Adding event to queue of messages...
>> 0 (Success)
Notifying participant 2409153343 (Wr3nch)
[466390592] Adding event to queue of messages...
>> 0 (Success)
Notifying participant 1146988291 (MalFirefox)
[4118614720] Adding event to queue of messages...
>> 0 (Success)
Notifying participant 1442471551 (Mal3)
[926862253] Adding event to queue of messages...
>> 0 (Success)
Notifying participant 2529362087 (Rav3nShad0w)
[1575043553] Adding event to queue of messages...
>> 0 (Success)
Notifying participant 3499307538 (Thrawn089)
[668603940] Adding event to queue of messages...
>> 0 (Success)
Notifying participant 1966074859 (SalientBlue)
[706725718] Adding event to queue of messages...
>> 0 (Success)
Notifying participant 3847423536 (Mal2)
[3536523017] Adding event to queue of messages...
>> 0 (Success)
Notifying participant 332554258 (Mal)
[1028227297] Adding event to queue of messages...
>> 0 (Success)
Notifying participant 3712919776 (cyzon)
[2564630410] Adding event to queue of messages...
>> 0 (Success)
Notifying participant 2409153343 (Wr3nch)
[466390592] Adding event to queue of messages...
>> 0 (Success)
[983312870] Adding event to queue of messages...
Handle detached (0), scheduling destruction
Detaching handle from JANUS VideoRoom plugin
Removing Video Room session...
No WebRTC media anymore
[3317979229] Adding event to queue of messages...
[3317979229] ICE thread ended!
Handle detached (0), scheduling destruction
Detaching handle from JANUS VideoRoom plugin
Removing Video Room session...
No WebRTC media anymore
[275679096] Adding event to queue of messages...
Handle detached (0), scheduling destruction
Detaching handle from JANUS VideoRoom plugin
[275679096] ICE thread ended!
[3973081491] ICE thread ended!
Removing Video Room session...
No WebRTC media anymore
[3973081491] Adding event to queue of messages...
Handle detached (0), scheduling destruction
Detaching handle from JANUS VideoRoom plugin
Removing Video Room session...
No WebRTC media anymore
[429650655] ICE thread ended!
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd17fa700 (LWP 35902)]
0x00007ffff5717814 in pthread_mutex_lock () from /usr/lib/libpthread.so.0
(gdb) bt
#0 0x00007ffff5717814 in pthread_mutex_lock () from /usr/lib/libpthread.so.0
#1 0x00007fffe83b8056 in janus_videoroom_hangup_media () from /usr/lib/janus/plugins/libjanus_videoroom.so
#2 0x00007fffe83b4c0b in janus_videoroom_destroy_session () from /usr/lib/janus/plugins/libjanus_videoroom.so
#3 0x0000000000411ed0 in janus_ice_handle_destroy ()
#4 0x0000000000418ae1 in janus_session_destroy ()
#5 0x00000000004182c8 in ?? ()
#6 0x00007ffff768c633 in ?? () from /usr/lib/libglib-2.0.so.0
#7 0x00007ffff768bb8d in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
#8 0x00007ffff768bf68 in ?? () from /usr/lib/libglib-2.0.so.0
#9 0x00007ffff768c292 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#10 0x000000000041870d in ?? ()
#11 0x00007ffff76b2765 in ?? () from /usr/lib/libglib-2.0.so.0
#12 0x00007ffff5715314 in start_thread () from /usr/lib/libpthread.so.0
#13 0x00007ffff54533ed in clone () from /usr/lib/libc.so.6
(gdb) quit
A debugging session is active.
We have a message to serve...
{
"janus": "event",
"session_id": 1498650027,
"sender": 1070953053,
"plugindata": {
"plugin": "janus.plugin.videoroom",
"data": {
"videoroom": "event",
"room": 1234,
"unpublished": 2029048014
}
}
}
Request completed, freeing data
[Thread 0x7ffec47f8700 (LWP 41715) exited]
We have a message to serve...
{
"janus": "event",
"session_id": 3581880845,
"sender": 136133471,
"plugindata": {
"plugin": "janus.plugin.videoroom",
"data": {
"videoroom": "event",
"room": 1234,
"unpublished": 2029048014
}
}
}
Request completed, freeing data
[1013620768] Got an RTCP packet (bundled stream)!
[Thread 0x7ffef67fc700 (LWP 41645) exited]
[Thread 0x7ffec67fc700 (LWP 41670) exited]
We have a message to serve...
{
"janus": "event",
"session_id": 1001015136,
"sender": 2981913240,
"plugindata": {
"plugin": "janus.plugin.videoroom",
"data": {
"videoroom": "event",
"room": 1234,
"unpublished": 2029048014
}
}
}
Request completed, freeing data
[Thread 0x7ffec57fa700 (LWP 41671) exited]
We have a message to serve...
{
"janus": "event",
"session_id": 1532486552,
"sender": 4089587650,
"plugindata": {
"plugin": "janus.plugin.videoroom",
"data": {
"videoroom": "event",
"room": 1234,
"unpublished": 2029048014
}
}
}
Request completed, freeing data
[Thread 0x7ffec6ffd700 (LWP 41656) exited]
We have a message to serve...
{
"janus": "event",
"session_id": 2356093117,
"sender": 2213586239,
"plugindata": {
"plugin": "janus.plugin.videoroom",
"data": {
"videoroom": "event",
"room": 1234,
"unpublished": 2029048014
}
}
}
Request completed, freeing data
[Thread 0x7ffeccff9700 (LWP 41646) exited]
[3219522568] Got an RTCP packet (video stream)!
[2981913240] Got an RTCP packet (bundled stream)!
Parsing compound packet (total of 56 bytes)
#1 SR (200)
#2 SDES (202)
End of compound packet
[2981913240] Fixing SSRCs (local 24281518, peer 1149256186)
Parsing compound packet (total of 56 bytes)
#1 SR (200)
#2 SDES (202)
End of compound packet
We have a message to serve...
{
"janus": "event",
"session_id": 1959107420,
"sender": 2251675757,
"plugindata": {
"plugin": "janus.plugin.videoroom",
"data": {
"videoroom": "event",
"room": 1234,
"unpublished": 2029048014
}
}
}
Request completed, freeing data
[Thread 0x7ffed67fc700 (LWP 41650) exited]
We have a message to serve...
{
"janus": "detached",
"sender": 1752997381
}
Request completed, freeing data
We have a message to serve...
{
"janus": "event",
"session_id": 4089299180,
"sender": 4203788193,
"plugindata": {
"plugin": "janus.plugin.videoroom",
"data": {
"videoroom": "event",
"room": 1234,
"unpublished": 2029048014
}
}
}
Request completed, freeing data
[Thread 0x7ffed57fa700 (LWP 41647) exited]
[Thread 0x7ffec5ffb700 (LWP 41669) exited]
[4203788193] Got an RTCP packet (bundled stream)!
Parsing compound packet (total of 56 bytes)
#1 SR (200)
#2 SDES (202)
End of compound packet
[4203788193] Fixing SSRCs (local 735838693, peer 3299773666)
Parsing compound packet (total of 56 bytes)
#1 SR (200)
#2 SDES (202)
End of compound packet
[122596169] Got an RTCP packet (bundled stream)!
[3640328392] Got an RTCP packet (bundled stream)!
[3831945170] Got an RTCP packet (bundled stream)!
[1688588234] Got an RTCP packet (bundled stream)!
[665465760] Got an RTCP packet (bundled stream)!
[34145618] Got an RTCP packet (bundled stream)!
[177755783] Got an RTCP packet (bundled stream)!
[944984623] Got an RTCP packet (bundled stream)!
[997673665] Got an RTCP packet (bundled stream)!
[34145618] Got an RTCP packet (bundled stream)!
[3792420650] Got an RTCP packet (bundled stream)!
[2526300903] Got an RTCP packet (bundled stream)!
[4047003953] Got an RTCP packet (bundled stream)!
Looks like DTLS!
[1463360886] DTLS check pending: 0
Written 61 of those bytes on the read BIO...
[1463360886] DTLS check pending: 0
[1463360886] DTLS alert received on stream 1, closing...
[1463360886] Telling the plugin about it (JANUS VideoRoom plugin)
No WebRTC media anymore
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffb2ffd700 (LWP 40571)]
0x00007ffff5717814 in pthread_mutex_lock () from /usr/lib/libpthread.so.0
(gdb) bt
#0 0x00007ffff5717814 in pthread_mutex_lock () from /usr/lib/libpthread.so.0
#1 0x00007fffe83b8056 in janus_videoroom_hangup_media () from /usr/lib/janus/plugins/libjanus_videoroom.so
#2 0x000000000040fefd in janus_dtls_callback ()
#3 0x00007ffff6ff036f in dtls1_read_bytes () from /usr/lib/libssl.so.1.0.0
#4 0x00007ffff6fda76b in ssl3_read () from /usr/lib/libssl.so.1.0.0
#5 0x000000000040ee24 in janus_dtls_srtp_incoming_msg ()
#6 0x0000000000413687 in janus_ice_cb_nice_recv ()
#7 0x00007ffff7ba7953 in ?? () from /usr/lib/libnice.so.10
#8 0x00007ffff7bada1d in ?? () from /usr/lib/libnice.so.10
#9 0x00007ffff4e67091 in ?? () from /usr/lib/libgio-2.0.so.0
#10 0x00007ffff768bb8d in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
#11 0x00007ffff768bf68 in ?? () from /usr/lib/libglib-2.0.so.0
#12 0x00007ffff768c292 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#13 0x000000000041402e in janus_ice_thread ()
#14 0x00007ffff76b2765 in ?? () from /usr/lib/libglib-2.0.so.0
#15 0x00007ffff5715314 in start_thread () from /usr/lib/libpthread.so.0
#16 0x00007ffff54533ed in clone () from /usr/lib/libc.so.6
(gdb) quit
If there's anything else I can get you that'd be of use, let me know! I'm generally running this on the weekends, but I am very much interested in getting janus as stable as possible, as the community for which I am running this is currently dependent upon the flash-based tinychat, which is rather flawed and limited.
I wonder if the mutex is getting destroyed before that last lock? I do not think there is a limit on how many threads can request a mutex. But if it gets destroyed and one tries to lock it, there may be a problem.
It's likely a race condition, where the structure that contains the lock has been freed and is then accessed again, causing the segfault. I'll look into it when I get back to the lab.
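To make the suspected race concrete, here is a minimal sketch of one common way to avoid locking a mutex that lives inside an already-freed structure: callers never keep raw pointers, they look the session up by id under a registry lock and hold a reference while they use it. This is only an illustration under stated assumptions, not the fix that went into Janus; every name in it (`session`, `session_acquire`, `session_hangup_media`, and so on) is hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

#define MAX_SESSIONS 16

/* Hypothetical session type: the per-session mutex lives inside the struct,
 * so freeing the struct while another thread is about to lock it is fatal. */
typedef struct session {
    unsigned id;
    int refs;              /* guarded by registry_lock */
    pthread_mutex_t lock;  /* guards the per-session state below */
    int hangups;
} session;

static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;
static session *registry[MAX_SESSIONS];

/* Create a session; the registry itself owns one reference. */
int session_create(unsigned id) {
    session *s = calloc(1, sizeof(*s));
    if (s == NULL)
        return -1;
    s->id = id;
    s->refs = 1;
    pthread_mutex_init(&s->lock, NULL);
    pthread_mutex_lock(&registry_lock);
    for (int i = 0; i < MAX_SESSIONS; i++) {
        if (registry[i] == NULL) {
            registry[i] = s;
            pthread_mutex_unlock(&registry_lock);
            return 0;
        }
    }
    pthread_mutex_unlock(&registry_lock);
    pthread_mutex_destroy(&s->lock);
    free(s);
    return -1; /* registry full */
}

/* Look the session up by id and take a reference, or return NULL if gone. */
static session *session_acquire(unsigned id) {
    session *found = NULL;
    pthread_mutex_lock(&registry_lock);
    for (int i = 0; i < MAX_SESSIONS; i++) {
        if (registry[i] != NULL && registry[i]->id == id) {
            found = registry[i];
            found->refs++;
            break;
        }
    }
    pthread_mutex_unlock(&registry_lock);
    return found;
}

/* Drop a reference; the last one out destroys the mutex and frees the struct. */
static void session_release(session *s) {
    pthread_mutex_lock(&registry_lock);
    int remaining = --s->refs;
    pthread_mutex_unlock(&registry_lock);
    if (remaining == 0) {
        pthread_mutex_destroy(&s->lock);
        free(s);
    }
}

/* Remove the session from the registry so no new caller can find it. */
int session_destroy(unsigned id) {
    session *s = NULL;
    pthread_mutex_lock(&registry_lock);
    for (int i = 0; i < MAX_SESSIONS; i++) {
        if (registry[i] != NULL && registry[i]->id == id) {
            s = registry[i];
            registry[i] = NULL;
            break;
        }
    }
    pthread_mutex_unlock(&registry_lock);
    if (s == NULL)
        return -1; /* already destroyed */
    session_release(s);
    return 0;
}

/* A hangup that arrives after destruction fails cleanly instead of crashing. */
int session_hangup_media(unsigned id) {
    session *s = session_acquire(id);
    if (s == NULL)
        return -1;
    pthread_mutex_lock(&s->lock);
    s->hangups++;
    pthread_mutex_unlock(&s->lock);
    session_release(s);
    return 0;
}
```

With this shape, a late hangup for a session that was already torn down returns -1 rather than calling `pthread_mutex_lock` on freed memory, which is the pattern both backtraces above end in.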
Let me know if this last fix improves things or not.
Thanks! Local testing seems to indicate the bug has been fixed, as I have been unable to crash it so far with the multiple-tab-closing procedure mentioned above. I'll be able to test with more people than myself this coming weekend. I'll let you know how that goes.
Ok keep me posted!
I was able to do some more testing last night. It would seem the mutex-related error is gone, and crashes are therefore much less frequent, but at one point, when I lost the wireless connection on my phone, which was part of the video chat, the server crashed. I'll likely be able to get more logs this evening and tomorrow if it crashes again, but here's what I got last night.
No WebRTC media anymore
Handle detached (0), scheduling destruction
[Thread 0x7fff847c0700 (LWP 22551) exited]
[New Thread 0x7fff96fe5700 (LWP 22552)]
[Thread 0x7fff96fe5700 (LWP 22552) exited]
[New Thread 0x7fff847c0700 (LWP 22553)]
[New Thread 0x7fff96fe5700 (LWP 22554)]
[Thread 0x7fff96fe5700 (LWP 22554) exited]
[1627842849] WebRTC resources freed
[1627842849] Handle and related resources freed
[Thread 0x7fffb27fc700 (LWP 20884) exited]
[New Thread 0x7fff86fc5700 (LWP 22555)]
[2390854883] WebRTC resources freed
[2390854883] Handle and related resources freed
[Thread 0x7fff9d7f2700 (LWP 21486) exited]
[Thread 0x7fff86fc5700 (LWP 22555) exited]
[New Thread 0x7fff96fe5700 (LWP 22556)]
Detaching handle from JANUS VideoRoom plugin
No WebRTC media anymore
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff96fe5700 (LWP 22556)]
0x00007ffff76a9658 in g_slist_remove () from /usr/lib/libglib-2.0.so.0
(gdb) #0 0x00007ffff76a9658 in g_slist_remove () from /usr/lib/libglib-2.0.so.0
#1 0x00007fffea9db261 in janus_videoroom_hangup_media () from /usr/lib/janus/plugins/libjanus_videoroom.so
#2 0x00007fffea9d7ff4 in janus_videoroom_destroy_session () from /usr/lib/janus/plugins/libjanus_videoroom.so
#3 0x0000000000411f4e in janus_ice_handle_destroy ()
#4 0x000000000041bfd1 in janus_process_incoming_request ()
#5 0x000000000041a831 in janus_ws_handler ()
#6 0x00007ffff7430001 in ?? () from /usr/lib/libmicrohttpd.so.10
#7 0x00007ffff74314b8 in ?? () from /usr/lib/libmicrohttpd.so.10
#8 0x00007ffff7434f71 in ?? () from /usr/lib/libmicrohttpd.so.10
Please try it on the latest version as well, as I made some more fixes today.
Recompiled it last night. Got a few more backtraces over the course of the night.
1.
[1033302858] Got an RTCP packet (bundled stream)!
[1033302858] Just got some NACKS we should handle...
[3321294332] Got an RTCP packet (bundled stream)!
[3321294332] Just got some NACKS we should handle...
[2350152689] Got an RTCP packet (bundled stream)!
[2350152689] Just got some NACKS we should handle...
[2064108332] Got an RTCP packet (bundled stream)!
[2064108332] Got an RTCP packet (bundled stream)!
[2821264097] Got an RTCP packet (bundled stream)!
[2821264097] Just got some NACKS we should handle...
[1366429993] Got an RTCP packet (bundled stream)!
[1366429993] Just got some NACKS we should handle...
Cleaning up session 3853130934...
Destroying session 3853130934
Detaching handle from JANUS VideoRoom plugin
Removing Video Room session...
No WebRTC media anymore
[4251757422] ICE thread ended!
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd29dc700 (LWP 8826)]
0x00007ffff76a9649 in g_slist_remove () from /usr/lib/libglib-2.0.so.0
#0 0x00007ffff76a9649 in g_slist_remove () from /usr/lib/libglib-2.0.so.0
#1 0x00007fffea9db261 in janus_videoroom_hangup_media () from /usr/lib/janus/plugins/libjanus_videoroom.so
#2 0x00007fffea9d7ff4 in janus_videoroom_destroy_session () from /usr/lib/janus/plugins/libjanus_videoroom.so
#3 0x0000000000411f4e in janus_ice_handle_destroy ()
#4 0x0000000000418bb0 in janus_session_destroy ()
#5 0x0000000000418397 in ?? ()
#6 0x00007ffff768c633 in ?? () from /usr/lib/libglib-2.0.so.0
#7 0x00007ffff768bb8d in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
#8 0x00007ffff768bf68 in ?? () from /usr/lib/libglib-2.0.so.0
#9 0x00007ffff768c292 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#10 0x00000000004187dc in ?? ()
#11 0x00007ffff76b2765 in ?? () from /usr/lib/libglib-2.0.so.0
#12 0x00007ffff5715314 in start_thread () from /usr/lib/libpthread.so.0
#13 0x00007ffff54533ed in clone () from /usr/lib/libc.so.6
2.
v=0
o=- 3586178845524870773 2 IN IP4 127.0.0.1
s=-
t=0 0
m=video 1 RTP/SAVPF 100
c=IN IP4 1.1.1.1
a=recvonly
a=rtpmap:100 VP8/90000
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=rtcp-fb:100 goog-remb
{
"request": "start",
"room": 1234
}
Handling message: {
"request": "start",
"room": 1234
}
Creating plugin result...
Destroying plugin result...
Request completed, freeing data
[Thread 0x7fff37726700 (LWP 6325) exited]
[3137175534] Got an RTCP packet (video stream)!
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffea1d2700 (LWP 26529)]
0x00007fffea9de228 in ?? () from /usr/lib/janus/plugins/libjanus_videoroom.so
#0 0x00007fffea9de228 in ?? () from /usr/lib/janus/plugins/libjanus_videoroom.so
#1 0x00007ffff76b2765 in ?? () from /usr/lib/libglib-2.0.so.0
#2 0x00007ffff5715314 in start_thread () from /usr/lib/libpthread.so.0
#3 0x00007ffff54533ed in clone () from /usr/lib/libc.so.6
3.
[3759971358] Fixing SSRCs (local 1370979395, peer 3567495878)
Parsing compound packet (total of 56 bytes)
#1 SR (200)
#2 SDES (202)
End of compound packet
[920334819] Got an RTCP packet (bundled stream)!
[2864672436] Got an RTCP packet (bundled stream)!
[2253506568] Got an RTCP packet (bundled stream)!
[23688295] Got an RTCP packet (bundled stream)!
[23688295] Just got some NACKS we should handle...
[377909605] Got an RTCP packet (bundled stream)!
[3093137391] Got an RTCP packet (bundled stream)!
[2857569508] Got an RTCP packet (video stream)!
Checking 8 old sessions
Freeing listener
Freeing listener
Freeing listener
Freeing listener
Freeing listener
Freeing listener
Freeing listener
Looks like DTLS!
[392626332] DTLS check pending: 0
Written 61 of those bytes on the read BIO...
[392626332] DTLS check pending: 0
[392626332] DTLS alert received on stream 1, closing...
[392626332] Telling the plugin about it (JANUS VideoRoom plugin)
No WebRTC media anymore
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff88fc9700 (LWP 8436)]
0x00007ffff76a9649 in g_slist_remove () from /usr/lib/libglib-2.0.so.0
#0 0x00007ffff76a9649 in g_slist_remove () from /usr/lib/libglib-2.0.so.0
#1 0x00007fffea9db261 in janus_videoroom_hangup_media () from /usr/lib/janus/plugins/libjanus_videoroom.so
#2 0x000000000040ff7b in janus_dtls_callback ()
#3 0x00007ffff6ff036f in dtls1_read_bytes () from /usr/lib/libssl.so.1.0.0
#4 0x00007ffff6fda76b in ssl3_read () from /usr/lib/libssl.so.1.0.0
#5 0x000000000040ee9b in janus_dtls_srtp_incoming_msg ()
#6 0x0000000000413705 in janus_ice_cb_nice_recv ()
#7 0x00007ffff7ba7953 in ?? () from /usr/lib/libnice.so.10
#8 0x00007ffff7bada1d in ?? () from /usr/lib/libnice.so.10
#9 0x00007ffff4e67091 in ?? () from /usr/lib/libgio-2.0.so.0
#10 0x00007ffff768bb8d in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
#11 0x00007ffff768bf68 in ?? () from /usr/lib/libglib-2.0.so.0
#12 0x00007ffff768c292 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#13 0x00000000004140ac in janus_ice_thread ()
#14 0x00007ffff76b2765 in ?? () from /usr/lib/libglib-2.0.so.0
#15 0x00007ffff5715314 in start_thread () from /usr/lib/libpthread.so.0
#16 0x00007ffff54533ed in clone () from /usr/lib/libc.so.6
4.
v=0
o=- 5473031050256225990 2 IN IP4 127.0.0.1
s=-
t=0 0
m=video 1 RTP/SAVPF 100
c=IN IP4 1.1.1.1
a=recvonly
a=rtpmap:100 VP8/90000
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=rtcp-fb:100 goog-remb
{
"request": "start",
"room": 1234
}
Handling message: {
"request": "start",
"room": 1234
}
Creating plugin result...
Destroying plugin result...
Request completed, freeing data
[Thread 0x7fff977e6700 (LWP 9105) exited]
[1599365436] Got an RTCP packet (bundled stream)!
[1599365436] Just got some NACKS we should handle...
[581262645] Adding remote candidate for component 1 to stream 1
[581262645] Adding host candidate... 192.168.1.2:60546
[581262645] Candidate added to the list! (2 elements for 1/1)
Request completed, freeing data
[Thread 0x7fff987e8700 (LWP 9073) exited]
[3692521758] Got an RTCP packet (bundled stream)!
[13446333] Got an RTCP packet (video stream)!
[2296625115] Got an RTCP packet (video stream)!
[2296625115] Just got some NACKS we should handle...
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe21d2700 (LWP 8556)]
0x00007fffe29de224 in ?? () from /usr/lib/janus/plugins/libjanus_videoroom.so
#0 0x00007fffe29de224 in ?? () from /usr/lib/janus/plugins/libjanus_videoroom.so
#1 0x00007ffff76b2765 in ?? () from /usr/lib/libglib-2.0.so.0
#2 0x00007ffff5715314 in start_thread () from /usr/lib/libpthread.so.0
#3 0x00007ffff54533ed in clone () from /usr/lib/libc.so.6
Unfortunately your traces tell nothing about the cause. Did you compile Janus with debugging disabled? Debugging versions of the libraries would help too.
Huh weird. I compiled it with -g but apparently I got some sort of issue going on when running it with gdb.
Got object file from memory but can't read symbols: File truncated.
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
Ah! Had to recompile gdb, but now it appears I'm getting proper backtraces. Compiled 10-20 minutes ago.
[2700530320] Video has been negotiated
[2700530320] SCTP/DataChannels have NOT been negotiated
[2700530320] The browser supports BUNDLE
[2700530320] The browser supports rtcp-mux
[2700530320] The browser is doing Trickle ICE
[2700530320] Parsing video candidates (stream=1)...
[2700530320] ICE ufrag (local): x2TQqxlrKRP3iLvv
[2700530320] ICE pwd (local): TMdjZVMfTJq4E9AhBIRBG/jX
[2700530320] Fingerprint (local) : sha-256 27:A9:C9:2A:1C:2F:F4:CE:D1:35:CE:14:04:09:17:33:37:DD:D5:5F:0A:C8:7E:9F:2A:72:B5:4D:81:9A:E1:48
[2700530320] DTLS setup (local): active
[2700530320] -- bundle is supported by the browser, getting rid of one of the RTP/RTCP components, if any...
[2700530320] -- rtcp-mux is supported by the browser, getting rid of RTCP components, if any...
[2700530320] -- ICE Trickling is supported by the browser, waiting for remote candidates...
-------------------------------------------
>> Anonymized (527 --> 232 bytes)
-------------------------------------------
v=0
o=- 6993883968917617891 2 IN IP4 127.0.0.1
s=-
t=0 0
m=video 1 RTP/SAVPF 100
c=IN IP4 1.1.1.1
a=recvonly
a=rtpmap:100 VP8/90000
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=rtcp-fb:100 goog-remb
{
"request": "start",
"room": 1234
}
Handling message: {
"request": "start",
"room": 1234
}
Creating plugin result...
Destroying plugin result...
Request completed, freeing data
[Thread 0x7fff9afed700 (LWP 7094) exited]
[FIR] seqnr=1 (20 bytes)
Resuming publisher, sending FIR to 2346098073 (^B)
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffea1d1700 (LWP 5823)]
0x0000000000437298 in janus_flags_is_set (flags=0x21, flag=8) at utils.c:42
42 uint32_t bit = *flags & flag;
#0 0x0000000000437298 in janus_flags_is_set (flags=0x21, flag=8) at utils.c:42
#1 0x0000000000427159 in janus_relay_rtcp (plugin_session=0x7fffd8293fa0, video=1, buf=0x7fffea1d0b20 "\204", <incomplete sequence \316>, len=20) at janus.c:3225
#2 0x00007fffea9de008 in janus_videoroom_handler (data=0x0) at plugins/janus_videoroom.c:1864
#3 0x00007ffff76b2765 in ?? () from /usr/lib/libglib-2.0.so.0
#4 0x00007ffff5715314 in start_thread () from /usr/lib/libpthread.so.0
#5 0x00007ffff54533ed in clone () from /usr/lib/libc.so.6
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
More incoming as I get em.
It looks like an inconsistent janus_ice_handle instance, probably accessed after it was freed. My guess is that the huge debugging + valgrind slow Janus to the point that the lazy free we do for the handle is not lazy enough.
If I read the log correctly, you also had a new user join when most had left, right? The crash originates from a listener "start" request where the requested publisher has already left. Unless, that is, it was caused by too many participants on the same machine, which resulted in very delayed messaging. Please note that for effective tests with several participants on the same machine you'll have to use different Chrome profiles, because the HTTP API relies on long polls and a single Chrome profile can open at most 6 connections to the same server (with 4-5 participants, 4-5 of those are busy on a long poll most of the time, leaving few free for other requests). As an alternative, you can try the WebSockets interface, which shouldn't have this limitation.
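The "lazy free" mentioned above can be pictured as a deferred-free list: a destroyed handle is parked with a timestamp and only released once a grace period has passed. The sketch below is an illustration under assumptions, not Janus's actual code; the names `old_session`, `schedule_free` and `reap_old_sessions` are hypothetical, the clock is passed in by the caller, and a real multi-threaded server would also need a lock around the list.

```c
#include <stdlib.h>

/* Hypothetical record for a handle that was destroyed but not yet freed. */
typedef struct old_session {
    unsigned id;
    long destroyed_at;          /* seconds, supplied by the caller */
    struct old_session *next;
} old_session;

static old_session *old_sessions = NULL;

/* Park a destroyed handle instead of freeing it right away. */
int schedule_free(unsigned id, long now) {
    old_session *o = malloc(sizeof(*o));
    if (o == NULL)
        return -1;
    o->id = id;
    o->destroyed_at = now;
    o->next = old_sessions;
    old_sessions = o;
    return 0;
}

/* Free every parked handle whose grace period has expired.
 * Returns the number of handles actually freed. */
int reap_old_sessions(long now, long grace) {
    int freed = 0;
    old_session **link = &old_sessions;
    while (*link != NULL) {
        old_session *o = *link;
        if (now - o->destroyed_at >= grace) {
            *link = o->next;   /* unlink before freeing */
            free(o);
            freed++;
        } else {
            link = &o->next;
        }
    }
    return freed;
}
```

The weakness described in the comment shows up directly here: the grace period is a fixed amount of wall-clock time, so if heavy debugging or valgrind slows every other thread down, a thread may still be using a handle when `reap_old_sessions` decides it is old enough to free.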
I'll give the WebSockets interface a spin as soon as I have a chance. My end goal later, when I have time, is to use something like node to manage Janus via RabbitMQ. For now I'm just hoping to help get Janus itself as stable as possible. I disabled all of Janus's internal debugging, leaving only gdb, and was able to get the following two crashes. The first occurred with around 8 people, roughly when Chrome died on my end due to unrelated memory issues (though I'm still not entirely sure whether that caused the Janus crash); the second occurred with around 14 people (well, around 10 people, but a few were streaming two cameras). I'm not sure exactly what happened then. The current setup is just running a single room on an altered version of the demo, logging backtraces and restarting upon crashes. My goal is to equal and hopefully surpass services like tinychat, which has a cap of 12 people per room and lower quality video.
[New Thread 0x7fff0fecf700 (LWP 18565)]
[Thread 0x7fff0fecf700 (LWP 18565) exited]
[New Thread 0x7fff0eecd700 (LWP 18566)]
[Thread 0x7fff0eecd700 (LWP 18566) exited]
[New Thread 0x7fff0fecf700 (LWP 18567)]
[Thread 0x7fff0fecf700 (LWP 18567) exited]
[New Thread 0x7fff0eecd700 (LWP 18568)]
[Thread 0x7fff0eecd700 (LWP 18568) exited]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd69dc700 (LWP 14773)]
0x00007ffff76a9649 in g_slist_remove () from /usr/lib/libglib-2.0.so.0
#0 0x00007ffff76a9649 in g_slist_remove () from /usr/lib/libglib-2.0.so.0
#1 0x00007fffea9da2f8 in janus_videoroom_hangup_media (handle=0x7fffb048e9c0) at plugins/janus_videoroom.c:1201
#2 0x00007fffea9d7066 in janus_videoroom_destroy_session (handle=0x7fffb048e9c0, error=0x7fffd69dbcf4) at plugins/janus_videoroom.c:582
#3 0x0000000000411fde in janus_ice_handle_destroy (gateway_session=0x7fffb00016b0, handle_id=1906027245) at ice.c:323
#4 0x0000000000419484 in janus_session_destroy (session_id=499112132) at janus.c:337
#5 0x0000000000418c6b in janus_cleanup_session (user_data=0x7fffb00016b0) at janus.c:213
#6 0x00007ffff768c633 in ?? () from /usr/lib/libglib-2.0.so.0
#7 0x00007ffff768bb8d in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
#8 0x00007ffff768bf68 in ?? () from /usr/lib/libglib-2.0.so.0
#9 0x00007ffff768c292 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#10 0x00000000004190b0 in janus_sessions_watchdog (user_data=0x6babd0) at janus.c:278
#11 0x00007ffff76b2765 in ?? () from /usr/lib/libglib-2.0.so.0
#12 0x00007ffff5715314 in start_thread () from /usr/lib/libpthread.so.0
#13 0x00007ffff54533ed in clone () from /usr/lib/libc.so.6
[New Thread 0x7ffe69702700 (LWP 14749)]
[Thread 0x7ffe98f61700 (LWP 14687) exited]
[New Thread 0x7ffe6b706700 (LWP 14750)]
[Thread 0x7ffe69702700 (LWP 14749) exited]
[New Thread 0x7ffe68f01700 (LWP 14751)]
(process:3513): GLib-ERROR **: Creating pipes for GWakeup: Too many open files
[New Thread 0x7ffe69702700 (LWP 14752)]
[Thread 0x7ffe85f3b700 (LWP 14688) exited]
Program received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 0x7fffea1d1700 (LWP 3524)]
0x00007ffff7692d00 in g_logv () from /usr/lib/libglib-2.0.so.0
#0 0x00007ffff7692d00 in g_logv () from /usr/lib/libglib-2.0.so.0
#1 0x00007ffff7692f1f in g_log () from /usr/lib/libglib-2.0.so.0
#2 0x00007ffff76cef32 in ?? () from /usr/lib/libglib-2.0.so.0
#3 0x00007ffff7689677 in g_main_context_new () from /usr/lib/libglib-2.0.so.0
#4 0x000000000041519b in janus_ice_setup_local (handle=0x7ffee8eb4420, offer=0, audio=0, video=1, data=0, bundle=0, rtcpmux=0, trickle=1) at ice.c:1025
#5 0x00000000004266e7 in janus_handle_sdp (handle=0x7fff95cbf3a0, plugin=0x7fffeabe4380 <janus_videoroom_plugin>, sdp_type=0x7fffea9e282b "offer", sdp=0x7fffe1e69bb0 "v=0\r\no=- 171705729430 171705729430 IN IP4 127.0.0.1\r\ns=Demo Room\r\nt=0 0\r\nm=video 1 RTP/SAVPF 100\r\nc=IN IP4 1.1.1.1\r\nb=AS:384\r\na=sendonly\r\na=rtpmap:100 VP8/90000\r\na=rtcp-fb:100 ccm fir\r\na=rtcp-fb:100 n"...) at janus.c:3087
#6 0x0000000000426008 in janus_push_event (handle=0x7fff95cbf3a0, plugin=0x7fffeabe4380 <janus_videoroom_plugin>, transaction=0x7fff95379730 "S5h6dpWqsJyX", message=0x7fffe1e697d0 "{\n \"videoroom\": \"attached\",\n \"room\": 1234,\n \"id\": 2458522519,\n \"display\": \"Mal4\"\n}", sdp_type=0x7fffea9e282b "offer", sdp=0x7fffe1e69bb0 "v=0\r\no=- 171705729430 171705729430 IN IP4 127.0.0.1\r\ns=Demo Room\r\nt=0 0\r\nm=video 1 RTP/SAVPF 100\r\nc=IN IP4 1.1.1.1\r\nb=AS:384\r\na=sendonly\r\na=rtpmap:100 VP8/90000\r\na=rtcp-fb:100 ccm fir\r\na=rtcp-fb:100 n"...) at janus.c:2993
#7 0x00007fffea9dc193 in janus_videoroom_handler (data=0x0) at plugins/janus_videoroom.c:1535
#8 0x00007ffff76b2765 in ?? () from /usr/lib/libglib-2.0.so.0
#9 0x00007ffff5715314 in start_thread () from /usr/lib/lib
For the "Too many open files" error, you have to increase the ulimit limits, as explained in the FAQ you can find in the documentation.
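For reference, the limit behind "Too many open files" is the per-process `RLIMIT_NOFILE` resource limit, which is what `ulimit -n` adjusts from the shell. As a minimal sketch (assuming Linux, and not something Janus is known to do itself), a process can inspect that limit and raise its soft value up to the hard value with the POSIX `getrlimit`/`setrlimit` calls; `raise_nofile_limit` is a hypothetical helper name.

```c
#include <sys/resource.h>

/* Raise the soft open-file limit to the hard limit for this process.
 * Returns 0 on success (or if nothing needed changing), -1 on failure.
 * Going beyond the hard limit still requires changing it outside the
 * process, e.g. with ulimit or the system's limits configuration. */
int raise_nofile_limit(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur == rl.rlim_max)
        return 0;               /* soft limit is already at the maximum */
    rl.rlim_cur = rl.rlim_max;  /* an unprivileged process may go up to the hard limit */
    return setrlimit(RLIMIT_NOFILE, &rl) == 0 ? 0 : -1;
}
```

This only helps when the hard limit is already high enough; otherwise the limits have to be increased as described in the FAQ.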
As for WebSockets/RabbitMQ, they definitely are a more efficient transport, but I was not suggesting them as a fix for the segfaults, but rather as a way to overcome the "maximum 6 clients per browser" limitation that affects Chrome. You can find more info in the comments on issue #10. The best option for testing multiple clients is obviously to use different machines, none of which is the server itself. Since with HTTP the cap is 6 connections per browser, ideally you'd put 5 participants on one machine and 5 on another, or 5 on one Chrome profile and another 5 on a different profile, and so on. Chrome is probably running out of memory because it has to encode and decode a LOT of streams, and there's only so much it can do. Look into the --use-fake-ui-for-media-stream and --use-fake-device-for-media-stream browser options to just simulate webcams, which will make the browser's job easier.
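The split suggested above comes down to simple arithmetic. As a rough sketch, assuming each participant keeps one long poll open and one connection per profile is left free for ordinary requests (my reading of the "5 per profile" advice, not a documented rule), the number of Chrome profiles or machines needed can be estimated like this; `profiles_needed` is a hypothetical helper.

```c
#include <stddef.h>

/* Estimate how many browser profiles (or machines) are needed so that every
 * participant can hold a long poll while one connection per profile stays
 * free for other requests. Returns 0 if the cap leaves no room for both. */
size_t profiles_needed(size_t participants, size_t max_connections) {
    if (max_connections < 2)
        return 0;
    size_t per_profile = max_connections - 1;  /* e.g. 6 - 1 = 5 */
    return (participants + per_profile - 1) / per_profile;  /* ceiling division */
}
```

For example, with the cap of 6 connections, 10 participants need 2 profiles and 14 need 3.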
Closing for lack of feedback, feel free to reopen if it's still an issue.
Ah, yeah, sorry, meant to comment again but forgot. With websockets, plus some of the other changes over the past few weeks, it's pretty much rock solid, at least when it comes to disconnects. Thanks for all your work on this!
I've been testing Janus, and have been able to crash it somewhat reproducibly by opening a few clients locally and closing them in rapid succession. In tests with larger groups of people (12ish broadcasters), I think I found that clients using Firefox had a higher tendency to crash the server, though I was unable to get detailed logs at the time. It may not be the same cause, as I was able to reproduce the rapid-disconnect error with both Firefox and Chrome. I may get a chance to try again this weekend though.
I'm running Arch Linux.
In both instances, I connected 4-5 clients, and then closed the tabs rapidly. I've seen similar crashes occur under different circumstances, but this was the only one I was able to reproduce reliably. Both of these were done on the stock MCU test page.
I ran janus with debugging at 7, and valgrind, and captured the following 2 crashes.