Closed jing3018 closed 9 years ago
I remove this line didn't fix the bug
#0 0x000000000042de2d in janus_relay_rtp (plugin_session=0xa0, video=0, buf=0x7f29aafecb40 "\200o\f\034", len=81) at janus.c:3832
#1 0x00007f29d885b42c in janus_videoroom_relay_rtp_packet (data=<optimized out>, user_data=0x7f29aafec6f0) at plugins/janus_videoroom.c:3880
#2 0x00007f29e52eab28 in g_slist_foreach () from /lib64/libglib-2.0.so.0
#3 0x00007f29d885bf27 in janus_videoroom_incoming_rtp (handle=0x7f29ac002e60, video=0, buf=0x7f29aafecb40 "\200o\f\034", len=81) at plugins/janus_videoroom.c:1896
#4 0x000000000042b630 in janus_ice_cb_nice_recv (agent=<optimized out>, stream_id=<optimized out>, component_id=<optimized out>, len=<optimized out>,
buf=0x7f29aafecb40 "\200o\f\034", ice=0x7f29ac058dc0) at ice.c:1303
#5 0x00007f29e5d87ab5 in nice_agent_g_source_cb () from /lib64/libnice.so.10
#6 0x00007f29e587a383 in socket_source_dispatch () from /lib64/libgio-2.0.so.0
#7 0x00007f29e52cd99a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#8 0x00007f29e52cdce8 in g_main_context_iterate.isra.24 () from /lib64/libglib-2.0.so.0
#9 0x00007f29e52cdfba in g_main_loop_run () from /lib64/libglib-2.0.so.0
#10 0x00000000004248a0 in janus_ice_thread (data=0x7f29ac0031d0) at ice.c:1605
#11 0x00007f29e52f36b5 in g_thread_proxy () from /lib64/libglib-2.0.so.0
#12 0x00007f29e2f25df3 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f29e2c531ad in clone () from /lib64/libc.so.6
the line number is incorrectly, I modified this file.
Possibly related to #298 ?
maybe, but i can't reproduce this segfault with the steps used in #298
janus_ice_free
is only called when doing a lazy janus_ice_handles_cleanup
, which is scheduled about 3 seconds after an ICE handle has been marked as to be destroyed. As such, the RTP stuff shouldn't be invoked at all at that time, definitely not by the core: plugins should also be aware of the handle being invalid after those three seconds.
That said, at line 1303 of ice.c (one of the lines mentioned in the backtrace) there's nothing but a comment. Please make sure you're using an up-to-date version of Janus. I won't look into older versions of the code.
I'm sure it's the up-to-date version of janus. I add one line in file ice.c , so line 1303 is 1302 in your code. the line number in janus_videoroom.c is also incorrectly, I modified this file.
btw, in janus_ice_free, why use both g_hash_table_insert(old_plugin_sessions, handle->app_handle, handle->app_handle);
and g_free(handle->app_handle);
if not remvoe line g_free(handle->app_handle);
, it will often use a freed janus_plugin_session pointer at line 3832 of janus.c if(!plugin_session || plugin_session->stopped || buf == NULL || len < 1)
.
I remove this line, now the janus also crash in every 1 or 2 hours. the address of pointer janus_videoroom_session::handle will become incorrectly, eg : 0xa0 in the backtrace 。
==24504== ERROR: AddressSanitizer: heap-use-after-free on address 0x6010002f76a0 at pc 0x7f487351dfdf bp 0x7f48576ebfe0 sp 0x7f48576ebfd0
READ of size 8 at 0x6010002f76a0 thread T327 (ice thread)
#0 0x7f487351dfde (/opt/janus/lib/janus/plugins/libjanus_videoroom.so.0.0.0+0x39fde)
#1 0x7f48819eeb27 (/usr/lib64/libglib-2.0.so.0.4000.0+0x66b27)
#2 0x7f48735007b4 (/opt/janus/lib/janus/plugins/libjanus_videoroom.so.0.0.0+0x1c7b4)
#3 0x42ff43 (/opt/janus/bin/janus+0x42ff43)
#4 0x7f488248bab4 (/usr/lib64/libnice.so.10.1.0+0xaab4)
#5 0x7f4881f7e382 (/usr/lib64/libgio-2.0.so.0.4000.0+0x73382)
#6 0x7f48819d1999 (/usr/lib64/libglib-2.0.so.0.4000.0+0x49999)
#7 0x7f48819d1ce7 (/usr/lib64/libglib-2.0.so.0.4000.0+0x49ce7)
#8 0x7f48819d1fb9 (/usr/lib64/libglib-2.0.so.0.4000.0+0x49fb9)
#9 0x43473c (/opt/janus/bin/janus+0x43473c)
#10 0x7f48819f76b4 (/usr/lib64/libglib-2.0.so.0.4000.0+0x6f6b4)
#11 0x7f48826c9ac7 (/usr/lib64/libasan.so.0.0.0+0x19ac7)
#12 0x7f487f629df2 (/usr/lib64/libpthread-2.17.so+0x7df2)
#13 0x7f487f3571ac (/usr/lib64/libc-2.17.so+0xf61ac)
0x6010002f76a0 is located 0 bytes inside of 96-byte region [0x6010002f76a0,0x6010002f7700)
freed by thread T8 (vroom watchdog) here:
#0 0x7f48826c6009 (/usr/lib64/libasan.so.0.0.0+0x16009)
#1 0x7f4873520613 (/opt/janus/lib/janus/plugins/libjanus_videoroom.so.0.0.0+0x3c613)
#2 0x7f48734ed115 (/opt/janus/lib/janus/plugins/libjanus_videoroom.so.0.0.0+0x9115)
#3 0x7f48734ed856 (/opt/janus/lib/janus/plugins/libjanus_videoroom.so.0.0.0+0x9856)
#4 0x7f48819f76b4 (/usr/lib64/libglib-2.0.so.0.4000.0+0x6f6b4)
previously allocated by thread T9 (janus videoroom) here:
#0 0x7f48826c6225 (/usr/lib64/libasan.so.0.0.0+0x16225)
#1 0x7f487350a1c3 (/opt/janus/lib/janus/plugins/libjanus_videoroom.so.0.0.0+0x261c3)
#2 0x7f48819f76b4 (/usr/lib64/libglib-2.0.so.0.4000.0+0x6f6b4)
Mh, no line numbers, can you trying do a new configure like this (notice the extra -g3
)
CFLAGS="-g3 -fsanitize=address -fno-omit-frame-pointer" LDFLAGS="-lasan" ./configure --prefix=/opt/janus
with a new make clean && make && sudo make install
, and try again? With no line numbers I have no way to figure out where in the code it went wrong.
I think I found this bug
code in function janus_videoroom_destroy_session
} else if(session->participant_type == janus_videoroom_p_type_subscriber) {
/* Detaching this listener from its publisher is already done by hangup_media */
janus_videoroom_listener *listener = (janus_videoroom_listener *)session->participant;
if(listener) {
listener->paused = TRUE;
janus_videoroom_participant *publisher = listener->feed;
if(publisher != NULL) {
janus_mutex_lock(&publisher->listeners_mutex);
publisher->listeners = g_slist_remove(publisher->listeners, listener);
janus_mutex_unlock(&publisher->listeners_mutex);
listener->feed = NULL;
JANUS_LOG(LOG_ERR, "destroy_session janus_videoroom_listener before hangup_media\n");
}
}
} else if(session->participant_type == janus_videoroom_p_type_subscriber_muxed) {
/* Detaching this listener from its publishers is already done by hangup_media */
}
and i can found in log file
[ERR] [plugins/janus_videoroom.c:janus_videoroom_destroy_session:875]destroy_session janus_videoroom_listener before hangup_media
this bug is the same as #298 , I reproduce this segfault with the steps used in #298 , I kill -9 the chrome process .
Not sure I understand what bug you have found. Hangup media always precedes the destruction of a session object: if it doesn't, we force it ourselves anyway:
https://github.com/meetecho/janus-gateway/blob/master/plugins/janus_videoroom.c#L783
exactly to make sure no media is relayed anymore before we get rid of the session.
I'm not understand too, why listener->feed != NULL
in janus_videoroom_destroy_session.
but, when i add this code , it works.
Can you please explain what you're adding/changing/removing and where, with references to the line numbers as specified on the current github revision? janus_videoroom_destroy_session
ignores subscribers, which are removed in janus_videoroom_hangup_media
instead.
I found this bug in janus_videoroom_destroy_session
if(!session->destroyed) {
session->destroyed = janus_get_monotonic_time();
/* Any related WebRTC PeerConnection is not available anymore either */
janus_videoroom_hangup_media(handle);
but in janus_videoroom_hangup_media
void janus_videoroom_hangup_media(janus_plugin_session *handle) {
JANUS_LOG(LOG_INFO, "No WebRTC media anymore\n");
if(g_atomic_int_get(&stopping) || !g_atomic_int_get(&initialized))
return;
janus_videoroom_session *session = (janus_videoroom_session *)handle->plugin_handle;
if(!session) {
JANUS_LOG(LOG_ERR, "No session associated with this handle...\n");
return;
}
if(session->destroyed)
return;
it return !!!!!
fix in janus_videoroom_destroy_session, siwtch the two line.
if(!session->destroyed) {
/* Any related WebRTC PeerConnection is not available anymore either */
janus_videoroom_hangup_media(handle);
session->destroyed = janus_get_monotonic_time();
Aah ok, now I get it: since we're setting destroyed before hangupmedia is called, hangupmedia immediately returns and doesn't do what it should. Thanks for spotting this! Especially as it may affect other plugins as well.
I'll have to think about the best way to fix this, as the destroyed was there exactly to avoid some methods (including hangupmedia) to be called twice in a row, with potentially dangerous results. I'll let you know when I come up with a possible fix.
btw, in janus_ice_free, why use both g_hash_table_insert(old_plugin_sessions, handle->app_handle, handle->app_handle);
and g_free(handle->app_handle);
. I think need remove line g_free(handle->app_handle);
That's a safeguard about trying to use pointers that we know have been removed, but other entities (e.g., plugins) may not.
This last commit should properly fence hangup_media so that it is not called twice: destroyed is also set after that, as you suggested. Let me know if that fixes your issue for good and I'll close it.
I'm not sure the fence session->hangingup
could prevent hangup_media called twice , it's not a atomic operation .
when not add fence , the janus will coredump because the hangup_media called twice and participant->arc
or participant->vrc
will freed twice . so i add a fence like
if(g_atomic_int_add(&session->hangup_media, 1)) {
return;
}
now, I'm testting
Let me know if the Glib atomic operations fix that and I'll change it in all plugins.
@lminiero: Any idea when these fixes are ported to the modular-transports branch?
Just did that. I thought I had already done so, but it was stuck on some conflicts that I solved now.
Great, will pull right now and starting testing.
this issue fixed.
if call g_free(handle->app_handle) in this function , janus will segfault in janus.c::janus_relay_rtp line 3832 , I test this use videoroom plugin.
plugin_session->stopped test will fail and
Segfault