wernerd / ZRTPCPP

C++ Implementation of ZRTP protocol - GNU ZRTP C++
Other
115 stars 50 forks source link

Deadlock when using ZRTP as transport in pjsip_2.5-trunk #31

Open efendizadeh opened 8 years ago

efendizadeh commented 8 years ago

First of all thank you for great work. We are using pjsip in connection with ZRTP4PJ+ZRTPCPP for delivering secure calls. In development we faced with deadlock problem. After some investigation we found following situation:

As you can see from attached logs (start from thread 20). Pjsip initiates sending of rtp and passes it to zrtp transport over transport_send_rtp function for further processing. Via zrtp4pj API zrtpcpp processEvent(event_t *ev) method (zrtp/ZrtpStateClass.cpp Line 81). In processEvent method synchEnter is called, which locks zrtp mutex (let’s call it mutex_1). After that we returning back to pjsip transport_send_rtp where we have internal pjsip mutex lock (let’s call it mutex_2). At the same in thread 15, Pjsip, before calling srtp_rtp_cb, locks mutex_2, then process goes to start zrtp transport and we come in processEvent method's synchEnter call, where mutex_1 is already locked. Sample diagram attached.

After some investigation we decided that, internal zrtp synchronisation is not needed any more, pjsip is responsible for packet processing synchronisation now. We removed synchEnter and synchLeave and have tested most cases. At this time flight is success :) And the question is that, is this approach is correct?

Thread 15 2 pthread_mutex_lock () 3 pj_mutex_lock at /src/pj/os_core_unix.c:1265 4 zrtp_synchEnter at /zsrtp/transport_zrtp.c:768 5 ZrtpCallbackWrapper::synchEnter() at /zsrtp/zrtp/zrtp/ZrtpCallbackWrapper.cpp:121 6 ZRtp::synchEnter() at /zsrtp/zrtp/zrtp/ZRtp.cpp:2568 7 ZrtpStateClass::processEvent(Event) at /zsrtp/zrtp/zrtp/ZrtpStateClass.cpp:86 8 ZRtp::startZrtpEngine() at /zsrtp/zrtp/zrtp/ZRtp.cpp:253 9 ::zrtp_startZrtpEngine(ZrtpContext ) at /zsrtp/zrtp/zrtp/ZrtpCWrapper.cpp:119 10 pjmedia_transport_zrtp_startZrtp at /zsrtp/transport_zrtp.c:859 11 transport_rtp_cb at /zsrtp/transport_zrtp.c:987 12 srtp_rtp_cb at /src/pjmedia/transport_srtp.c:971 13 ice_on_rx_data at /src/pjmedia/transport_ice.c:1779 14 ice_rx_data at /src/pjnath/ice_strans.c:1467 15 pj_ice_sess_on_rx_pkt at /src/pjnath/ice_session.c:3020 16 stun_on_rx_data at /src/pjnath/ice_strans.c:1508 17 on_data_recvfrom at /src/pjnath/stun_sock.c:957 18 ioqueue_on_read_complete at /src/pj/activesock.c:498 19 ioqueue_dispatch_read_event at /src/pj/ioqueue_common_abs.c:605 20 pj_ioqueue_poll at /src/pj/ioqueue_select.c:981 21 pjsip_endpt_handle_events2 at /src/pjsip/sip_endpoint.c:741 22 pjsua_handle_events at /src/pjsua-lib/pjsua_core.c:1858 23 worker_thread at /src/pjsua-lib/pjsua_core.c:696 24 thread_main at /src/pj/os_core_unix.c:541 25 _pthread_body () 26 _pthread_start () 27 thread_start ()

Thread 20 2 _pthread_mutex_lock_slow () 3 pj_mutex_lock at /src/pj/os_core_unix.c:1265 4 pj_lock_acquire at /src/pj/lock.c:180 5 grp_lock_acquire at /src/pj/lock.c:290 6 pj_grp_lock_acquire at /src/pj/lock.c:478 7 pj_ice_strans_sendto at /src/pjnath/ice_strans.c:1247 8 transport_send_rtp at /src/pjmedia/transport_ice.c:1655 9 pjmedia_transport_send_rtp at /include/pjmedia/transport.h:678 10 transport_send_rtp at /src/pjmedia/transport_srtp.c:853 11 pjmedia_transport_send_rtp at /include/pjmedia/transport.h:678 12 zrtp_sendDataZRTP at /zsrtp/transportzrtp.c:461 13 ZrtpCallbackWrapper::sendDataZRTP(unsigned char const, int) at /zsrtp/zrtp/zrtp/ZrtpCallbackWrapper.cpp:37 14 ZRtp::sendPacketZRTP(ZrtpPacketBase) at /zsrtp/zrtp/zrtp/ZRtp.cpp:2577 15 ZrtpStateClass::evInitial() at /zsrtp/zrtp/zrtp/ZrtpStateClass.cpp:166 16 ZrtpStates::processEvent(ZrtpStateClass&) at /zsrtp/zrtp/zrtp/libzrtpcpp/ZrtpStates.h:68 17 ZrtpStateClass::processEvent(Event) at /zsrtp/zrtp/zrtp/ZrtpStateClass.cpp:151 18 ZRtp::startZrtpEngine() at /zsrtp/zrtp/zrtp/ZRtp.cpp:253 19 ::zrtp_startZrtpEngine(ZrtpContext ) at /zsrtp/zrtp/zrtp/ZrtpCWrapper.cpp:119 20 pjmedia_transport_zrtp_startZrtp at /zsrtp/transport_zrtp.c:859 21 transport_send_rtp at /zsrtp/transport_zrtp.c:1154 22 pjmedia_transport_send_rtp at /include/pjmedia/transport.h:678 23 put_frame_imp at /src/pjmedia/stream.c:1382 24 put_frame at /src/pjmedia/stream.c:1511 25 pjmedia_port_put_frame at /src/pjmedia/port.c:115 26 write_port at /src/pjmedia/conference.c:1760 27 get_frame at /src/pjmedia/conference.c:1998 28 pjmedia_port_get_frame at /src/pjmedia/port.c:98 29 resample_get_frame at /src/pjmedia/resample_port.c:182 30 pjmedia_port_get_frame at /src/pjmedia/port.c:98 31 play_cb at /src/pjmedia/sound_port.c:88 32 output_renderer at /src/pjmedia-audiodev/coreaudiodev.m:1073 33 AUInputElement::PullInput(unsigned long&, AudioTimeStamp const&, unsigned long, unsigned long) () 34 AUInputFormatConverter2::InputProc(OpaqueAudioConverter, unsigned long, AudioBufferList, AudioStreamPacketDescription, void) () 35 AudioConverterChain::CallInputProc(unsigned long) () 36 AudioConverterChain::FillBufferFromInputProc(unsigned long, CABufferList) () 37 BufferedAudioConverter::GetInputBytes(unsigned long, unsigned long&, CABufferList const&) () 38 CBRConverter::RenderOutput(CABufferList, unsigned long, unsigned long&, AudioStreamPacketDescription) () 39 BufferedAudioConverter::FillBuffer(unsigned long&, AudioBufferList&, AudioStreamPacketDescription) () 40 AudioConverterChain::RenderOutput(CABufferList, unsigned long, unsigned long&, AudioStreamPacketDescription) () 41 BufferedAudioConverter::FillBuffer(unsigned long&, AudioBufferList&, AudioStreamPacketDescription) () 42 AudioConverterFillComplexBuffer () 43 AUInputFormatConverter2::PullAndConvertInput(AudioTimeStamp const&, unsigned long&, AudioBufferList&, AudioStreamPacketDescription, bool&) () 44 AUConverterBase::RenderBus(unsigned long&, AudioTimeStamp const&, unsigned long, unsigned long) () 45 AUBase::DoRenderBus(unsigned long&, AudioTimeStamp const&, unsigned long, AUOutputElement, unsigned long, AudioBufferList&) () 46 AUBase::DoRender(unsigned long&, AudioTimeStamp const&, unsigned long, unsigned long, AudioBufferList&) () 47 AURemoteIO::PerformIO(unsigned long, unsigned int, unsigned int, AudioTimeStamp const&, AudioTimeStamp const&, AudioBufferList const, AudioBufferList_, int&) () 48 AURIOCallbackReceiver_PerformIO () 49 XPerformIO () 50 mshMIGPerform () 51 MSHMIGDispatchMessage () 52 AURemoteIO::IOThread::Run() () 53 AURemoteIO::IOThread::Entry(void) () 54 CAPThread::Entry(CAPThread*) () 55 _pthread_body () 56 _pthread_start () 57 thread_start ()

thread diagram

wernerd commented 8 years ago

The ZRTP intercal sync makes sure that the state engine handle only one packet at a time for a stream. Stream means: one RTP in/out data flow, for example s end/receive audio.

Looking at the stack trace I see that bot threads perform a "zrtp_startZrtpEngine(..). Why this? Only one thread should start a ZRTP stack for one stream . If ZRTP should work for another stream, for example adding Video to the call, the you need to have a second ZRTP instance.

Werner

Am 31.03.2016 um 14:59 schrieb Jamil:

First of all thank you for great work. We are using pjsip in connection with ZRTP4PJ+ZRTPCPP for delivering secure calls. In development we faced with deadlock problem. After some investigation we found following situation:

As you can see from attached logs (start from thread 20). Pjsip initiates sending of rtp and passes it to zrtp transport over transport_send_rtp function for further processing. Via zrtp4pj API zrtpcpp processEvent(event_t *ev) method (zrtp/ZrtpStateClass.cpp Line 81). In processEvent method synchEnter is called, which locks zrtp mutex (let’s call it mutex_1). After that we returning back to pjsip transport_send_rtp where we have internal pjsip mutex lock (let’s call it mutex_2). At the same in thread 15, Pjsip, before calling srtp_rtp_cb, locks mutex_2, then process goes to start zrtp transport and we come in processEvent method's synchEnter call, where mutex_1 is already locked. Sample diagram attached.

After some investigation we decided that, internal zrtp synchronisation is not needed any more, pjsip is responsible for packet processing synchronisation now. We removed synchEnter and synchLeave and have tested most cases. At this time flight is success :) And the question is that, is this approach is correct?

Thread 15
2 pthread_mutex_lock ()
3 pj_mutex_lock at /src/pj/os_core_unix.c:1265
4 zrtp_synchEnter at /zsrtp/transport_zrtp.c:768
5 ZrtpCallbackWrapper::synchEnter() at /zsrtp/zrtp/zrtp/ZrtpCallbackWrapper.cpp:121
6 ZRtp::synchEnter() at /zsrtp/zrtp/zrtp/ZRtp.cpp:2568
7 ZrtpStateClass::processEvent(Event*) at /zsrtp/zrtp/zrtp/ZrtpStateClass.cpp:86
8 ZRtp::startZrtpEngine() at /zsrtp/zrtp/zrtp/ZRtp.cpp:253
9 ::zrtp_startZrtpEngine(ZrtpContext *) at /zsrtp/zrtp/zrtp/ZrtpCWrapper.cpp:119
10 pjmedia_transport_zrtp_startZrtp at /zsrtp/transport_zrtp.c:859
11 transport_rtp_cb at /zsrtp/transport_zrtp.c:987
12 srtp_rtp_cb at /src/pjmedia/transport_srtp.c:971
13 ice_on_rx_data at /src/pjmedia/transport_ice.c:1779
14 ice_rx_data at /src/pjnath/ice_strans.c:1467
15 pj_ice_sess_on_rx_pkt at /src/pjnath/ice_session.c:3020
16 stun_on_rx_data at /src/pjnath/ice_strans.c:1508
17 on_data_recvfrom at /src/pjnath/stun_sock.c:957
18 ioqueue_on_read_complete at /src/pj/activesock.c:498
19 ioqueue_dispatch_read_event at /src/pj/ioqueue_common_abs.c:605
20 pj_ioqueue_poll at /src/pj/ioqueue_select.c:981
21 pjsip_endpt_handle_events2 at /src/pjsip/sip_endpoint.c:741
22 pjsua_handle_events at /src/pjsua-lib/pjsua_core.c:1858
23 worker_thread at /src/pjsua-lib/pjsua_core.c:696
24 thread_main at /src/pj/os_core_unix.c:541
25 _pthread_body ()
26 _pthread_start ()
27 thread_start ()

Thread 20
2 _pthread_mutex_lock_slow ()
3 pj_mutex_lock at /src/pj/os_core_unix.c:1265
4 pj_lock_acquire at /src/pj/lock.c:180
5 grp_lock_acquire at /src/pj/lock.c:290
6 pj_grp_lock_acquire at /src/pj/lock.c:478
7 pj_ice_strans_sendto at /src/pjnath/ice_strans.c:1247
8 transport_send_rtp at /src/pjmedia/transport_ice.c:1655
9 pjmedia_transport_send_rtp at /include/pjmedia/transport.h:678
10 transport_send_rtp at /src/pjmedia/transport_srtp.c:853
11 pjmedia_transport_send_rtp at /include/pjmedia/transport.h:678
12 zrtp_sendDataZRTP at /zsrtp/transport_zrtp.c:461
13 ZrtpCallbackWrapper::sendDataZRTP(unsigned char const/, int) at /zsrtp/zrtp/zrtp/ZrtpCallbackWrapper.cpp:37
14 ZRtp::sendPacketZRTP(ZrtpPacketBase/) at /zsrtp/zrtp/zrtp/ZRtp.cpp:2577
15 ZrtpStateClass::evInitial() at /zsrtp/zrtp/zrtp/ZrtpStateClass.cpp:166
16 ZrtpStates::processEvent(ZrtpStateClass&) at /zsrtp/zrtp/zrtp/libzrtpcpp/ZrtpStates.h:68
17 ZrtpStateClass::processEvent(Event/) at /zsrtp/zrtp/zrtp/ZrtpStateClass.cpp:151
18 ZRtp::startZrtpEngine() at /zsrtp/zrtp/zrtp/ZRtp.cpp:253
19 ::zrtp_startZrtpEngine(ZrtpContext *) at /zsrtp/zrtp/zrtp/ZrtpCWrapper.cpp:119
20 pjmedia_transport_zrtp_startZrtp at /zsrtp/transport_zrtp.c:859
21 transport_send_rtp at /zsrtp/transport_zrtp.c:1154
22 pjmedia_transport_send_rtp at /include/pjmedia/transport.h:678
23 put_frame_imp at /src/pjmedia/stream.c:1382
24 put_frame at /src/pjmedia/stream.c:1511
25 pjmedia_port_put_frame at /src/pjmedia/port.c:115
26 write_port at /src/pjmedia/conference.c:1760
27 get_frame at /src/pjmedia/conference.c:1998
28 pjmedia_port_get_frame at /src/pjmedia/port.c:98
29 resample_get_frame at /src/pjmedia/resample_port.c:182
30 pjmedia_port_get_frame at /src/pjmedia/port.c:98
31 play_cb at /src/pjmedia/sound_port.c:88
32 output_renderer at /src/pjmedia-audiodev/coreaudio_dev.m:1073
33 AUInputElement::PullInput(unsigned long&, AudioTimeStamp const&, unsigned long, unsigned long) ()
34 AUInputFormatConverter2::InputProc(OpaqueAudioConverter/, unsigned long/, AudioBufferList/, AudioStreamPacketDescription*/, void/) ()
35 AudioConverterChain::CallInputProc(unsigned long) ()
36 AudioConverterChain::FillBufferFromInputProc(unsigned long/, CABufferList/) ()
37 BufferedAudioConverter::GetInputBytes(unsigned long, unsigned long&, CABufferList const/&) ()
38 CBRConverter::RenderOutput(CABufferList/, unsigned long, unsigned long&, AudioStreamPacketDescription/) ()
39 BufferedAudioConverter::FillBuffer(unsigned long&, AudioBufferList&, AudioStreamPacketDescription/) ()
40 AudioConverterChain::RenderOutput(CABufferList/, unsigned long, unsigned long&, AudioStreamPacketDescription/) ()
41 BufferedAudioConverter::FillBuffer(unsigned long&, AudioBufferList&, AudioStreamPacketDescription/) ()
42 AudioConverterFillComplexBuffer ()
43 AUInputFormatConverter2::PullAndConvertInput(AudioTimeStamp const&, unsigned long&, AudioBufferList&, AudioStreamPacketDescription/, bool&) ()
44 AUConverterBase::RenderBus(unsigned long&, AudioTimeStamp const&, unsigned long, unsigned long) ()
45 AUBase::DoRenderBus(unsigned long&, AudioTimeStamp const&, unsigned long, AUOutputElement/, unsigned long, AudioBufferList&) ()
46 AUBase::DoRender(unsigned long&, AudioTimeStamp const&, unsigned long, unsigned long, AudioBufferList&) ()
47 AURemoteIO::PerformIO(unsigned long, unsigned int, unsigned int, AudioTimeStamp const&, AudioTimeStamp const&, AudioBufferList const/, AudioBufferList/, int&) ()
48 AURIOCallbackReceiver_PerformIO ()
49 _XPerformIO ()
50 mshMIGPerform ()
51 MSHMIGDispatchMessage ()
52 AURemoteIO::IOThread::Run() ()
53 AURemoteIO::IOThread::Entry(void/) ()
54 CAPThread::Entry(CAPThread*) ()
55 _pthread_body ()
56 _pthread_start ()
57 thread_start ()

thread diagram https://cloud.githubusercontent.com/assets/7306284/14176319/0bf35ffa-f759-11e5-9bf6-98ba92ac93ae.png

— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/wernerd/ZRTPCPP/issues/31

Werner Dittmann email: Werner.Dittmann@t-online.de cell: +49 173 44 37 659 PGP key: 82EF5E8B

efendizadeh commented 8 years ago

Situation seems like following: We are not manipulating directly pjsip threads. In this situation on outgoing call transport_send_rtp and transport_rtp_cb functions calls by pjsip in most case on the same time by different threads. In connection of this pjmedia_transport_zrtp_startZrtp in ZRTP4PJ (transport_zrtp.c) calls more than one time at the same time. It happens regarding to zrtp->started not covered by sync primitives (mutex).

For solving our problem we are thinking about possible solution to add synchronisation to zrtp->started checking and setting calls. As an architect and developer of this great software we need your advise.

Thanks in advance.

aylesm commented 7 years ago

@efendizadeh did you get a solution for this? I'm seeing the same deadlock.

efendizadeh commented 7 years ago

@aylesm see attached patch deadlock_patch.txt

aylesm commented 7 years ago

@efendizadeh thanks. Looks like synchEnter/Leave are only called in that function. So commenting out the callback impl has the same effect as your patch.