meetecho / janus-gateway

Janus WebRTC Server
https://janus.conf.meetecho.com
GNU General Public License v3.0
8.23k stars 2.48k forks source link

Crash in sip plugin #1680

Closed danpascu closed 5 years ago

danpascu commented 5 years ago

Making a video call using the sip plugin (janus_sip.so) from a Chrome browser crashes janus as soon as the call starts with:

(gdb) bt
#0  0x00007f0f1819f0c0 in srtp_get_stream (srtp=srtp@entry=0x0, ssrc=1955739973) at srtp/srtp.c:2785
#1  0x00007f0f1819fbf6 in srtp_unprotect_mki (ctx=0x0, srtp_hdr=srtp_hdr@entry=0x7f0e9cff8540, pkt_octet_len=pkt_octet_len@entry=0x7f0e9cff83e4, use_mki=use_mki@entry=0) at srtp/srtp.c:2413
#2  0x00007f0f181a06b7 in srtp_unprotect (ctx=<optimized out>, srtp_hdr=srtp_hdr@entry=0x7f0e9cff8540, pkt_octet_len=pkt_octet_len@entry=0x7f0e9cff83e4) at srtp/srtp.c:2366
#3  0x00007f0f10468e71 in janus_sip_relay_thread (data=0x7f0ef00082e0) at plugins/janus_sip.c:4723
#4  0x00007f0f18956425 in  () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#5  0x00007f0f1800ffa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#6  0x00007f0f17f3980f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

If the call is audio only it doesn't crash.

Making the same calls with Firefox works without problems for both video and audio calls.

Chrome 73 &75 have the problem. Firefox 67 works.

1410

lminiero commented 5 years ago

Since it seems to crash deep within libsrtp, please provide a libasan dump, as otherwise we won't be able to figure out why it crashed (e.g., accessing something freed somewhere else).

danpascu commented 5 years ago

We went through this before already. I provided an ASAN dump the last time and it didn't help. Nothing changed, the problem is exactly the same, the backtrace is exactly the same , the ASAN dump will be exactly the same. See the old issue for reference.

Also the problem is not deep in srtp. Janus calls srtp_unprotect with a NULL ctx hence the crash.

lminiero commented 5 years ago

The libasan dump DID help, @atoppi fixed an issue thanks to that (processing of crypto attribute in SDP that caused a crash). I don't see other libasan dumps after that fix, so if you want our help please do a new one. If you think it's as simple as adding a NULL check before srtp_unprotect, feel free to submit a pull request.

atoppi commented 5 years ago

@danpascu might not be related, but I just noticed that we never check for janus_sip_srtp_set_remote return value. When using SRTP (like in your case), a failure in the SRTP context setup will not be detected, leading the relayer thread to crash because the context structs were not initialized. Can you spot any error in the Janus logs before the crash?

danpascu commented 5 years ago

Here you go. As expected, it gives the exact same information as the gdb backtrace:

Jul  2 13:41:16 node15 janus[17426]: AddressSanitizer:DEADLYSIGNAL
Jul  2 13:41:16 node15 janus[17426]: =================================================================
Jul  2 13:41:16 node15 janus[17426]: ==17426==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f64068170c0 bp 0x000000000000 sp 0x7f63e73702a8 T49)
Jul  2 13:41:16 node15 janus[17426]: ==17426==The signal is caused by a READ memory access.
Jul  2 13:41:16 node15 janus[17426]: ==17426==Hint: address points to the zero page.
Jul  2 13:41:16 node15 janus[17426]:     #0 0x7f64068170bf  (/usr/lib/x86_64-linux-gnu/libsrtp2.so.1+0x70bf)
Jul  2 13:41:16 node15 janus[17426]:     #1 0x7f6406817bf5 in srtp_unprotect_mki srtp/srtp.c:2413
Jul  2 13:41:16 node15 janus[17426]:     #2 0x7f63f6e4be5f in janus_sip_relay_thread plugins/janus_sip.c:4723
Jul  2 13:41:16 node15 janus[17426]:     #3 0x7f6406fce424  (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x76424)
Jul  2 13:41:16 node15 janus[17426]:     #4 0x7f6406687fa2 in start_thread /build/glibc-YJREGK/glibc-2.28/nptl/pthread_create.c:486
Jul  2 13:41:16 node15 janus[17426]:     #5 0x7f64065b180e in clone (/lib/x86_64-linux-gnu/libc.so.6+0xf980e)
Jul  2 13:41:16 node15 janus[17426]: AddressSanitizer can not provide additional info.
Jul  2 13:41:16 node15 janus[17426]: SUMMARY: AddressSanitizer: SEGV (/usr/lib/x86_64-linux-gnu/libsrtp2.so.1+0x70bf)
Jul  2 13:41:16 node15 janus[17426]: Thread T49 (siprtp tijmen) created by T47 (sip tijmen) here:
Jul  2 13:41:16 node15 janus[17426]:     #0 0x7f6407738db0 in __interceptor_pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x50db0)
Jul  2 13:41:16 node15 janus[17426]:     #1 0x7f6406fefc3f  (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x97c3f)
Jul  2 13:41:16 node15 janus[17426]: Thread T47 (sip tijmen) created by T19 (sip handler) here:
Jul  2 13:41:16 node15 janus[17426]:     #0 0x7f6407738db0 in __interceptor_pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x50db0)
Jul  2 13:41:16 node15 janus[17426]:     #1 0x7f6406fefc3f  (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x97c3f)
Jul  2 13:41:16 node15 janus[17426]: Thread T19 (sip handler) created by T0 here:
Jul  2 13:41:16 node15 janus[17426]:     #0 0x7f6407738db0 in __interceptor_pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x50db0)
Jul  2 13:41:16 node15 janus[17426]:     #1 0x7f6406fefc3f  (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x97c3f)
Jul  2 13:41:16 node15 janus[17426]: ==17426==ABORTING
Jul  2 13:41:16 node15 janus[17426]: AddressSanitizer:DEADLYSIGNAL
danpascu commented 5 years ago

@atoppi I have this in the logs right before the ASAN dump:

Jul  2 13:41:11 node15 janus[17426]: Creating new session: 1816669685386359; 0x607000014760
Jul  2 13:41:12 node15 janus[17426]: Creating new handle in session 1816669685386359: 1160697846196448; 0x607000014760 0x6120000109c0
Jul  2 13:41:12 node15 janus[17426]: [1160697846196448] Creating ICE agent (ICE Lite mode, controlled)
Jul  2 13:41:12 node15 janus[17426]: [1160697846196448] The DTLS handshake has been completed
Jul  2 13:41:12 node15 janus[17426]: [janus.plugin.videoroom-0x604000010f90] WebRTC media is now available
Jul  2 13:41:14 node15 janus[17426]: Creating new handle in session 7537290998373119: 7547396536001902; 0x607000014530 0x612000010b40
Jul  2 13:41:14 node15 janus[17426]: [7547396536001902] Creating ICE agent (ICE Lite mode, controlled)
Jul  2 13:41:14 node15 janus[17426]: [WARN] [SIP-tijmen] AES_CM_128_HMAC_SHA1_80
Jul  2 13:41:14 node15 janus[17426]: [WARN] [SIP-tijmen] Key/Salt/Master: 30/16/14
Jul  2 13:41:14 node15 janus[17426]: [WARN] [SIP-tijmen] AES_CM_128_HMAC_SHA1_80
Jul  2 13:41:14 node15 janus[17426]: [WARN] [SIP-tijmen] Key/Salt/Master: 30/16/14
Jul  2 13:41:15 node15 janus[17426]: [WARN] [SIP-tijmen] AES_CM_128_HMAC_SHA1_80
Jul  2 13:41:15 node15 janus[17426]: [WARN] [SIP-tijmen] Key/Salt/Master: 30/16/14
Jul  2 13:41:16 node15 janus[17426]: [7547396536001902] The DTLS handshake has been completed
Jul  2 13:41:16 node15 janus[17426]: [janus.plugin.sip-0x60400002e410] WebRTC media is now available
Jul  2 13:41:16 node15 janus[17426]: AddressSanitizer:DEADLYSIGNAL
danpascu commented 5 years ago

@atoppi That sounds suspiciously like the problem here. If the return is not checked but you set the has_remote_srtp = TRUE flag regardless, it might lead exactly to this when trying to access it later with ctx being uninitialized (i.e. NULL)

danpascu commented 5 years ago

@atoppi It seems this is not the cause. I modified the code to only set the flag if the function was successful, but it still crashes. That function was successful in this case (I printed the result code).

atoppi commented 5 years ago

@danpascu could you share a more verbose Janus log? Level 5 should be enough. Also try to minimize the traffic on the server in order to reduce the noise on the logs coming from other connections.

danpascu commented 5 years ago

@atoppi I found the cause of the problem. In the SDP offer there is an audio stream and a video stream and both have a a=crypto line. In the SDP answer there is also an audio and a video stream, but only the audio stream has the a=crypto line, the video stream includes some zrtp-hash instead and no crypto line. As a result, janus_sip_srtp_set_remote is only called for the audio stream. At the end of this process we have

session->media.audio_srtp_in = valid_context session->media.audio_srtp_out = valid_context session->media.video_srtp_in = NULL session->media.video_srtp_out = valid_context

Later when it crashes, it sees that session->media.video_rtp_fd != -1 and it calls strp_unprotect with the NULL context because session->media.has_srtp_remote is TRUE due to the audio stream having a a=crypto line.

atoppi commented 5 years ago

Good catch @danpascu ! Still I can't understand why on Firefox should work.

danpascu commented 5 years ago

The client that answers the call seems to not have a problem with the SDP from firefox. In that case it includes the crypto line for video as it should, but for chrome there is something that it doesn't like/understand and it switches from srtp to zrtp for some reason.

danpascu commented 5 years ago

Here is the exchange in case it helps shed some light:

working set:

SDP offer:

v=0
o=mozilla...THIS_IS_SDPARTA-67.0.2 8672853781216141646 2092608625722909081 IN IP4 xx.xx.xx.xx
s=-
t=0 0
a=sendrecv
m=audio 31088 RTP/AVP 109 9 0 8 101
c=IN IP4 xx.xx.xx.xx
a=rtpmap:109 opus/48000/2
a=fmtp:109 maxplaybackrate=48000;stereo=1;useinbandfec=1
a=rtpmap:9 G722/8000/1
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=extmap:2/recvonly urn:ietf:params:rtp-hdrext:csrc-audio-level
a=extmap:3 urn:ietf:params:rtp-hdrext:sdes:mid
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:5Mi6TGvEETjOCk2MwOrAxttrP5CHP3KM58/k77FC
m=video 34020 RTP/AVP 120 121 126 97
c=IN IP4 xx.xx.xx.xx
a=rtpmap:120 VP8/90000
a=fmtp:120 max-fs=12288;max-fr=60
a=rtpmap:121 VP9/90000
a=fmtp:121 max-fs=12288;max-fr=60
a=rtpmap:126 H264/90000
a=fmtp:126 profile-level-id=42e01f;level-asymmetry-allowed=1;packetization-mode=1
a=rtpmap:97 H264/90000
a=fmtp:97 profile-level-id=42e01f;level-asymmetry-allowed=1
a=rtcp-fb:120 nack
a=rtcp-fb:120 nack pli
a=rtcp-fb:120 ccm fir
a=rtcp-fb:120 goog-remb
a=rtcp-fb:121 nack
a=rtcp-fb:121 nack pli
a=rtcp-fb:121 ccm fir
a=rtcp-fb:121 goog-remb
a=rtcp-fb:126 nack
a=rtcp-fb:126 nack pli
a=rtcp-fb:126 ccm fir
a=rtcp-fb:126 goog-remb
a=rtcp-fb:97 nack
a=rtcp-fb:97 nack pli
a=rtcp-fb:97 ccm fir
a=rtcp-fb:97 goog-remb
a=extmap:3 urn:ietf:params:rtp-hdrext:sdes:mid
a=extmap:4 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:5 urn:ietf:params:rtp-hdrext:toffset
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:pWzDZ1/4m7U9WiPFNT+O24o4+qdJg69Aw7FHeKo3

SDP answer:

v=0
o=- 3771067560 3771067561 IN IP4 yy.yy.yy.yy
s=-
t=0 0
m=audio 52794 RTP/AVP 109 101
c=IN IP4 yy.yy.yy.yy
a=rtcp:52795
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:OlfVIRP5HSffLvylsmEgSAgcvxklJmlq9+nUxq7r
a=sendrecv
a=rtpmap:109 opus/48000/2
a=fmtp:109 useinbandfec=1
m=video 52706 RTP/AVP 120
c=IN IP4 yy.yy.yy.yy
b=TIAS:256000
a=rtcp:52707
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:0DVkYSM1kVVIPcv61fh+niXvSE8KSyX5mB6+NXbF
a=sendrecv
a=rtpmap:120 VP8/90000

and failing set:

SDP offer:

v=0
o=- 462650544911980473 3606948793761387824 IN IP4 xx.xx.xx.xx
s=-
t=0 0
m=audio 15764 RTP/AVP 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4 xx.xx.xx.xx
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:110 telephone-event/48000
a=rtpmap:112 telephone-event/32000
a=rtpmap:113 telephone-event/16000
a=rtpmap:126 telephone-event/8000
a=rtcp-fb:111 transport-cc
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=extmap:2 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=extmap:3 urn:ietf:params:rtp-hdrext:sdes:mid
a=extmap:4 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id
a=extmap:5 urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:U8h0MFA+Ea3PStDhIDQvS9x7mtGKCV9beybHsoB3
m=video 34018 RTP/AVP 96 98 100 102 127 125 108 124 123
c=IN IP4 xx.xx.xx.xx
a=rtpmap:96 VP8/90000
a=rtpmap:98 VP9/90000
a=fmtp:98 profile-id=0
a=rtpmap:100 VP9/90000
a=fmtp:100 profile-id=2
a=rtpmap:102 H264/90000
a=fmtp:102 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42001f
a=rtpmap:127 H264/90000
a=fmtp:127 level-asymmetry-allowed=1;packetization-mode=0;profile-level-id=42001f
a=rtpmap:125 H264/90000
a=fmtp:125 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f
a=rtpmap:108 H264/90000
a=fmtp:108 level-asymmetry-allowed=1;packetization-mode=0;profile-level-id=42e01f
a=rtpmap:124 H264/90000
a=fmtp:124 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=4d0032
a=rtpmap:123 H264/90000
a=fmtp:123 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=640032
a=rtcp-fb:96 goog-remb
a=rtcp-fb:96 transport-cc
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
a=rtcp-fb:98 goog-remb
a=rtcp-fb:98 transport-cc
a=rtcp-fb:98 ccm fir
a=rtcp-fb:98 nack
a=rtcp-fb:98 nack pli
a=rtcp-fb:100 goog-remb
a=rtcp-fb:100 transport-cc
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=rtcp-fb:102 goog-remb
a=rtcp-fb:102 transport-cc
a=rtcp-fb:102 ccm fir
a=rtcp-fb:102 nack
a=rtcp-fb:102 nack pli
a=rtcp-fb:127 goog-remb
a=rtcp-fb:127 transport-cc
a=rtcp-fb:127 ccm fir
a=rtcp-fb:127 nack
a=rtcp-fb:127 nack pli
a=rtcp-fb:125 goog-remb
a=rtcp-fb:125 transport-cc
a=rtcp-fb:125 ccm fir
a=rtcp-fb:125 nack
a=rtcp-fb:125 nack pli
a=rtcp-fb:108 goog-remb
a=rtcp-fb:108 transport-cc
a=rtcp-fb:108 ccm fir
a=rtcp-fb:108 nack
a=rtcp-fb:108 nack pli
a=rtcp-fb:124 goog-remb
a=rtcp-fb:124 transport-cc
a=rtcp-fb:124 ccm fir
a=rtcp-fb:124 nack
a=rtcp-fb:124 nack pli
a=rtcp-fb:123 goog-remb
a=rtcp-fb:123 transport-cc
a=rtcp-fb:123 ccm fir
a=rtcp-fb:123 nack
a=rtcp-fb:123 nack pli
a=extmap:14 urn:ietf:params:rtp-hdrext:toffset
a=extmap:13 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:12 urn:3gpp:video-orientation
a=extmap:2 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=extmap:11 http://www.webrtc.org/experiments/rtp-hdrext/playout-delay
a=extmap:6 http://www.webrtc.org/experiments/rtp-hdrext/video-content-type
a=extmap:7 http://www.webrtc.org/experiments/rtp-hdrext/video-timing
a=extmap:8 http://tools.ietf.org/html/draft-ietf-avtext-framemarking-07
a=extmap:9 http://www.webrtc.org/experiments/rtp-hdrext/color-space
a=extmap:3 urn:ietf:params:rtp-hdrext:sdes:mid
a=extmap:4 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id
a=extmap:5 urn:ietf:params:rtp-hdrext:sdes:repaired-rtp-stream-id
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:eVUaQPUZM9hK30mRl6LdQ+xQZT1eSgyKUkm+CJXx

SDP answer:

v=0
o=- 3771066931 3771066932 IN IP4 yy.yy.yy.yy
s=-
t=0 0
m=audio 52282 RTP/AVP 111 126
c=IN IP4 yy.yy.yy.yy
a=rtcp:52283
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:lkxxcOh12Q6UiyKaYcNtk1tm8rwcYSVjmClxJhl7
a=sendrecv
a=rtpmap:111 opus/48000/2
a=fmtp:111 useinbandfec=1
a=rtpmap:126 telephone-event/8000
a=fmtp:126 0-16
m=video 51318 RTP/AVP 96
c=IN IP4 yy.yy.yy.yy
b=TIAS:256000
a=rtcp:51319
a=zrtp-hash:1.10 b217f12023eccde4fb83b6a847223845291c775c59443b0a8d9a73f435eeaf48
a=sendrecv
a=rtpmap:96 VP8/90000
atoppi commented 5 years ago

@danpascu can you test this patch ? Just issue git apply diff.file and rebuild Janus.

This should avoid the crash but I'm not completely sure on how the negotiation would end because I can not reproduce the issue.

danpascu commented 5 years ago

Patch is corrupt (I used the download button from the pastebin interface to get it):

$ git apply janus.patch
error: corrupt patch at line 132
$ patch -p1 -s < janus.patch
patch unexpectedly ends in middle of line
$ 
atoppi commented 5 years ago

Downloading the file maybe adds some ending character that messes up the diff file. The patch is not corrupt, try copy-pasting the RAW content in a diff file.

danpascu commented 5 years ago

The patch fixes the crash. The call works now, the leg that doesn't provide the a=crypto line simply sends/receives data unencrypted.

atoppi commented 5 years ago

Glad to know, now let's wait @lminiero for a patch review.

lminiero commented 5 years ago

Traveling today, so that will only happen tomorrow at the soonest, sorry. Good to know it's fixed though!

lminiero commented 5 years ago

@danpascu one thing that puzzles me in that patch, is that it only changes the properties for remote SRTP audio and video, but not for local. This means that in your case, if audio is encrypted but video isn't, has_srtp_local is still TRUE. Since has_srtp_local is what is used to decide whether to call srtp_protect or not, it seems weird to me that it now works for you: in fact, it means we're trying to encrypt the video we send to the peer, so your endpoint shouldn't be able to process video at all. Unless maybe the srtp_protect does nothing because the SRTP context for video is empty?

Anyway, we should probably update that patch to ensure we treat local SRTP flags the same way as we do for remote ones.

lminiero commented 5 years ago

@danpascu please test the pull request above: it is @atoppi's patch with some additional properties on local SRTP flags as well. Don't have time to test it myself now, apologies.

lminiero commented 5 years ago

@danpascu any feedback on the above? Thx!

danpascu commented 5 years ago

I am extremely busy at the moment. I'll be able to get back to this in a few days.

danpascu commented 5 years ago

The new patch seems to work fine as well.

lminiero commented 5 years ago

Thanks for the feedback! I'll merge then :+1: