sipwise / rtpengine

The Sipwise media proxy for Kamailio
GNU General Public License v3.0
784 stars 368 forks source link

SRTP audio dropped by client when seqno wrapped and ssrc changed #209

Open mtryfoss opened 8 years ago

mtryfoss commented 8 years ago

Hi,

Consider this call scenario: Asterisk -> (rtp) Kamailio / rtpengine -> (srtp) Cisco SPA504 (firmware 7.6.0)

When the call has been running so long that the sequence number from Asterisk wraps to 0 and a little bit later ssrc changes (attended call transfer on Asterisk), the audio from Asterisk to Cisco phone is dropped by the phone (packet error increments). Audio from Cisco to Asterisk is working fine.

SSRC change before seqno has wrapped is not interrupting audio.

I'm not sure if this is a bug in rtpengine or in the Cisco phone.

Rtpengine version mr4.2.1.1.

Debug log: Jan 19 09:19:54 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] Received command 'offer' from 158.58.152.19:50009 Jan 19 09:19:54 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] Dump for 'offer' from 158.58.152.19:50009: { "sdp": "v=0#015#012o=root 1124425849 1124425849 IN IP4 158.58.154.47#015#012s=Nordicom#015#012c=IN IP4 158.58.154.47#015#012t=0 0#015#012m=audio 19980 RTP/AVP 8 107 3 9 101#015#012a=rtpmap:8 PCMA/8000#015#012a=rtpmap:107 opus/48000/2#015#012a=rtpmap:3 GSM/8000#015#012a=rtpmap:9 G722/8000#015#012a=rtpmap:101 telephone-event/8000#015#012a=fmtp:101 0-16#015#012a=ptime:20#015#012a=maxptime:60#015#012a=sendrecv#015#012", "ICE": "remove", "transport-protocol": "RTP/SAVP", "call-id": "4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060", "receive ... Jan 19 09:19:54 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] ... d-from": [ "IP4", "158.58.154.47" ], "from-tag": "as1575b04b", "command": "offer" } Jan 19 09:19:54 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] Creating new call Jan 19 09:19:54 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] set FILLED flag for stream 158.58.154.47:19980 Jan 19 09:19:54 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] set FILLED flag for stream 158.58.154.47:19981 Jan 19 09:19:54 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] Replying to 'offer' from 158.58.152.19:50009 Jan 19 09:19:54 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] Response dump for 'offer' to 158.58.152.19:50009: { "sdp": "v=0#015#012o=root 1124425849 1124425849 IN IP4 158.58.154.47#015#012s=Nordicom#015#012c=IN IP4 158.58.154.97#015#012t=0 0#015#012m=audio 35034 RTP/SAVP 8 107 3 9 101#015#012a=rtpmap:8 PCMA/8000#015#012a=rtpmap:107 opus/48000/2#015#012a=rtpmap:3 GSM/8000#015#012a=rtpmap:9 G722/8000#015#012a=rtpmap:101 telephone-event/8000#015#012a=fmtp:101 0-16#015#012a=ptime:20#015#012a=maxptime:60#015#012a=sendrecv#015#012a=rtcp:35035#015#012a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:Rk4bly3SEZEWSO6y6lmrq8QybA/2V8WwZRmERm9V#015#012a=setup:actpass#015#012a=fin ... Jan 19 09:19:54 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] ... gerprint:sha-1 6F:48:DF:43:C0:04:E2:48:BD:40:2F:9E:28:DC:5C:67:16:8E:FB:05#015#012", "result": "ok" } Jan 19 09:19:57 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] Received command 'answer' from 158.58.152.19:41208 Jan 19 09:19:57 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] Dump for 'answer' from 158.58.152.19:41208: { "sdp": "v=0#015#012o=- 50135395 50135395 IN IP4 10.61.10.228#015#012s=-#015#012c=IN IP4 10.61.10.228#015#012t=0 0#015#012m=audio 16450 RTP/SAVP 9 101#015#012a=rtpmap:9 G722/8000#015#012a=rtpmap:101 telephone-event/8000#015#012a=fmtp:101 0-15#015#012a=ptime:20#015#012a=sendrecv#015#012a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:pbhwzOthmYY0NvgUBF963jrmxpyc6r32RLKeZKBH#015#012", "ICE": "remove", "call-id": "4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060", "received-from": [ "IP4", "158.58.154.151" ], "from-tag": "as1575 ... Jan 19 09:19:57 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] ... b04b", "to-tag": "1ecd1416f6b73ec4i0", "command": "answer" } Jan 19 09:19:57 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] set FILLED flag for stream 10.61.10.228:16450 Jan 19 09:19:57 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] set FILLED flag for stream 10.61.10.228:16451 Jan 19 09:19:57 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] Replying to 'answer' from 158.58.152.19:41208 Jan 19 09:19:57 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] Response dump for 'answer' to 158.58.152.19:41208: { "sdp": "v=0#015#012o=- 50135395 50135395 IN IP4 10.61.10.228#015#012s=-#015#012c=IN IP4 158.58.154.97#015#012t=0 0#015#012m=audio 35054 RTP/AVP 9 101#015#012a=rtpmap:9 G722/8000#015#012a=rtpmap:101 telephone-event/8000#015#012a=fmtp:101 0-15#015#012a=ptime:20#015#012a=sendrecv#015#012a=rtcp:35055#015#012", "result": "ok" } Jan 19 09:20:01 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060 port 35054] Confirmed peer address as 158.58.154.47:19980 Jan 19 09:20:01 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060 port 35034] Confirmed peer address as 10.61.10.228:16450 Jan 19 09:20:01 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060 port 35034] Kernelizing media stream: 10.61.10.228:16450 Jan 19 09:20:01 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060 port 35054] Kernelizing media stream: 158.58.154.47:19980 Jan 19 09:20:01 rtptest2 systemd[1]: Starting Session 2 of user root. Jan 19 09:20:01 rtptest2 systemd[1]: Started Session 2 of user root. Jan 19 09:20:07 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060 port 35055] Confirmed peer address as 158.58.154.47:19981 Jan 19 09:20:22 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060 port 35054] SSRC changed, updating SRTP crypto contexts Jan 19 09:20:22 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060 port 35054] Kernelizing media stream: 158.58.154.47:19980 Jan 19 09:20:25 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060 port 35034] SSRC changed, updating SRTP crypto contexts Jan 19 09:20:25 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060 port 35034] Discarded invalid SRTP packet: authentication failed Jan 19 09:20:25 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] Received command 'delete' from 158.58.152.19:49233 Jan 19 09:20:25 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] Dump for 'delete' from 158.58.152.19:49233: { "call-id": "4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060", "received-from": [ "IP4", "158.58.154.151" ], "from-tag": "1ecd1416f6b73ec4i0", "command": "delete" } Jan 19 09:20:25 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] Scheduling deletion of call branch '1ecd1416f6b73ec4i0' (via-branch '') in 30 seconds Jan 19 09:20:25 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] Replying to 'delete' from 158.58.152.19:49233 Jan 19 09:20:25 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] Response dump for 'delete' to 158.58.152.19:49233: { "created": 1453191594, "last signal": 1453191597, "tags": { "1ecd1416f6b73ec4i0": { "tag": "1ecd1416f6b73ec4i0", "created": 1453191594, "in dialogue with": "as1575b04b", "medias": [ { "index": 1, "type": "audio", "protocol": "RTP/SAVP", "streams": [ { "local port": 35034, "endpoint": { "family": "IPv4", "address": "10.61.10.228", "port": 16450 }, "advertised endpoint": { "family": "IPv4", "address": "10.61.10.228", "port": 16450 }, "crypto sui ... Jan 19 09:20:25 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] ... te": "AES_CM_128_HMAC_SHA1_80", "last packet": 1453191625, "flags": [ "RTP", "filled", "confirmed", "kernelized" ], "stats": { "packets": 1370, "bytes": 235640, "errors": 0 } }, { "local port": 35035, "endpoint": { "family": "IPv4", "address": "10.61.10.228", "port": 16451 }, "advertised endpoint": { "family": "IPv4", "address": "10.61.10.228", "port": 16451 }, "crypto suite": "AES_CM_128_HMAC_SHA1_80", "last packet": 1453191594, "flags": [ "RTCP", "filled" ], "stats": { "packets": 0, "bytes": 0 ... Jan 19 09:20:25 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] ... , "errors": 0 } } ], "flags": [ "initialized", "send", "recv", "SDES" ] } ] }, "as1575b04b": { "tag": "as1575b04b", "created": 1453191594, "in dialogue with": "1ecd1416f6b73ec4i0", "medias": [ { "index": 1, "type": "audio", "protocol": "RTP/AVP", "streams": [ { "local port": 35054, "endpoint": { "family": "IPv4", "address": "158.58.154.47", "port": 19980 }, "advertised endpoint": { "family": "IPv4", "address": "158.58.154.47", "port": 19980 }, "last packet": 1453191625, "flags": [ "RTP", "filled ... Jan 19 09:20:25 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] ... ", "confirmed", "kernelized" ], "stats": { "packets": 1278, "bytes": 232596, "errors": 0 } }, { "local port": 35055, "endpoint": { "family": "IPv4", "address": "158.58.154.47", "port": 19981 }, "advertised endpoint": { "family": "IPv4", "address": "158.58.154.47", "port": 19981 }, "last packet": 1453191617, "flags": [ "RTCP", "filled", "confirmed" ], "stats": { "packets": 2, "bytes": 156, "errors": 0 } } ], "flags": [ "initialized", "send", "recv" ] } ] } }, "totals": { "RTP": { "packets": 2648, ... Jan 19 09:20:25 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] ... "bytes": 468236, "errors": 0 }, "RTCP": { "packets": 2, "bytes": 156, "errors": 0 } }, "result": "ok" } Jan 19 09:20:55 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] Call branch '1ecd1416f6b73ec4i0' (via-branch '') deleted, no more branches remaining Jan 19 09:20:55 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] Final packet stats: Jan 19 09:20:55 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] --- Tag 'as1575b04b', created 1:01 ago for branch '', in dialogue with '1ecd1416f6b73ec4i0' Jan 19 09:20:55 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] ------ Media #1 (audio over RTP/AVP) using G722/8000 Jan 19 09:20:55 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] --------- Port 35054 <> 158.58.154.47:19980, 1279 p, 232778 b, 0 e, 1453191625 last_packet Jan 19 09:20:55 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] --------- Port 35055 <> 158.58.154.47:19981 (RTCP), 2 p, 156 b, 0 e, 1453191617 last_packet Jan 19 09:20:55 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] --- Tag '1ecd1416f6b73ec4i0', created 1:01 ago for branch '', in dialogue with 'as1575b04b' Jan 19 09:20:55 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] ------ Media #1 (audio over RTP/SAVP) using G722/8000 Jan 19 09:20:55 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] --------- Port 35034 <> 10.61.10.228:16450, 1370 p, 235640 b, 0 e, 1453191625 last_packet Jan 19 09:20:55 rtptest2 rtpengine[2318]: [4c28096258c9f38937d48eb736a2ba85@158.58.154.47:5060] --------- Port 35035 <> 10.61.10.228:16451 (RTCP), 0 p, 0 b, 0 e, 1453191594 last_packet ~

Please let me know if you need anything more. I've hardcoded the starting seqno in Asterisk to a high value so it's easy to reproduce.

mtryfoss commented 8 years ago

Made a quick hack to Asterisk source to reset sequence to a random value when ssrc is changed. Now it seemed to work.

I'll try to get some logs from the Phones.

mtryfoss commented 8 years ago

It works for me if this block in rtp.c is commented out, but I'd guess this will break something else.

   if (G_UNLIKELY(!cur_ssrc)) 
            c->last_index = 0;
    else 
            c->last_index = cur_ssrc->index;
rfuchs commented 8 years ago

My first guess would be that Asterisk doesn't reset its SRTP ROC (roll-over counter) when changing the SSRC. RFC 3711 specifies that a change in SSRC results in a new SRTP crypto context, which would start with a ROC of 0.

I haven't dug into Asterisk code, but perhaps you can test it by making it set the ROC to 0 when SSRC changes. If it doesn't keep an explicit separate ROC, it's probably part of the sequence number (bits beyond bit 16).

mtryfoss commented 8 years ago

There's no SRTP between rtpengine and Asterisk. Only between rtpengine and the Cisco phone. I guess there's no ROC in use at the Asterisk side then?

The change in rtp.c broke SRTP for webrtc (chrome), but still works with Bria 4.

rfuchs commented 8 years ago

Oh sorry, had that backwards. Maybe a problem in the Cisco phone then. Not sure why changing the seqnr to a random value would fix the issue though. Does that work reliably or just occasionally?

Any info from the phone's log? Would it be possible to get a pcap of the (S)RTP?

mtryfoss commented 8 years ago

It seems to work reliably. My guess is that a change of more than a certain value of the seqnr triggers something in the phone.

I'll give you a pcap trace when I get back to work tomorrow. The phone doesn't have an internal log, but tried forwarding debug info to a syslog server. However, nothing related to SRTP there.

mtryfoss commented 8 years ago

Here is a pcap of a call where audio is being dropped. SSRC change happens at packet 3349/3350. The trace contains both RTP and SRTP (in and out of rtpengine).

rtp-notworking.pcap.zip

rfuchs commented 8 years ago

Do you happen to have the SRTP keys used for this session?

mtryfoss commented 8 years ago

Key from INVITE: a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:Utg7SwT8Lx/+QVTzW1a1twrqHv6WjdmtwYXkphpa Key from 200 OK: a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:4K4CoqTvcWNFSmmWBxsW/6aboKpVCyr7ipw83hB+

rfuchs commented 8 years ago

OK, so in this dump, the ROC was reset to zero when the SSRC changed. As the RFC is quite explicit about the SSRC being part of the crypto context which contains the ROC, I would say that this is correct behaviour by rtpengine and that the Cisco phone is at fault.

nakchak commented 8 years ago

Is this not the same as issue #71 ?

rfuchs commented 8 years ago

It's the reverse of #71. #71 was about receiving SRTP and has since been further fixed by tracking SSRC changes (it didn't do that before). This issue is about sending SRTP.

mtryfoss commented 8 years ago

I was afraid too that the error is in the phone. Too bad, since it's harder to get them fix it.

Regarding the seqno reset at sender side which seems to work around this. On a general basis, can you see any reason that this should not be done when SSRC is changed? I noticed one earlier rtpengine commit mentioning that asterisk does this change without reset.

rfuchs commented 8 years ago

I have no idea why Asterisk would change the SSRC but not the seq. Normally, a different SSRC points to a different media source which would use different seq numbers. I can't think of a reason why a single source would switch SSRC but retain its sequencing.

On the Cisco side, I have no idea why a change in seq would make it reset the ROC but not otherwise. It could be a workaround for a buggy SRTP source which now backfires.

Without any of this info, it's hard to implement a reliable workaround.

mtryfoss commented 8 years ago

I will do some more testing regarding the seq number change. However, it's also stupid of Asterisk to change SSRC when the source isn't really changing (from a sip perspective). I think they do it just to "be sure" or something like that.

mtryfoss commented 8 years ago

A little update. The audio only keeps working if sequence number is changed to something > 32500, which is in the upper half of the sequence range. Then this workaround is pretty useless.

nakchak commented 8 years ago

I think i have been experiencing the same problem, asterisk sends an invite midcall, which changes the SSRC and the client looses ability to carry on decrypting the SRTP stream.
Out of desparation i started fiddling with my asterisk config and seem to have resolved my issue (if its the same as this)... Out of interest @mtryfoss what is the session-timers property in your asterisk sip.conf for the peer that this is happening to? By changing it to refuse, instead of accept on the peer this was happening to the problem seems to have been resolved....

mtryfoss commented 8 years ago

This has nothing to do with session timers in our case since it's not in use. The problem seems to be that the Cisco phones doesn't reset the ROC on reinvite/ssrc change.

nakchak commented 8 years ago

Session timers were not in use for us either, but by explicitly disabling them it stopped asterisk changing the SSRC for me...