sipwise / rtpengine

The Sipwise media proxy for Kamailio
GNU General Public License v3.0

telephone-event digits are scrambled when its clock rate differs from audio clock rate #1530

Open jpyle490 opened 2 years ago

jpyle490 commented 2 years ago

This is on rtpengine 10.5.1.3, compiled into Debian packages, running on an up-to-date Debian 11. This may be related to issue #1136.

DTMF digits sent via RFC2833-style telephone-events are scrambled on the way out if the telephone-event clock rate and the audio codec clock rate are different. It doesn't matter whether it's in-kernel or userspace forwarding.

I'm testing from a Polycom VVX, which is sending only telephone-event/8000 on 101. If the audio codec is also 8000, DTMF is relayed perfectly (tested PCMU/8000, G722/8000). If the audio codec is not 8000, the events emerge jumbled (tested G7221/16000, G7221/32000).

rfuchs commented 2 years ago

Can you provide a pcap or some other steps to reproduce?

I'm not sure if it's possible to handle this because this behaviour goes against the RFC, which suggests that the telephone-event clock rate should match the clock rate of the audio codec it belongs to.

jpyle490 commented 2 years ago

Below are the SDPs with the IPs obfuscated.

The originating UA is a Polycom VVX v5.9.6 at 192.168.100.100. The proxy managing rtpengine is OpenSIPS 3.2.8, with rtpengine at 22.33.44.55 in this example. The terminating UA is FreeSWITCH 1.10.7 at 66.77.88.99 here.

First, a call with no problems: it negotiated PCMU and relayed DTMF perfectly.

Original INVITE's SDP from Polycom:

v=0
o=- 1661109539 1661109539 IN IP4 192.168.100.100
s=Polycom IP Phone
c=IN IP4 192.168.100.100
t=0 0
a=sendrecv
m=audio 16404 RTP/AVP 115 102 9 0 110 18 101
a=rtpmap:115 G7221/32000
a=fmtp:115 bitrate=48000
a=rtpmap:102 G7221/16000
a=fmtp:102 bitrate=32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:110 iLBC/8000
a=fmtp:110 mode=30
a=rtpmap:18 G729/8000
a=fmtp:18 annexb=no
a=rtpmap:101 telephone-event/8000

INVITE SDP relayed through OpenSIPS with rtpengine:

v=0
o=- 1661109539 1661109539 IN IP4 22.33.44.55
s=Polycom IP Phone
c=IN IP4 22.33.44.55
t=0 0
m=audio 16584 RTP/AVP 115 102 9 0 110 18 101
a=rtpmap:115 G7221/32000
a=fmtp:115 bitrate=48000
a=rtpmap:102 G7221/16000
a=fmtp:102 bitrate=32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:110 iLBC/8000
a=fmtp:110 mode=30
a=rtpmap:18 G729/8000
a=fmtp:18 annexb=no
a=rtpmap:101 telephone-event/8000
a=sendrecv
a=rtcp:16585

200 OK from FreeSWITCH:

v=0
o=supermario 1661084393 1661084394 IN IP4 66.77.88.99
s=supermario
c=IN IP4 66.77.88.99
t=0 0
m=audio 25146 RTP/AVP 0 101
a=rtpmap:0 PCMU/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15
a=silenceSupp:off - - - -
a=ptime:20
a=rtcp:25147 IN IP4 66.77.88.99

200 OK relayed to Polycom via OpenSIPS with rtpengine:

v=0
o=supermario 1661084393 1661084394 IN IP4 22.33.44.55
s=supermario
c=IN IP4 22.33.44.55
t=0 0
m=audio 16602 RTP/AVP 0 101
a=silenceSupp:off - - - -
a=rtpmap:0 PCMU/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15
a=sendrecv
a=rtcp:16603
a=ptime:20

Next, a call that negotiated G7221/32000, which had clean audio but jumbled DTMF relay from the Polycom to FreeSWITCH.

Original INVITE's SDP from Polycom:

v=0
o=- 1661109062 1661109062 IN IP4 192.168.100.100
s=Polycom IP Phone
c=IN IP4 192.168.100.100
t=0 0
a=sendrecv
m=audio 16400 RTP/AVP 115 102 9 0 110 18 101
a=rtpmap:115 G7221/32000
a=fmtp:115 bitrate=48000
a=rtpmap:102 G7221/16000
a=fmtp:102 bitrate=32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:110 iLBC/8000
a=fmtp:110 mode=30
a=rtpmap:18 G729/8000
a=fmtp:18 annexb=no
a=rtpmap:101 telephone-event/8000

INVITE SDP relayed through OpenSIPS with rtpengine:

v=0
o=- 1661109062 1661109062 IN IP4 22.33.44.55
s=Polycom IP Phone
c=IN IP4 22.33.44.55
t=0 0
m=audio 16560 RTP/AVP 115 102 9 0 110 18 101
a=rtpmap:115 G7221/32000
a=fmtp:115 bitrate=48000
a=rtpmap:102 G7221/16000
a=fmtp:102 bitrate=32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:110 iLBC/8000
a=fmtp:110 mode=30
a=rtpmap:18 G729/8000
a=fmtp:18 annexb=no
a=rtpmap:101 telephone-event/8000
a=sendrecv
a=rtcp:16561

200 OK from FreeSWITCH:

v=0
o=supermario 1661080744 1661080745 IN IP4 66.77.88.99
s=supermario
c=IN IP4 66.77.88.99
t=0 0
m=audio 28318 RTP/AVP 115 101
a=rtpmap:115 G7221/32000
a=fmtp:115 bitrate=48000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15
a=silenceSupp:off - - - -
a=ptime:20
a=rtcp:28319 IN IP4 66.77.88.99

200 OK relayed to Polycom via OpenSIPS with rtpengine:

v=0
o=supermario 1661080744 1661080745 IN IP4 22.33.44.55
s=supermario
c=IN IP4 22.33.44.55
t=0 0
m=audio 16574 RTP/AVP 115 101
a=silenceSupp:off - - - -
a=rtpmap:115 G7221/32000
a=fmtp:115 bitrate=48000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15
a=sendrecv
a=rtcp:16575
a=ptime:20

I also tested with G722/8000 and G7221/16000. G722/8000 worked even though it's really 16000 (thanks RFC 1890), but G7221/16000 did not. All cases had telephone-event/8000.
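As a quick way to spot the mismatch in answers like the ones above, here is a minimal Python sketch that lists the audio clock rates in an SDP body that have no telephone-event rtpmap at the same rate. The helper names (`clock_rates`, `unmatched_audio_rates`) are made up for illustration and are not part of rtpengine or OpenSIPS. It only looks at `a=rtpmap` lines, so static payload types offered without an rtpmap are ignored, and it treats every non-telephone-event rtpmap (including CN or RED) as audio.

```python
import re

# Matches e.g. "a=rtpmap:115 G7221/32000" or "a=rtpmap:102 opus/48000/2"
RTPMAP = re.compile(r"^a=rtpmap:(\d+) ([^/\s]+)/(\d+)")

def clock_rates(sdp):
    """Return (audio_rates, dtmf_rates) as sets of clock rates from rtpmap lines."""
    audio, dtmf = set(), set()
    for line in sdp.splitlines():
        m = RTPMAP.match(line.strip())
        if not m:
            continue
        name, rate = m.group(2).lower(), int(m.group(3))
        if name == "telephone-event":
            dtmf.add(rate)
        else:
            audio.add(rate)
    return audio, dtmf

def unmatched_audio_rates(sdp):
    """Audio clock rates that have no telephone-event entry at the same rate."""
    audio, dtmf = clock_rates(sdp)
    return sorted(audio - dtmf)

# The two FreeSWITCH answers from this report, reduced to their rtpmap lines.
pcmu_answer = (
    "m=audio 25146 RTP/AVP 0 101\n"
    "a=rtpmap:0 PCMU/8000\n"
    "a=rtpmap:101 telephone-event/8000\n"
)
g7221_answer = (
    "m=audio 28318 RTP/AVP 115 101\n"
    "a=rtpmap:115 G7221/32000\n"
    "a=rtpmap:101 telephone-event/8000\n"
)

print(unmatched_audio_rates(pcmu_answer))   # []
print(unmatched_audio_rates(g7221_answer))  # [32000]
```

Note that G722 is advertised as 8000 in SDP, so a check like this reports it as matching telephone-event/8000, which agrees with the G722 test result.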

I can provide the full pcaps to you privately if that's helpful.

jpyle490 commented 2 years ago

@rfuchs Is the above SDP info enough to be useful, or do you require the full pcaps? Are there any additional debugs I should be looking at within the rtpengine daemon?

rfuchs commented 2 years ago

The pcaps with the RTP would be most helpful. You can send them to me by email. But as I said before, I'm not sure how actionable this is as it's not RFC compliant.

jpyle490 commented 2 years ago

Understood, and accepted. I appreciate your taking a look. Having said that, I wouldn't complain if you could point me to the relevant RFC sections.

rfuchs commented 2 years ago

RFC 4733

2.  RTP Payload Format for Named Telephone Events

2.1.  Introduction

   The RTP payload format for named telephone events is designated as
   "telephone-event", the media type as "audio/telephone-event".  In
   accordance with current practice, this payload format does not have a
   static payload type number, but uses an RTP payload type number
   established dynamically and out-of-band.  The default clock frequency
   is 8000 Hz, but the clock frequency can be redefined when assigning
   the dynamic payload type.

   Named telephone events are carried as part of the audio stream and
   MUST use the same sequence number and timestamp base as the regular
   audio channel to simplify the generation of audio waveforms at a
   gateway.  The named telephone-event payload type can be considered to
   be a very highly-compressed audio codec and is treated the same as
   other codecs.

Emphasis on: MUST use the same ... timestamp base as the regular audio channel
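To illustrate why the timestamp base matters, here is a small Python sketch of the 4-byte telephone-event payload defined in RFC 4733 (event, E bit, R bit, volume, duration), where the duration is counted in RTP timestamp units. The names `TelephoneEvent`, `parse_telephone_event` and `duration_ms` are hypothetical, and this is not rtpengine's code. It only shows that the same duration value means different lengths of time at different clock rates; whether that is what scrambles the digits in this report is not established here.

```python
import struct
from dataclasses import dataclass

@dataclass
class TelephoneEvent:
    event: int      # 0-9 are digits, 10 is '*', 11 is '#', 12-15 are A-D
    end: bool       # E bit: set on the final packet(s) of an event
    volume: int     # power level in -dBm0, 0..63
    duration: int   # in RTP timestamp units, not milliseconds

def parse_telephone_event(payload: bytes) -> TelephoneEvent:
    """Decode the first 4 bytes of an RFC 4733 telephone-event payload."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload needs at least 4 bytes")
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    return TelephoneEvent(
        event=event,
        end=bool(flags & 0x80),   # top bit is E; next bit (R) is reserved
        volume=flags & 0x3F,      # low 6 bits
        duration=duration,
    )

def duration_ms(ev: TelephoneEvent, clock_rate: int) -> float:
    """Convert the duration field to milliseconds for a given clock rate."""
    return ev.duration * 1000.0 / clock_rate

# Digit 5, end bit set, volume 10, duration 800 timestamp units.
ev = parse_telephone_event(bytes([5, 0x8A, 0x03, 0x20]))
print(duration_ms(ev, 8000))   # 100.0
print(duration_ms(ev, 32000))  # 25.0
```

A sender counting at 8000 Hz and a relay or receiver counting at 32000 Hz would therefore disagree by a factor of four about how long the same event lasted.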

jpyle490 commented 2 years ago

Very interesting and helpful.

I checked the configuration files in the distribution of the Polycom VVX 5.9.6 I'm working with, and there's no mention of 4733, only 2833. In 4733's abstract it says that it "obsoletes RFC 2833", so I was curious if 2833 said something to the contrary. Spoiler: nope, it's the same.

3 RTP Payload Format for Named Telephone Events

3.1 Introduction

   <chop>

   DTMF digits and named telephone events are carried as part of the
   audio stream, and MUST use the same sequence number and time-stamp
   base as the regular audio channel to simplify the generation of audio
   waveforms at a gateway. The default clock frequency is 8,000 Hz, but
   the clock frequency can be redefined when assigning the dynamic
   payload type.

The release notes for Polycom's 5.9.5 software introduce the "correct" behavior for the Opus codec only:

Poly UC Software 5.9.5 introduces a new parameter for Dual-Tone Multi-Frequency (DTMF) to publish
the DTMF frequency on the Opus codec.

tone.dtmf.rfc2833.SupportOpusClockRate

  1 – (default) Publishes the Telephone-event DTMF frequency as 48000 Hz along with 8000 Hz on Opus codec.
  0 - Publishes the Telephone-event DTMF frequency as 8000 Hz on Opus codec.

I wonder if Polycom is alone in this out-of-spec behavior. My guess is that it is not, so if rtpengine could conveniently accommodate it even though it is out of spec, that would be helpful. If not, it's certainly understandable.

I can't shake the feeling I'm missing something here. Why would Polycom do this, and why would (at least in my case) FreeSWITCH accept it without question? This almost smells like another G722 situation, where the codec was mistakenly tagged with an 8 kHz clock rate in an early RFC and it had to stay since it had already been implemented.

jpyle490 commented 2 years ago

I've been doing more testing with this. On a SIP peer inside Asterisk 16's chan_sip module with the following configuration:

disallow=all
allow=opus
allow=siren14
allow=siren7
allow=g722
allow=ulaw

It generated the following SDP:

v=0
o=asterisk 1403095813 1403095813 IN IP4 w.x.y.z
s=Asterisk PBX 16.16.1~dfsg-1+deb11u1
c=IN IP4 w.x.y.z
t=0 0
m=audio 28238 RTP/AVP 107 115 102 9 0 101
a=rtpmap:107 opus/48000/2
a=rtpmap:115 G7221/32000
a=fmtp:115 bitrate=48000
a=rtpmap:102 G7221/16000
a=fmtp:102 bitrate=32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=ptime:20
a=maxptime:60
a=sendrecv

It also doesn't provide a telephone-event rtpmap option for each offered clock rate.

FreeSWITCH 1.10 behaves differently, though. With absolute_codec_string=opus@20i,G7221@32000h@20i,G7221@16000h@20i,G722@20i,PCMU@20i it produced the following SDP:

v=0
o=freeswitch 1661877427 1661877428 IN IP4 w.x.y.z
s=freeswitch
c=IN IP4 w.x.y.z
t=0 0
m=audio 19040 RTP/AVP 102 103 104 9 0 105 107 109 101
a=rtpmap:102 opus/48000/2
a=fmtp:102 useinbandfec=1; maxaveragebitrate=30000; maxplaybackrate=48000; ptime=20; minptime=10; maxptime=40
a=rtpmap:103 G7221/32000
a=fmtp:103 bitrate=48000
a=rtpmap:104 G7221/16000
a=fmtp:104 bitrate=32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:105 telephone-event/48000
a=fmtp:105 0-15
a=rtpmap:107 telephone-event/32000
a=fmtp:107 0-15
a=rtpmap:109 telephone-event/16000
a=fmtp:109 0-15
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15
a=silenceSupp:off - - - -
a=ptime:20

I saw similar, RFC-compliant results from Zoiper5. But two soft clients on my Android phone, Counterpath Bria and Acrobits Groundwire, both sent only telephone-event/8000 even though both sent multiple codecs at multiple clock rates.
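For an offer like the FreeSWITCH one above, which carries one telephone-event payload type per clock rate, the matching DTMF payload type for a negotiated codec can be looked up by rate. The following Python sketch assumes that simple rule; `dtmf_payload_types` and `dtmf_pt_for_rate` are illustrative names, not functions from any of the products mentioned. It returns None when no telephone-event entry exists at the requested rate, which is the situation with the clients that offer only telephone-event/8000.

```python
import re

# Matches e.g. "a=rtpmap:105 telephone-event/48000"
RTPMAP = re.compile(r"^a=rtpmap:(\d+) ([^/\s]+)/(\d+)")

def dtmf_payload_types(sdp):
    """Map clock rate -> first telephone-event payload type offered at that rate."""
    by_rate = {}
    for line in sdp.splitlines():
        m = RTPMAP.match(line.strip())
        if m and m.group(2).lower() == "telephone-event":
            by_rate.setdefault(int(m.group(3)), int(m.group(1)))
    return by_rate

def dtmf_pt_for_rate(sdp, audio_rate):
    """Return the telephone-event payload type matching audio_rate, or None."""
    return dtmf_payload_types(sdp).get(audio_rate)

# rtpmap lines taken from the FreeSWITCH offer in this comment.
freeswitch_offer = "\n".join([
    "a=rtpmap:102 opus/48000/2",
    "a=rtpmap:103 G7221/32000",
    "a=rtpmap:104 G7221/16000",
    "a=rtpmap:9 G722/8000",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:105 telephone-event/48000",
    "a=rtpmap:107 telephone-event/32000",
    "a=rtpmap:109 telephone-event/16000",
    "a=rtpmap:101 telephone-event/8000",
])

print(dtmf_pt_for_rate(freeswitch_offer, 32000))  # 107
print(dtmf_pt_for_rate("a=rtpmap:101 telephone-event/8000", 32000))  # None
```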

I don't know. This is furthering my paranoia that there's something else at play here.

jpyle490 commented 2 years ago

This rabbit hole is getting deep quickly.

Oracle (ex-Acme) SBC documentation says:

RFC 4733 recommends that telephone events within an audio stream that use the same synchronization source (SSRC) should use the same timestamp clock rate as the audio channel...

Recommends, not requires. Hmm.

And everyone's favorite SIP implementer, Microsoft. From an open standards doc of theirs:

Out-of-band negotiation of telephony signal information is required to establish a session as specified in [RFC4733]. During this negotiation, both payload types and the clock rate of the telephony signals are negotiated as specified in [RFC4733] section 2.5.1.1 using SDP for out-of-band negotiation. While dynamic payload type binding is required, both the sender and receiver of message blocks conforming to this protocol MUST fix the telephony signaling information at 8000 Hertz. Dynamic negotiation of the clock frequency of the DTMF payload MUST NOT be used.

Microsoft does some interesting things, but I don't often see them completely contrary to an RFC.

Just for kicks I sent myself an INVITE via Teams Direct Routing:

v=0
o=- 180025 0 IN IP4 127.0.0.1
s=session
c=IN IP4 52.113.218.39
b=CT:10000000
t=0 0
m=audio 49616 RTP/SAVP 104 9 103 111 18 0 8 97 101 13 118
c=IN IP4 52.113.218.39
a=rtcp:49617
a=ice-ufrag:Codc
a=ice-pwd:xxxx
a=rtcp-mux
a=candidate:1 1 UDP 2130706431 52.113.218.39 49616 typ srflx raddr 10.0.36.47 rport 49616
a=candidate:1 2 UDP 2130705918 52.113.218.39 49617 typ srflx raddr 10.0.36.47 rport 49617
a=candidate:2 1 tcp-act 2121006078 52.113.218.39 49152 typ srflx raddr 10.0.36.47 rport 49152
a=candidate:2 2 tcp-act 2121006078 52.113.218.39 49152 typ srflx raddr 10.0.36.47 rport 49152
a=label:main-audio
a=mid:1
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:xxxx|2^31
a=sendrecv
a=rtpmap:104 SILK/16000
a=rtpmap:9 G722/8000
a=rtpmap:103 SILK/8000
a=rtpmap:111 SIREN/16000
a=fmtp:111 bitrate=16000
a=rtpmap:18 G729/8000
a=fmtp:18 annexb=no
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:97 RED/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=rtpmap:13 CN/8000
a=rtpmap:118 CN/16000
a=ptime:20

They're clearly sending telephone-event at only 8 kHz. There's got to be more to this.

jpyle490 commented 2 years ago

@rfuchs I wanted to check to see if you had any further thoughts.

Regards, Jeff