sfgeorge opened this issue 4 years ago.
Thanks for such a detailed report. I have been extremely busy these days and unfortunately could not find the time to investigate the problem, but here are some comments in case they are of any help.
Apparently, something has changed internally in Asterisk between versions 1.8 and 13 in the way audio data is processed.
MPF callbacks in UniMRCP always carry audio data in 10 ms frames, irrespective of RTP ptime. In other words, if ptime is 20ms, then two frames are required to send an RTP packet out, and vice versa, an RTP packet of 20ms results in two frames of 10ms.
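To make the framing arithmetic concrete, here is a minimal Python sketch. It assumes mono G.711 at 8 kHz with one byte per sample. The names (bytes_for_duration, split_packet, join_frames) are hypothetical and only model the behaviour described above. They are not UniMRCP functions.

```python
# Hypothetical model of fixed 10 ms framing for G.711 (not UniMRCP code).
SAMPLE_RATE_HZ = 8000     # G.711 narrowband sampling rate
BYTES_PER_SAMPLE = 1      # G.711 encodes one sample per byte
FRAME_TIME_MS = 10        # fixed internal frame duration, as described above


def bytes_for_duration(duration_ms: int) -> int:
    """Payload size in bytes for a given duration of mono G.711 audio."""
    return SAMPLE_RATE_HZ * duration_ms // 1000 * BYTES_PER_SAMPLE


def split_packet(payload: bytes) -> list:
    """Split one RTP payload into fixed 10 ms frames."""
    frame_len = bytes_for_duration(FRAME_TIME_MS)
    if len(payload) % frame_len:
        raise ValueError("payload is not a whole number of 10 ms frames")
    return [payload[i:i + frame_len] for i in range(0, len(payload), frame_len)]


def join_frames(frames: list) -> bytes:
    """Concatenate 10 ms frames back into a single RTP payload."""
    return b"".join(frames)


packet_20ms = bytes(bytes_for_duration(20))   # 160 zero bytes
frames = split_packet(packet_20ms)
print(len(packet_20ms), len(frames), [len(f) for f in frames])  # 160 2 [80, 80]
```

A 20ms packet becomes two 80-byte frames. These sizes match the size 160 and size 80 packets mentioned in the notes at the end of this report.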
Given the logic above, 10ms frames are provided to Asterisk. It is up to Asterisk to compose the RTP stream for the incoming SIP leg based on the negotiated parameters (codec, ptime, etc.). Based on your observations, that does not seem to be happening properly.
If you change the internal definition of CODEC_FRAME_TIME_BASE in mpf_codec_descriptor.h from 10 to 20, I guess that would make a difference. However, this would not be a proper solution in general.
@achaloyan No need to apologize; that's excellent news. My preliminary debugging was leading me to the same conclusion, and it's a great help to have some validation in that direction.
I noticed that mpf_codec_frame_samples_calculate() computes its result from CODEC_FRAME_TIME_BASE. I will explore whether I can update mpf_codec_frame_samples_calculate() (and related functions) to be driven by the current ptime rather than by CODEC_FRAME_TIME_BASE.
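This is a rough model of the proposed change. It assumes that mpf_codec_frame_samples_calculate() computes channel count × sampling rate × CODEC_FRAME_TIME_BASE / 1000, which should be confirmed against mpf_codec_descriptor.h. CodecDescriptor, frame_samples_fixed, frame_samples_for_ptime and ticks_per_second are hypothetical Python stand-ins, not UniMRCP code. The last helper shows how a ptime-driven frame time would also change how often a per-frame clock has to fire.

```python
# Hypothetical Python model; the real UniMRCP logic is C and lives in
# mpf_codec_descriptor.h. The formula below is an assumption.
from dataclasses import dataclass

CODEC_FRAME_TIME_BASE_MS = 10  # mirrors the constant discussed above


@dataclass
class CodecDescriptor:
    sampling_rate: int
    channel_count: int = 1


def frame_samples_fixed(desc: CodecDescriptor) -> int:
    """Samples per frame using the fixed frame time base (assumed current behaviour)."""
    return desc.channel_count * desc.sampling_rate * CODEC_FRAME_TIME_BASE_MS // 1000


def frame_samples_for_ptime(desc: CodecDescriptor, ptime_ms: int) -> int:
    """Samples per frame if the frame time were driven by the negotiated ptime."""
    return desc.channel_count * desc.sampling_rate * ptime_ms // 1000


def ticks_per_second(frame_time_ms: int) -> int:
    """How often a clock must fire to process one frame per tick."""
    return 1000 // frame_time_ms


g711 = CodecDescriptor(sampling_rate=8000)
print(frame_samples_fixed(g711), frame_samples_for_ptime(g711, 20))  # 80 160
print(ticks_per_second(10), ticks_per_second(20))                    # 100 50
```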
It looks like there are 2 potential fixes here. I feel that both should eventually be implemented, but solving either one will likely resolve our immediate issue.
Ensure that outbound channels inherit all of the appropriate settings (ptime and smoothing in particular), as inbound channels do. For us, this 10ms phenomenon only occurs on outbound channels. A key difference I'm noticing is that inbound channels are enabled to use Asterisk's smoother, whereas outbound channels follow the branch of logic in which smoothing is not applied. I believe the smoother provides the critical last-chance fix of repacking those smaller frames into properly sized ones.
Change UniMRCP's internal jitter-buffer clock/interval from a hard-coded 10ms to the ptime of the receiving channel. There's no sense in doing work every 10ms when we only want to ship out a packet every 20ms (unless we are trying to correct some serious jitter).
I plan to propose these as 2 distinct PRs, so we can discuss whether one or both fixes are appropriate.
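The smoothing step described for inbound channels can be sketched as a buffer that accepts frames of any size and only releases payloads of the negotiated packet size. This is a minimal Python illustration that assumes G.711 at 8 kHz. The Smoother class is hypothetical and is not Asterisk's own smoother code.

```python
# Minimal smoother sketch: accepts audio in arbitrary-sized chunks (e.g. 10 ms
# frames) and emits fixed-size payloads matching the negotiated ptime.
# Illustration only; not Asterisk's smoother implementation.
from typing import Optional


class Smoother:
    def __init__(self, packet_bytes: int) -> None:
        if packet_bytes <= 0:
            raise ValueError("packet_bytes must be positive")
        self.packet_bytes = packet_bytes
        self._buffer = bytearray()

    def feed(self, frame: bytes) -> None:
        """Append an incoming frame to the internal buffer."""
        self._buffer.extend(frame)

    def read(self) -> Optional[bytes]:
        """Return one full packet if enough audio is buffered, else None."""
        if len(self._buffer) < self.packet_bytes:
            return None
        packet = bytes(self._buffer[:self.packet_bytes])
        del self._buffer[:self.packet_bytes]
        return packet


# G.711 at 8 kHz: 10 ms frames are 80 bytes, 20 ms packets are 160 bytes.
smoother = Smoother(packet_bytes=160)
sent = []
for _ in range(4):                  # four 10 ms frames from the TTS path
    smoother.feed(bytes(80))
    packet = smoother.read()
    if packet is not None:
        sent.append(len(packet))
print(sent)  # [160, 160]
```

A production smoother would also have to deal with timestamps, codec changes and flushing a partial packet at the end of a prompt. Those concerns are omitted here.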
Asterisk 13 introduced quite significant changes in how media is handled, both internally and in the public API. However, it is not clear to me what the key difference is.
This is not as obvious as it may seem. Please note that an RTP receiver should be capable of receiving packets of different sizes (ptime) within the same RTP session, regardless of the ptime. Furthermore, according to RFC 3264, ptime indicates the desired packetization interval that the offerer would like to receive.
Anyway, I am certainly open to discussing any suggestions you may have.
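A receiver that accepts mixed packet sizes within one session can derive each packet's duration from its own payload length rather than from ptime. The sketch below illustrates this and assumes mono G.711 (8 kHz, one byte per sample). payload_duration_ms is a hypothetical helper.

```python
# Sketch of a receiver that does not assume a fixed ptime: each packet's
# duration is derived from its own payload length. Assumes mono G.711
# (8 kHz, 1 byte per sample); other codecs need their own size/duration rule.
SAMPLE_RATE_HZ = 8000
BYTES_PER_SAMPLE = 1


def payload_duration_ms(payload: bytes) -> float:
    """Duration of a G.711 payload in milliseconds."""
    samples = len(payload) // BYTES_PER_SAMPLE
    return samples * 1000 / SAMPLE_RATE_HZ


# Packets of different sizes within one session are all accepted.
session = [bytes(160), bytes(80), bytes(240)]
durations = [payload_duration_ms(p) for p in session]
print(durations, sum(durations))  # [20.0, 10.0, 30.0] 60.0
```

For G.711 the RTP timestamp clock also runs at 8000 Hz, so the timestamp difference between consecutive packets carries the same information.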
I've updated the description of this defect to note that it only occurs if you are using Asterisk's deprecated chan_sip channel driver; switching to chan_pjsip is one way to resolve the issue.
Well, thanks for the note.
Synopsis
Any currently available version of asterisk-unimrcp that can be installed with Asterisk 13 produces RTP that violates the standard 20ms G.711 packetization when TTS is sent on an outbound stream.
Asterisk/asterisk-unimrcp takes valid 20ms audio packets received from the TTS server and transcodes them into 10ms packets before sending them out.
Versions Tested
Requirements to Reproduce
TTS Server -> Asterisk/asterisk-unimrcp Audio Stream
Above: TTS Server -> Asterisk/asterisk-unimrcp Audio Stream with proper 20ms packets and negligible sub-millisecond jitter.
Same Audio Stream transcoded and sent-out by Asterisk/asterisk-unimrcp
Above: Asterisk/asterisk-unimrcp -> Outbound Audio Stream transitioning to improper 10ms packets and high jitter when TTS is sent out.
Configuration
mrcp.conf
Note: I've also tried appending the following settings to the [tts-mrcp1] section, with no change in behavior.
extensions.conf
Inbound calls are configured to route to
[corrupted-audio-reproducible-case]
Notes
With verbose RTP debugging enabled in Asterisk (sudo asterisk -rx 'rtp set debug on'), it is clear that Asterisk transitions from 20ms packets (size 160) to 10ms packets (size 80) while proxying/transcoding TTS: