twilio / video-quickstart-ios

Twilio Video Quickstart for iOS
https://www.twilio.com/docs/api/video
MIT License
460 stars, 177 forks

Question: how well does echo cancellation work when sound comes from speaker? #522

Closed winstondu closed 3 years ago

winstondu commented 4 years ago

Description

When the AVAudioSession mode is .voiceChat or .videoChat, Apple aggressively ducks non-voice audio and moves all output audio from the phone's loudspeaker to the phone's ear speaker (aka the earpiece).

To work around that, our team has elected to set the AVAudioSession mode to .default on iOS, with the options .mixWithOthers and .defaultToSpeaker, even when the category is .playAndRecord. This keeps output audio on the phone's actual loudspeaker.

However, there does not appear to be acoustic echo cancellation.

We noticed that the ExampleAVAudioEngineDevice provided in this guide appears to suggest that feeding data into iOS's VoiceProcessingIO unit enables acoustic echo cancellation. However, it is not apparent from that code whether echo cancellation would still work if the line setMode:AVAudioSessionModeVideoChat were changed to setMode:AVAudioSessionModeDefault. As written, the example still seems to trigger iOS's default behavior of routing audio to the earpiece and/or turning down the speaker volume.

I want to know whether that architecture still provides echo cancellation on an iPhone XR when the speaker audio is at full volume.

In other words, I am hoping for a clarification of the documentation: by acoustic echo cancellation, do you mean that when Person A and Person B are in the same chat room (but not the same physical location) and Person A's iPhone speaker is playing at full volume, Person B's voice would still not echo back?

Steps to Reproduce

  1. Turn on the voice SDK and enter a chat room with another person.
  2. Ensure the AVAudioSession category is .playAndRecord and the mode is .default, with the option .defaultToSpeaker.

Code

// Code that helps reproduce the issue
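A rough sketch of the session configuration described in the steps above, assuming iOS 13 and AVFoundation; the function name `configureSession` is illustrative, not part of the Twilio SDK:

```swift
import AVFoundation

// Illustrative sketch of the reproduction setup; not Twilio SDK code.
func configureSession() throws {
    let session = AVAudioSession.sharedInstance()
    // .default mode plus .defaultToSpeaker keeps output on the loudspeaker;
    // .mixWithOthers lets audio from other apps keep playing.
    try session.setCategory(.playAndRecord,
                            mode: .default,
                            options: [.mixWithOthers, .defaultToSpeaker])
    try session.setActive(true)
}
```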

Expected Behavior

No echo.

Actual Behavior

Significant echo from the chat.

Xcode

11

iOS Version

iOS 13.4, iOS 13.5, iOS 13.6

iOS Device

iPhone XR

winstondu commented 4 years ago

@piyushtank

piyushtank commented 4 years ago

Your observation is correct. When VoiceProcessingIO is used with voiceChat or videoChat, iOS uses the built-in hardware echo cancellation, and it also ducks the volume of audio played from other sources (say, an audio file).

A workaround is to mix the audio from the file with the audio from the remote participant, and supply the mix to AVAudioEngine for playback.
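The mixing workaround can be sketched roughly as follows: two AVAudioPlayerNodes feed one mixer, so the audio device sees a single output stream. This is an assumption-laden sketch, not Twilio sample code; `fileURL` and the remote-buffer delivery are placeholders for your app's own plumbing.

```swift
import AVFoundation

// Hypothetical sketch: mix file audio with remote-participant audio in one
// AVAudioEngine graph before playout.
func startMixedPlayback(engine: AVAudioEngine, fileURL: URL) throws {
    let filePlayer = AVAudioPlayerNode()     // plays the local audio file
    let remotePlayer = AVAudioPlayerNode()   // fed the remote participant's PCM buffers

    engine.attach(filePlayer)
    engine.attach(remotePlayer)

    // Both nodes feed the main mixer; the mixer's output goes to the hardware.
    let format = engine.mainMixerNode.outputFormat(forBus: 0)
    engine.connect(filePlayer, to: engine.mainMixerNode, format: format)
    engine.connect(remotePlayer, to: engine.mainMixerNode, format: format)

    try engine.start()

    // Schedule the file on one node...
    let file = try AVAudioFile(forReading: fileURL)
    filePlayer.scheduleFile(file, at: nil)
    filePlayer.play()

    // ...and schedule each decoded remote buffer on the other as it arrives:
    // remotePlayer.scheduleBuffer(pcmBuffer, completionHandler: nil)
    remotePlayer.play()
}
```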

If you find the above workaround a bit complex, and playing audio at the same level is a strict requirement for your app, you can use software AEC when using a non-voiceChat/videoChat mode. Please note that software AEC causes somewhat higher CPU usage and should be kept as a last resort.

winstondu commented 4 years ago

@piyushtank

Follow-up question

Are you aware of any workaround to disable ducking on external mixed audio (e.g. audio from other apps, where we have .mixWithOthers set) when our app's mic is on, but still benefit from Voice-Processing IO's echo cancellation?

In other words, users can continue being on the twilio call while the hosting app is in the background, while getting full speaker volume for the foreground app (e.g. a youtube video playing from Safari).

Right now, we have the trade-off of not enabling the acoustic audio suppression, or dealing with audio ducking for speaker output to any foreground app.

piyushtank commented 4 years ago

@winstondu Yes, you can try these workarounds. In fact, the first workaround (getting audio from AVPlayer using an MTAudioProcessingTap, mixing it with the remote participant's audio, and then giving the mix to VPIO) was recommended to us by Apple engineers at a WWDC audio lab. I should mention that we haven't tried these workarounds ourselves yet, but they should give you the intended outcome.
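For reference, attaching an MTAudioProcessingTap to an AVPlayerItem looks roughly like the sketch below. This is an untested outline assembled from the MediaToolbox API, not verified Twilio code; the mixing step inside the process callback is left as a comment.

```swift
import AVFoundation
import MediaToolbox

// Sketch: attach a processing tap so an AVPlayerItem's decoded audio can be
// intercepted and handed to your own mixing code.
func attachTap(to playerItem: AVPlayerItem, track: AVAssetTrack) {
    var callbacks = MTAudioProcessingTapCallbacks(
        version: kMTAudioProcessingTapCallbacksVersion_0,
        clientInfo: nil,
        init: nil,
        finalize: nil,
        prepare: nil,      // typically save the AudioStreamBasicDescription here
        unprepare: nil,
        process: { tap, numberFrames, _, bufferListInOut, numberFramesOut, flagsOut in
            // Pull the decoded source audio into bufferListInOut.
            let status = MTAudioProcessingTapGetSourceAudio(
                tap, numberFrames, bufferListInOut, flagsOut, nil, numberFramesOut)
            guard status == noErr else { return }
            // Hand bufferListInOut to your mixing code here.
        })

    var tap: Unmanaged<MTAudioProcessingTap>?
    let status = MTAudioProcessingTapCreate(
        kCFAllocatorDefault, &callbacks,
        kMTAudioProcessingTapCreationFlag_PreEffects, &tap)
    guard status == noErr, let createdTap = tap else { return }

    // Install the tap on the audio track via an audio mix.
    let inputParams = AVMutableAudioMixInputParameters(track: track)
    inputParams.audioTapProcessor = createdTap.takeRetainedValue()

    let audioMix = AVMutableAudioMix()
    audioMix.inputParameters = [inputParams]
    playerItem.audioMix = audioMix
}
```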

winstondu commented 4 years ago

@piyushtank , thanks for the prompt reply (glad for the support!), but I'm afraid I did not communicate my follow-up question clearly.

Your answers describe how to avoid audio ducking on the HOST app's audio that is not the Twilio voice output. That is useful, but it is not what I am asking here.

My question is whether you are able to avoid audio ducking ACROSS the iOS system.

To explain: we are trying to allow users to watch videos in a different app (say the YouTube app, or the Netflix app) while continuing to chat on our host app in the background. We noticed that our host app's VPIO was causing audio ducking on the YouTube app's sound output.

piyushtank commented 4 years ago

@winstondu Thanks for the clarification.

Are you able to get the audio volume at the expected level when the mode is set to .default? If so, you can use software AEC for echo cancellation. Let me know how it goes.

winstondu commented 4 years ago

@piyushtank yes, I am able to get full speaker volume when using .default (which happens to remove VPIO, and hence the ducking).

So are you saying that there is no solution to this inter-app audio-ducking problem other than software AEC?

piyushtank commented 4 years ago

@winstondu Correct. I am not aware of any solution other than using our software AEC APIs. I would keep an eye on CPU utilization when software AEC is used, as the software AEC APIs are not widely used and exercised by our QE team and our customers. The main purpose behind adding the software AEC APIs was to work around AEC issues in specific iOS version + device combinations when Apple releases a new iOS version and/or device (example).

winstondu commented 4 years ago

I see. Thank you for the answers.

Would you mind referring me to some sample code on how to change the TVIOptions?

(Note: it is amazing that Apple offers no solution to this problem, and has moreover made even in-app control of audio ducking so difficult. I noticed that folks found some crazy C code to attach with LLDB to disable audio ducking for Mac chat apps. I had hoped something like this existed for iOS.)

piyushtank commented 4 years ago

We don't have a sample app for AudioOptions settings, but here is a code snippet:

let audioOptions = AudioOptions { (options) in
    options.isSoftwareAecEnabled = true
}

// LocalAudioTrack(options:enabled:name:) returns an optional.
localAudioTrack = LocalAudioTrack(options: audioOptions, enabled: true, name: "Microphone")

let connectOptions = ConnectOptions(token: accessToken) { (builder) in
    // ... configure other connect options
    if let localAudioTrack = localAudioTrack {
        builder.audioTracks = [localAudioTrack]
    }
}

room = TwilioVideoSDK.connect(options: connectOptions, delegate: self)

We have filed a feature request to Apple for this feature. Hopefully they will prioritize and provide a solution.

winstondu commented 4 years ago

"getting audio from AVPlayer using MTAudioProcessingTap"

I figured out how to get the audio, but the process callback is of type:

(MTAudioProcessingTap, CMItemCount, MTAudioProcessingTapFlags, UnsafeMutablePointer<AudioBufferList>, UnsafeMutablePointer<CMItemCount>, UnsafeMutablePointer<MTAudioProcessingTapFlags>)

How can I, using these parameters, even generate the AVAudioPCMBuffer to feed back to my app's AVAudioEngine?

Edit: I figured it out. Here is how, for anyone interested:

    // Use this in the tap's process callback.
    func processAudioDataAsPCM(audioData: UnsafeMutablePointer<AudioBufferList>, framesNumber: UInt32) {
        // Retrieve the `AudioStreamBasicDescription` the tap created and saved in the prepare callback.
        guard let absd = self.audioProcessingFormat else {
            return
        }

        guard let format = AVAudioFormat(standardFormatWithSampleRate: absd.mSampleRate, channels: absd.mChannelsPerFrame) else {
            return
        }

        guard let pcmBuffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: AVAudioFrameCount(framesNumber)) else {
            return
        }

        pcmBuffer.frameLength = pcmBuffer.frameCapacity

        let audioDataUnsafe = UnsafeMutableAudioBufferListPointer(audioData)
        let audioBufferUnsafe = UnsafeMutableAudioBufferListPointer(pcmBuffer.mutableAudioBufferList)

        // Copy the audio data from the AVPlayer output into the PCM buffer,
        // then silence the actual AVPlayer output.
        for i in 0..<audioDataUnsafe.count {
            // Never copy more bytes than either buffer actually holds.
            let byteCount = min(Int(audioBufferUnsafe[i].mDataByteSize), Int(audioDataUnsafe[i].mDataByteSize))
            // Copy
            memcpy(audioBufferUnsafe[i].mData, audioDataUnsafe[i].mData, byteCount)
            // Silence. Comment this line out if you don't want to silence the original AVPlayer output.
            memset(audioDataUnsafe[i].mData, 0, byteCount)
        }

        // The delegate is anything that can play a PCM buffer, e.g. an AVAudioPlayerNode.
        self.delegate?.playAudioSample(sampleBuffer: pcmBuffer)
    }
piyushtank commented 3 years ago

Closing the ticket but feel free to reach out to us if you have any questions.

Thanks, Piyush