twilio / video-quickstart-ios

Twilio Video Quickstart for iOS
https://www.twilio.com/docs/api/video
MIT License

Access Raw Buffers from TVIAudioTrack #104

Closed moarzumanov closed 6 years ago

moarzumanov commented 7 years ago

Is there any way to get AVAssetTrack from TVIAudioTrack?

piyushtank commented 7 years ago

@moarzumanov Can you provide more details on your use case? TVIAudioTrack represents the audio being shared when you are connected to a Room.

moarzumanov commented 7 years ago

@ptankTwilio, hello

I want to implement speech recognition in my app to make subtitles. I need AVAudioPCMBuffer for SFSpeechAudioBufferRecognitionRequest.

williamcaruso commented 7 years ago

@ptankTwilio @moarzumanov I am also trying to implement speech recognition in my app using SFSpeechRecognizer in iOS 10. I'm not sure how to convert the TVIAudioTrack to AVAudio either. Is this even possible?

Transcribing without Twilio looks like this:

These are the variables we need to do recognition on AVAudio

    let audioEngine = AVAudioEngine() // REPLACE THIS WITH TVILocalAudioTrack
    let speechRecognizer: SFSpeechRecognizer? = SFSpeechRecognizer()
    let request = SFSpeechAudioBufferRecognitionRequest()
    var recognitionTask: SFSpeechRecognitionTask?

Then we need to tap into the audio input node and hook the buffer up to the SFSpeechAudioBufferRecognitionRequest

    guard let node = audioEngine.inputNode else { return } // need input node from TVIAudioTrack
    let recordingFormat = node.outputFormat(forBus: 0)
    node.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { buffer, _ in
        self.request.append(buffer)
    }
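
For completeness, starting the recognition task against that request looks roughly like this (standard Speech.framework usage, nothing Twilio-specific; error handling trimmed):

    // Speech recognition requires explicit user authorization first.
    SFSpeechRecognizer.requestAuthorization { status in
        guard status == .authorized else { return }
    }

    // Start pulling audio through the tap and kick off the recognition task.
    audioEngine.prepare()
    do {
        try audioEngine.start()
    } catch {
        print("Couldn't start the audio engine: \(error)")
    }

    recognitionTask = speechRecognizer?.recognitionTask(with: request) { result, error in
        if let result = result {
            print(result.bestTranscription.formattedString) // partial and final transcripts
        } else if let error = error {
            print("Recognition error: \(error)")
        }
    }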

Let us know what you think!

ceaglest commented 7 years ago

Hi @williamcaruso,

I agree, in the end you want to use a SFSpeechAudioBufferRecognitionRequest fed with audio content from TVILocalAudioTrack. The problem is our current APIs don't offer access to the raw audio buffers which need to be appended to the request. It may be possible to run a parallel AVAudioEngine, but I wouldn't recommend it as real-time audio playback and recording requires precise configuration (buffer sizes, sample rates, and more), and the setups might conflict with each other.

We are very close to moving out of beta, and releasing 1.0. After this is complete we will consider adding the ability to access raw audio samples much in the same way as we do with TVIVideoRenderer and video frames.

Best, Chris Eagleston

williamcaruso commented 7 years ago

Hey @ceaglest

Thanks for the reply - that's a bummer. Hopefully it'll be in an update soon. Allowing people to do audio processing would be huge.

Do you know of any video chatting APIs in Swift that give access to the buffer? I need this for a research project, so I'll try running the AVAudioEngine in parallel first.

ceaglest commented 7 years ago

Hi @williamcaruso,

We are live with 1.0 now, and our team is looking forward to working on several audio related feature requests. I will keep this ticket up to date as we make progress.

Do you know of any video chatting APIs in Swift that give access to the buffer? I need this for a research project, so I'll try running the AVAudioEngine in parallel first.

I would give AVAudioEngine a shot. Recommending competitors is difficult for me, but you might find something out there that will let you replace the audio device.

Cheers, Chris

williamcaruso commented 7 years ago

AVAudioEngine cannot be used at the same time. Replacing the audio device is the only option.

The following error is thrown:

[Utility] +[AFAggregator logDictationFailedWithError:] Error Domain=kAFAssistantErrorDomain Code=1700 "(null)"
Error Domain=kAFAssistantErrorDomain Code=1700 "(null)"
ceaglest commented 7 years ago

Thanks for the update, we are actively working on adding support for accessing raw sample buffers from TVIAudioTrack.

ceaglest commented 7 years ago

For the folks who are looking for this feature, I'm curious how you might want to consume the audio buffers. What I'm currently thinking is exposing audio samples like:

@interface TVIAudioTrack : TVITrack

// ... other methods elided..
- (void)addRenderer:(id<TVIAudioRenderer>)renderer;
- (void)removeRenderer:(id<TVIAudioRenderer>)renderer;

@end

@protocol TVIAudioRenderer <NSObject>

// Called periodically when new audio samples become ready.
- (void)renderAudioFrame:(CMSampleBufferRef)audioFrame;

@end

This would be compatible with a wide range of iOS audio APIs, including SFSpeechAudioBufferRecognitionRequest, AVAssetWriterInput, AudioQueue and others. I would like to construct the buffers internally using CMAudioSampleBufferCreateReadyWithPacketDescriptions. One problem I'm seeing at the moment is that WebRTC doesn't provide presentation timestamps. So, we may need to either compute them ourselves or expose them from lower levels.

Another option would be to define a custom data structure to represent audio samples. I'm not as keen on this because it means that the samples are not directly usable by the developer without some sort of conversion.

Edit: We are considering a different protocol name like TVIAudioSink or TVIAudioConsumer but the API should remain as proposed.
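
To make the intended usage concrete, here is a rough Swift sketch of a renderer that forwards buffers into a speech request. This is purely illustrative against the proposal above; the protocol name and method signatures could still change before release:

    import CoreMedia
    import Speech
    import TwilioVideo

    // Hypothetical sketch: assumes the proposed TVIAudioRenderer protocol ships roughly as above.
    class SpeechRenderer: NSObject, TVIAudioRenderer {
        let request = SFSpeechAudioBufferRecognitionRequest()

        // Called periodically as new audio samples become ready.
        func renderAudioFrame(_ audioFrame: CMSampleBuffer) {
            // CMSampleBuffers can be appended to the recognition request directly.
            request.appendAudioSampleBuffer(audioFrame)
        }
    }

You would then attach the renderer to a TVIAudioTrack via the proposed addRenderer: method to start receiving samples.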

williamcaruso commented 7 years ago

Hey @ceaglest - glad you're working on it! (or at least thinking about it)

Ideally, I'd like to make it as easy as possible to use iOS audio APIs. Having access to the audio buffers would be huge, so your first suggestion looks good to me. How would one be able to use this, let's say, with SFSpeechAudioBufferRecognitionRequest? Also, do you foresee any other technical challenges besides timestamps?

I think a custom data structure would be more prone to developer frustration.

ceaglest commented 7 years ago

Hey @williamcaruso,

Thanks for the feedback. At this point I've done some research into the lower level workings of WebRTC, and surveyed the use cases mentioned in this thread (as well as individual support requests). Up next is a proof of concept implementation that can satisfy these use cases. Some problems / questions on my side:

  1. Will local audio capture actually start when a renderer is added? Typically no samples are produced until the first PeerConnection is opened, not when the TVILocalAudioTrack is created.
  2. Timing information for remote audio. My understanding is that the samples raised from WebRTC are for the individual audio streams, and not the mixed down or jitter corrected version that is fed into the AudioDevice for playback. This is probably a good thing, but I still don't understand what happens when samples are dropped at the RTP layer before hitting the decoder (are they filled with silence?).
  3. TVIVideoRenderer raises frames at presentation time, but can we make the same guarantees about TVIAudioRenderer?

A little more experimentation is required here, but it should be possible to get something working.

How would one be able to use this, let's say, with SFSpeechAudioBufferRecognitionRequest?

If the audio buffers are raised as CMSampleBufferRefs then they can be fed directly into the request using appendAudioSampleBuffer:. The expectation is that TVIAudioRenderer will raise the samples in order with an ever advancing timestamp.

I'm happy to keep the discussion going if anyone else has feedback, but for now I'm going to proceed with the CMSampleBuffer based PoC.

Best, Chris Eagleston

theprojectabot commented 7 years ago

I like the CMSampleBuffer based PoC route.

williamcaruso commented 7 years ago

Hey @ceaglest

In regards to 1: Interesting. Would it be possible to start audio capture if only a single person is in a chat? This would be useful.

Also, if you don't mind me asking, what is your expected timeline for this feature?

ceaglest commented 7 years ago

Hi @williamcaruso,

In regards to 1: Interesting. Would it be possible to start audio capture if only a single person is in a chat? This would be useful.

I agree, and will try to achieve this behavior if possible.

Also, if you don't mind me asking, what is your expected timeline for this feature?

We will do our best to ship this next week, but things are pretty hectic with the Signal 2017 conference. I will keep this ticket up to date with the latest info.

Regards, Chris

MarcAMartin commented 7 years ago

Any updates here?

ceaglest commented 7 years ago

Nothing substantial yet @MarcAMartin, I take it you are also interested in this feature?

MarcAMartin commented 7 years ago

@ceaglest It's kinda mandatory for us to continue using Twilio for our new app :(

williamcaruso commented 7 years ago

How will the new Speech Recognition Framework (Beta) announced at Signal affect this ticket? More info here...

williamcaruso commented 7 years ago

Any updates? 😢 @ceaglest

ceaglest commented 7 years ago

Hi @williamcaruso,

I wasn't able to finish this feature during the week of Signal and was on vacation until today. I now have most of the implementation done, and will proceed with internal reviews and testing. I expect that we will release this feature next week, but if you'd like early access to a release candidate that can be arranged.

How will the new Speech Recognition Framework (Beta) announced at Signal affect this ticket?

The Speech Recognition Framework is specific to Programmable Voice for now. In a Voice Call media flows through a central server, but for Video this is not always the case. This type of server side / TwiML integration won't be possible for Peer-to-Peer Rooms, but could be added to Group Rooms in the future. I don't have a timeline on when server driven Speech Recognition will be available for Programmable Video, but I will enquire with our product leads on where this lies in our roadmap.

As for this feature, we are just aiming to provide access to audio samples. Whether they are recorded to disk, or used with a speech recognition tool like Apple's Speech framework is entirely up to the developer.

Best, Chris Eagleston

lilhinx commented 7 years ago

@ceaglest I'd love early access to the candidate for this feature.

theprojectabot commented 7 years ago

I would love early access as well!

ceaglest commented 7 years ago

@lilhinx @theprojectabot can you send an email to githubusername + on + @twilio.com?

williamcaruso commented 7 years ago

Hi! @ceaglest Can I also receive early access?

ceaglest commented 7 years ago

Sure @williamcaruso, can you email me mentioning this issue?

mygithubusername + on + @twilio.com.

I will do my best to share a release candidate build tomorrow.

ceaglest commented 7 years ago

Hi @lilhinx, @theprojectabot, and @williamcaruso,

Thank you for your interest in this feature, and discussion on the API proposal. At long last, I have a release candidate (RC) to share with you.

Changelog

1.2.0

Features

Known issues

Using the Release Candidate

Please check your inboxes for a link to the RC. I've attached an example Speech Recognizer, and Audio Recorder to this ticket. These classes demonstrate basic usage of the TVIAudioSink APIs.

TVIAudioSink Examples.zip (https://github.com/twilio/video-quickstart-swift/files/1117394/TVIAudioSink.Examples.zip)

Discussion

TL;DR - Low level audio is difficult, Speech.framework is immature and WebRTC doesn't raise recorded audio yet.

You might be wondering why this feature took so long to deliver, so let me explain some of the challenges that we faced.

Real-time audio interaction

What I didn't realize initially is that the audio samples are delivered by WebRTC on a real-time Core Audio capture / playback thread. Every time the audio thread pulls new playback samples from the mixer we get access to them. Unfortunately, operating on a real-time thread presents several constraints since memory should not be allocated nor locks taken.

In the end I did my best to reduce but not entirely eliminate allocations caused by adding a TVIAudioSink. We have cut another ticket to pre-allocate the audio CMBlockBuffers as this is the main area where we are getting hit in 1.2.0.

In order to raise the TVIAudioSink callbacks to the developer, we dispatch them to a serial queue. This means that developers don't have to worry about the constraints of a real-time audio thread.

Supporting Speech.Framework

A common use case which has been requested is speech recognition via Speech.framework. Again, this was something we hoped would just work out of the box, but it was not so.

After getting this feature up and running, I tried to integrate with SFSpeechAudioBufferRecognitionRequest. Initially, Speech simply retained the buffers forever and used massive amounts of CPU.

(Screenshot: https://user-images.githubusercontent.com/1302577/27765019-84105752-5e5c-11e7-80dd-56c989232f57.png)

In the end I tracked this down to a missing format flag in the AudioStreamBasicDescription. Once the flag was set Speech became well behaved but still didn't recognize the audio. This was puzzling as there were no errors raised to indicate what I was doing wrong. In the end, the iOS 11 betas gave me some clues:

2017-06-22 21:04:41.482752-0700 RTCRoomsDemo[711:83096] [] CMSampleBufferCopyPCMDataIntoAudioBufferList signalled err=-12731 (kFigSampleBufferError_RequiredParameterMissing) (mNumberChannels incorrect) at /BuildRoot/Library/Caches/com.apple.xbs/Sources/EmbeddedCoreMediaFramework/EmbeddedCoreMedia-2010.7.3/Sources/Core/FigSampleBuffer/FigSampleBuffer.c line 4595
0   CoreMedia                           0x000000018a04b6f0 CMSampleBufferCopyPCMDataIntoAudioBufferList + 592
1   Speech                              0x000000018ffad4e0 <redacted> + 440
2   libdispatch.dylib                   0x0000000104e79654 _dispatch_call_block_and_release + 24
3   libdispatch.dylib                   0x0000000104e79614 _dispatch_client_callout + 16
4   libdispatch.dylib                   0x0000000104e89008 _dispatch_queue_serial_drain + 716
5   libdispatch.dylib                   0x0000000104e7ce58 _dispatch_queue_invoke + 340
6   libdispatch.dylib                   0x0000000104e8a1c4 _dispatch_root_queue_drain_deferred_wlh + 412
7   libdispatch.dylib                   0x0000000104e917fc _dispatch_workloop_worker_thread + 868
8   libsystem_pthread.dylib             0x00000001acc9f1e8 _pthread_wqthread + 924
9   libsystem_pthread.dylib             0x00000001acc9ee40 start_wqthread + 4

Aha! So, internally CMSampleBuffers are converted into AVAudioPCMBuffers. Audio from the mixer is delivered in stereo, and despite claims to the contrary AVAudioPCMBuffer doesn't understand stereo very well. In order to solve this problem I down-mixed the content from stereo to mono before feeding it into the request. This worked like a charm and Speech.framework recognized my audio for the first time!
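
For anyone curious, the down-mix itself is nothing fancy; here is a minimal sketch for interleaved 16-bit PCM (illustrative only, not the SDK's exact internal code):

    // Average left and right channels of interleaved 16-bit stereo PCM into mono.
    func downmixToMono(_ interleavedStereo: [Int16]) -> [Int16] {
        var mono = [Int16](repeating: 0, count: interleavedStereo.count / 2)
        for frame in 0..<mono.count {
            let left = Int32(interleavedStereo[frame * 2])
            let right = Int32(interleavedStereo[frame * 2 + 1])
            mono[frame] = Int16((left + right) / 2) // stays within Int16 range
        }
        return mono
    }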

Audio Timestamps

Each CMSampleBuffer needs to be timestamped and these timestamps should increase in order to work properly with APIs like AVAssetWriter. Since WebRTC's mixer raises samples at presentation time, I ended up using the host clock to timestamp the buffers.
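
Concretely, that just means asking Core Media for the host clock's current time and using it as the presentation timestamp. A minimal sketch (frame count and sample rate are illustrative):

    import CoreMedia

    // Timing info for a buffer of 480 frames at 48 kHz, stamped with the host clock.
    let frameCount: CMTimeValue = 480
    let timing = CMSampleTimingInfo(
        duration: CMTime(value: frameCount, timescale: 48000),
        presentationTimeStamp: CMClockGetTime(CMClockGetHostTimeClock()),
        decodeTimeStamp: .invalid)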

Local Audio

WebRTC's AudioDeviceBuffer implementation does not raise samples from local recorded audio at the moment. This is very unfortunate, and something that we expected to work based upon the public APIs. While we won't support TVIAudioSink on TVILocalAudioTrack in our first release we expect to add this functionality in the future.

Conclusion

With 1.2.0 we now have a solid foundation to build upon. Despite the delays, we thank you for your interest in this feature and look forward to hearing your feedback. Barring any surprising findings we intend to release 1.2.0 on July 5th, following Canada Day and Independence Day. Future releases will add support for TVILocalAudioTrack, and eliminate allocations on the real-time audio thread.

Best wishes, Chris Eagleston

theprojectabot commented 7 years ago

Thx Chris. I’ll check my Inbox

ceaglest commented 7 years ago

Quick update. We are bundling a fix for iOS simulator audio (https://github.com/twilio/video-quickstart-swift/issues/73) with the 1.2.0 release, but the validation took a little longer than expected. We are now planning on releasing tomorrow (July 6th) instead of today as was originally scheduled.

ceaglest commented 7 years ago

We are now live with the 1.2.0 release which introduces TVIAudioSink. I'll keep this issue open until we've had a chance to add support for TVILocalAudioTrack.

Maltby commented 6 years ago

Hello, I am having trouble understanding how the pieces fit together. My goal is to use Twilio's realtime video chat and Apple's speech API simultaneously.

I have opened the "TVIAudioSink Examples" and yet am unable to wrap my head around what the TVIAudioSink does and how exactly to use it.

My app is based on Apple's SpeakToMe example app and the Speech-gRPC-Streaming Swift quick-start. After connecting to the video chat room, I'm attempting to cast the TVILocalAudioTrack as a TVIAudioTrack object. Then, using the ExampleSpeechRecognizer class from the "TVIAudioSink Examples", I call ExampleSpeechRecognizer.init(audioTrack: tviAudioTrack). From here I am lost as to how to obtain the necessary CMSampleBuffer for rendering the sample, as well as how to manage / what to do with the TVIAudioSink that is added to the tviAudioTrack.

Some sample code showing how TVIAudioSink works with the Apple Speech API would be great! Swift code would be gold!

Thank you for adding this feature @ceaglest, it was perfect timing!

initWithAudioTrack code for clarification:

- (instancetype)initWithAudioTrack:(TVIAudioTrack *)audioTrack {
    self = [super init];

    if (self != nil) {
        _speechRecognizer = [[SFSpeechRecognizer alloc] init];
        _speechRecognizer.defaultTaskHint = SFSpeechRecognitionTaskHintDictation;

        _speechRequest = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
        _speechRequest.shouldReportPartialResults = YES;

        __weak typeof(self) weakSelf = self;
        _speechTask = [_speechRecognizer recognitionTaskWithRequest:_speechRequest resultHandler:^(SFSpeechRecognitionResult * _Nullable result, NSError * _Nullable error) {
            __strong typeof(self) strongSelf = weakSelf;
            if (result) {
                strongSelf.speechResult = result.bestTranscription.formattedString;
                NSLog(@"Results: %@", strongSelf.speechResult);
            } else {
                NSLog(@"Speech recognition error: %@", error);
            }
        }];

        _audioTrack = audioTrack;
        [_audioTrack addSink:self];
        _trackId = _audioTrack.trackId;
    }

    return self;
}
ceaglest commented 6 years ago

Hey @Maltby,

Unfortunately, the TVIAudioSink APIs only work with TVIAudioTrack instances from remote Participants at the moment, so you won't get any samples raised if you use it with a TVILocalAudioTrack. Here is an example from one of our internal Objective-C apps which demonstrates how to recognize speech from a remote audio track.

- (void)participant:(TVIParticipant *)participant addedAudioTrack:(TVIAudioTrack *)audioTrack {
    NSLog(@"Participant %@ added audio track %@", [participant identity], audioTrack);

    // The recognizer will log speech to the console if it detects anything.
    self.speechRecognizer = [[ExampleSpeechRecognizer alloc] initWithAudioTrack:audioTrack];
}
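
And a rough Swift equivalent, in case that's easier to drop into your app (the delegate method name follows the 1.x API, and ExampleSpeechRecognizer is the class from the attached examples):

    // Rough Swift equivalent of the snippet above.
    func participant(_ participant: TVIParticipant, addedAudioTrack audioTrack: TVIAudioTrack) {
        print("Participant \(participant.identity) added audio track \(audioTrack)")

        // The recognizer will log speech to the console if it detects anything.
        self.speechRecognizer = ExampleSpeechRecognizer(audioTrack: audioTrack)
    }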

We will add this as a known issue in our 1.2.1 changelog, but it was mentioned in the original 1.2.0 release which added the feature, and also in the discussion above.

Hopefully I can help out if you have any more questions tomorrow. Once we fix the issue with TVILocalAudioTrack we will be adding TVIAudioSink sample code to our QuickStart repos.

Thanks, Chris

Maltby commented 6 years ago

Chris you're the man!

I was definitely overthinking things, the sink is much more straightforward to use than I thought it was. I think a thorough explanation of how the pieces work together behind the scenes along with the sample code could really help some less experienced coders like myself.

Thank you sir!

ceaglest commented 6 years ago

I'm glad I could help, and I definitely agree that we should add proper sample code for this feature. For now, I've listed the known issue in our 1.2.1 changelog.

Maltby commented 6 years ago

Any updates on TVIAudioSinks working with TVILocalAudioTracks? Thank you

ceaglest commented 6 years ago

Hi @Maltby,

We have been focusing on H.264 and improved support for Group Rooms lately, and haven't had a chance to address using TVIAudioSink with TVILocalAudioTrack. We hope to return to this work soon, but I don't have a firm date to share just yet.

Best, Chris Eagleston

Gabriel-Lewis commented 6 years ago

+1 on the TVIAudioSinks working with TVILocalAudioTracks 🙏 @ceaglest

MarcAMartin commented 6 years ago

+1 on the TVIAudioSinks working with TVILocalAudioTracks 🙏🏾 @ceaglest

danielsinger commented 6 years ago

+1 on the TVIAudioSinks working with TVILocalAudioTracks 🙏 @ceaglest

danielsinger commented 6 years ago

How we're gonna feel when we get TVILocalAudioTracks: [bitmoji]

ceaglest commented 6 years ago

Thank you all. Your timing is great, because we have sprint planning tomorrow morning. I will discuss the status of this feature with the team.

danielsinger commented 6 years ago

Hey @ceaglest, Just wanted to check if you have anything to share re the status of TVILocalAudioTrack support.

This one is a blocker for us from shipping our twilio app :(

Thanks!

ceaglest commented 6 years ago

Hey @danielsinger,

I am currently researching how we might add this functionality for TVILocalAudioTrack. The biggest concern on my end is adding this capability without impacting realtime audio performance when no sinks are added to the Track.

If the results of my spike are successful then implementation can begin as early as next week. I will keep you in the loop as we get closer on this.

Best, Chris Eagleston

danielsinger commented 6 years ago

Excellent! Thanks @ceaglest

danielsinger commented 6 years ago

Hey @ceaglest Any update on this? Would love to chat about our project and what we're looking to do with twilio if you have some time this week.

Thanks!

ceaglest commented 6 years ago

Hi @danielsinger,

I am definitely interested to hear more. I left my email earlier in this ticket, give me a shout and we can setup a meeting.

Best, Chris

paynerc commented 6 years ago

We are now live with 1.3.7, which adds TVIAudioSink support for TVILocalAudioTrack.

ceaglest commented 6 years ago

Hello,

The TVIAudioSink changes have also landed in 2.0.0-preview5. We have two remaining issues before closing out this feature:

  1. TVIAudioSink should pre-allocate buffers used on a real-time CoreAudio thread.
  2. Add sample code which demonstrates TVIAudioSink.

Regards, Chris Eagleston

Gabriel-Lewis commented 6 years ago

Thank you @ceaglest

ceaglest commented 6 years ago

Hello developers,

We've just released 1.3.8 which resolves the issue of memory allocations and Objective-C runtime usage on the real-time CoreAudio thread. This change will also be available in our upcoming 2.0.0-preview7 release.

Previously, TVIAudioSink had unbounded memory usage (scary!) and allocated memory whenever new audio samples became available. Now a fixed-size memory pool is pre-allocated when a TVIAudioSink is added to a TVIAudioTrack.

Up next we're working on sample code which demonstrates these APIs by both recording audio, and recognizing speech using Apple's Speech.framework. The example should be out in the next day or two, but if you want to follow along the work is on my fork here.

Best, Chris Eagleston

ceaglest commented 6 years ago

The example code is now live!

https://github.com/twilio/video-quickstart-swift/tree/2.0.0-preview/AudioSinkExample

We will close out this issue once 2.0.0-preview7 is released.