Hi @andyboyd,
Thanks for writing in.
I've used ReplayKit2 to capture video from within an application, but not in a Broadcast Upload Extension. I had tried earlier versions of ReplayKit in iOS 10.x in an extension and didn't see the almost immediate memory warning and crash that you experienced. However, without published sample code or automated tests of this use case, I can't claim with certainty that we support it.
Do you have any ideas or advice?
This may be a silly question, but is there any chance you are leaking either the entire CMSampleBuffer or CVImageBuffer? TVIVideoFrame will CFRetain the input CVImageBuffer, and CFRelease it in its destructor. If you're able to share some example code, I'd be happy to take a look.
One more question, what iOS device, OS version, and SDK version are you using?
We are interested in supporting this use case. At the moment our team is wrapping up the 2.0.0 release, but once it's complete we will investigate this further. Ideally, we would publish ReplayKit sample code which demonstrates a working integration with our Video SDK, both in an extension and in an application.
I'll keep this ticket updated as we investigate further.
Best, Chris
I've done a bit more investigation on this now. It looks very much like it's the memory usage of the capture consumer (or something beneath it) that's causing the extension to crash. The scenario that crashed immediately for me was simply passing the raw CVImageBuffer contained in the CMSampleBuffer to the capture consumer, which is quite high resolution, so I tried ignoring that and passing a preallocated static image on every frame instead. That allowed me to play around with the resolution more easily.
I found that I was able to get a stable feed up and running by doing the following:
Both of which make complete sense; however, a 300x200 image is not exactly going to deliver a great experience for my users.
At the moment I'm working on using VideoToolbox to scale my CVImageBuffers down to a lower resolution to see if I can get it stable that way, before optimising things as much as possible to get the quality back up.
I'm not sure if it's helpful to you, but some other things I tried were dropping down the frame rate I was feeding to the capture consumer by only sending a frame to consume every few seconds, and I found that resolution was way more important than frame rate. Even when I only sent a frame every 10 seconds at a resolution of 640x480, it would crash on the second frame repeatably. Given that it was able to cope with 30 fps of 300x200, it seems unlikely that 2 frames of 640x480 uses more memory than it can handle. Is it possible that on receiving the second frame the capture consumer preallocates enough memory for a larger number of frames to optimise the speed of its memory allocations or something?
Oh, and I'm using an iPhone X running iOS 11.3, with the iOS 11.3 SDK and Twilio 2.0.0-preview9. I can't give you my actual app code, but if it's helpful to you, I can put a small sample together to demonstrate the issue.
Hi @andyboyd,
Thank you for the information, this is really helpful.
At the moment I'm working on using VideoToolbox to scale my CVImageBuffers down to a lower resolution to see if I can get it stable that way, before optimising things as much as possible to get the quality back up.
I'm not sure if it's helpful to you, but some other things I tried were dropping down the frame rate I was feeding to the capture consumer by only sending a frame to consume every few seconds, and I found that resolution was way more important than frame rate.
There is room for improvement in how our video pipeline uses memory. If you choose H.264, then we will use VideoToolbox internally to encode the video, in some cases scaling it down in our VTCompressionSession as available bandwidth demands. The problem is that the video pipeline needs to support software codecs like VP8 as well, and we maintain an I420 buffer pool for this purpose. When a frame is captured we don't know whether hardware or software codecs will be needed, so we pull a buffer from the pool and package it along with the captured frame.
If you're never using VP8 (and you don't have renderers which require an I420 conversion), then this buffer is adding a memory overhead of:
Size = Width * Height * 1.5
At least when no downscaling is requested by the encoder(s). On an iPhone X, where the screen is 2436x1125 this adds up to:
Size = 4,110,750 bytes/frame
Typically there are only 1-2 of these buffers in flight at a time, but this cost is pretty important if you are in an extension. I should also point out that our H.264 codec is currently limited to no more than 1280x720 anyways, so passing such large frames will actually cause the encode to fail.
Oh, and I'm using an iPhone X running iOS 11.3, with the iOS 11.3 SDK and Twilio 2.0.0-preview9. I can't give you my actual app code, but if it's helpful to you, I can put a small sample together to demonstrate the issue.
I would recommend sticking with the latest 2.0.0-beta4 if possible, but I don't expect that you will see any memory usage reductions by making the change. As I mentioned earlier, we won't be able to look into this issue further until 2.0.0 is out, but I may ask for more information (like sample code) at that time.
Best, Chris
I've managed to get something up and running. I ended up resizing the CVImageBuffers from the CMSampleBuffers down to 1/4 (i.e. half width and half height) their original size, and it seems to be working pretty well. I do think it's just a case of tuning the input resolution of the frames being sent to Twilio to control the memory usage.
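For reference, the downscale step looks roughly like the sketch below. This is a simplified illustration using Core Image rather than my actual VideoToolbox code, and it assumes a CVPixelBufferPool created once up front at the scaled size:

```swift
import CoreImage
import CoreVideo

// Render each full-size ReplayKit frame into a quarter-size (half width,
// half height) buffer drawn from a preallocated CVPixelBufferPool.
let ciContext = CIContext()

func downscale(_ source: CVPixelBuffer, into pool: CVPixelBufferPool) -> CVPixelBuffer? {
    var scaledBuffer: CVPixelBuffer?
    guard CVPixelBufferPoolCreatePixelBuffer(nil, pool, &scaledBuffer) == kCVReturnSuccess,
          let output = scaledBuffer else {
        return nil
    }
    // Scale by 0.5 in both dimensions, then render into the smaller buffer.
    let image = CIImage(cvPixelBuffer: source)
        .transformed(by: CGAffineTransform(scaleX: 0.5, y: 0.5))
    ciContext.render(image, to: output)
    return output
}
```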
I guess my issue is pretty much resolved now, so thanks for your help.
If you're using VideoToolbox internally, I do wonder if it would be possible to simply provide CMSampleBuffers to the capture consumer, along with a target resolution, and let it subsample them in the compression session, rather than having to convert them to a smaller input size and then have the compression session process them again. It seems like that would be more efficient, but I suspect it depends on the internals of VTCompressionSession and how it has to allocate memory while it's working. It seems like that interface would play nicer with broadcast extensions than what's currently available.
Hi,
If you're using VideoToolbox internally, I do wonder if it would be possible to simply provide CMSampleBuffers to the capture consumer, along with a target resolution, and let it subsample them in the compression session, rather than having to convert them to a smaller input size and then have the compression session process them again. It seems like that would be more efficient, but I suspect it depends on the internals of VTCompressionSession and how it has to allocate memory while it's working. It seems like that interface would play nicer with broadcast extensions than what's currently available.
Yes, we are considering letting the developer provide cropping and/or scaling information to TVIVideoCaptureConsumer. This would allow us to allocate smaller buffers when using the software (VP8, VP9) pipeline, and skip the extra step of your capturer having to perform scaling up front.
When using VTCompressionSession our goal is to feed frames directly into it if possible. However, there are some cases (like where pixel format conversions or rotations are required) where we have the session allocate an input buffer pool and copy your frames into it.
I'm very glad that you've got something up and running. I'll keep this ticket open until we have time to revisit this use case post-2.0.
Best, Chris
Hi again Chris,
A bit of a follow-up question for you.
I'm trying to add audio to my stream. I've been able to publish a local audio track successfully, but it's not receiving any content. I don't think the extension has a default audio session the way the audio track expects; instead, ReplayKit gives me the raw audio samples in the processSampleBuffer callback.
I get the feeling the way to do this in the extension is to implement a TVIAudioSink, and then give that the CMSampleBuffers from my processSampleBuffer callbacks, but I'm not quite clear on what I should be doing inside the renderSample function of the audio sink.
Am I on the right track with that at all, or am I misunderstanding the way it's supposed to work?
I've made some progress, but I'm still in need of a bit of help.
So far, I've created my own TVIAudioDevice; I'm calling TVIAudioDeviceFormatChanged() in startCapturing() and keeping a reference to the context. Then, when I receive audio samples from ReplayKit, I'm calling TVIAudioDeviceWriteCaptureData(). I am successfully publishing an audio track, and audio is coming through on it, but it's all garbled and corrupted. It just sounds like horrible crackling, though I can hear that it's definitely responding to the noises I make into the microphone, because the crackling changes; it sounds like R2D2 is in the background.
My theory is that somehow in converting from CMSampleBuffers to the UnsafeMutablePointer that TVIAudioDeviceWriteCaptureData() requires, something is getting warped. Possibly the timings are out, but I can't really find any documentation anywhere about what I need to do to resolve this. It's just trial and error.
Hey @andyboyd,
Sorry for the late response, I was out of office yesterday.
You are on the right track by creating your own TVIAudioDevice. Unfortunately, this is a case where we don't have sample code specific to ReplayKit yet. Have you had a look at AudioDeviceExample?
My theory is that somehow in converting from CMSampleBuffers to the UnsafeMutablePointer that TVIAudioDeviceWriteCaptureData() requires, something is getting warped. Possibly the timings are out, but I can't really find any documentation anywhere about what I need to do to resolve this. It's just trial and error.
It's a matter of making sure the TVIAudioFormat that you are using matches that of the incoming CMSampleBuffer's AudioStreamBasicDescription in terms of the number of channels and sample rate. You could do a sanity check by comparing the two ASBDs, but in general this is where you want to derive the TVIAudioFormat from.
Access AudioStreamBasicDescription from CMSampleBuffer: https://github.com/twilio/video-quickstart-swift/blob/master/AudioSinkExample/AudioSinks/ExampleSpeechRecognizer.m#L86
Create AudioStreamBasicDescription from TVIAudioFormat: https://twilio.github.io/twilio-video-ios/docs/latest/Classes/TVIAudioFormat.html#//api/name/streamDescription
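Putting those two references together, the derivation might look something like this rough sketch. The TVIAudioFormat initializer is written from memory and framesPerBuffer is just a placeholder, so double-check against the headers:

```swift
import CoreMedia
import TwilioVideo

// Derive the TVIAudioFormat for the audio device from the first CMSampleBuffer
// that ReplayKit delivers, matching channel count and sample rate.
func audioFormat(from sampleBuffer: CMSampleBuffer) -> TVIAudioFormat? {
    guard let description = CMSampleBufferGetFormatDescription(sampleBuffer),
          let asbd = CMAudioFormatDescriptionGetStreamBasicDescription(description)?.pointee else {
        return nil
    }
    return TVIAudioFormat(channels: Int(asbd.mChannelsPerFrame),
                          sampleRate: UInt32(asbd.mSampleRate),
                          framesPerBuffer: 1024) // placeholder frame count
}
```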
Best, Chris
Thanks Chris,
Coincidentally, I did get the mic audio streaming successfully about half an hour before you replied! Isn't that always the way!
The key in the end was related to the buffer size being sent to TVIAudioDeviceWriteCaptureData(). I had been sending through the buffer size from the audio device's capture format every time, but I needed to send the smaller of that value and the mDataByteSize of the CMBlockBuffer contained within the sample buffer.
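In code, the fix amounts to something like the sketch below. This is from memory, using the Swift 4-era CMBlockBufferGetDataPointer signature, and the exact Swift import of TVIAudioDeviceWriteCaptureData() may differ between SDK versions:

```swift
import CoreMedia
import TwilioVideo

// Clamp the write size to what the CMBlockBuffer actually holds. `context` is
// the TVIAudioDeviceContext captured in startCapturing(), and
// `expectedBufferSize` comes from the capture TVIAudioFormat's bufferSize.
func write(sampleBuffer: CMSampleBuffer,
           context: TVIAudioDeviceContext,
           expectedBufferSize: Int) {
    guard let blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer) else { return }

    var totalLength = 0
    var dataPointer: UnsafeMutablePointer<Int8>?
    guard CMBlockBufferGetDataPointer(blockBuffer, 0, nil, &totalLength, &dataPointer) == kCMBlockBufferNoErr,
          let audioData = dataPointer else { return }

    // The fix: never claim more bytes than the block buffer contains.
    let sizeInBytes = min(expectedBufferSize, totalLength)
    TVIAudioDeviceWriteCaptureData(context: context, data: audioData, sizeInBytes: sizeInBytes)
}
```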
Awesome, it's great to hear that you've got an AudioDevice up and running with ReplayKit.
Hi again @ceaglest
I have another question on this issue, not sure if it's something I'm misunderstanding, or if it's a limitation of the way the Twilio SDK currently works.
The mechanism for getting the captureContext/renderContext on the TVIAudioDevice is to call TVIAudioDeviceCaptureFormatChanged()/TVIAudioDeviceRenderFormatChanged(), which triggers a query to the audio device's captureFormat or renderFormat property.
This works just fine, but in a broadcast extension, the extension is continually getting samples from both the microphone and the running app, and these are usually in different formats. Calling TVIAudioDeviceCaptureFormatChanged() every time a different type of sample comes in is quite problematic, since doing so usually drops a sample while the audio device reinitialises with the new format. Because the samples from the two sources are interleaved, this means one or both of the sources get skipped entirely.
I imagine this is probably a limitation of the TwilioVideo SDK having a single pipeline each for rendering and capturing, but is there anything you can think of that would help with this situation? If not, I guess it's another feature request!
Thanks.
PS, I'm using SDK version 2.2.1 on iOS 11
Hi @andyboyd,
Sorry for the delayed response.
I imagine this is probably a limitation of the TwilioVideo SDK having a single pipeline each for rendering and capturing, but is there anything you can think of that would help with this situation? If not, I guess it's another feature request!
Unfortunately a TVIAudioDevice can only work with a single capture and recording format at a time. Your capturer should deliver a continuous stream of raw audio samples, with format changes only when needed. A format change causes other elements of the audio pipeline to be reconfigured, so it shouldn't be done for every slice of audio.
In this case, I think what you want is to pick a canonical recording format (either the mic or app audio format) and convert the other input to match it. Then mix the two together in either mono or stereo, and deliver the result to us.
You can use an AudioConverter to perform a channel and/or sample rate conversion, like this example.
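As a very rough sketch of the idea, using the standard AVAudioConverter and AVAudioPCMBuffer APIs rather than anything from our SDK (the concrete formats are assumptions, and real code would run the converter on each app-audio buffer before mixing):

```swift
import AVFoundation

// Pick the mic format as canonical and convert app audio to match it.
// The concrete formats below are assumptions for illustration.
let canonicalFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                    sampleRate: 44_100,
                                    channels: 1,
                                    interleaved: false)!
let appAudioFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                   sampleRate: 48_000,
                                   channels: 2,
                                   interleaved: false)!
let converter = AVAudioConverter(from: appAudioFormat, to: canonicalFormat)!

// Mix two mono float32 buffers by summing samples, clamping to avoid clipping.
func mix(_ mic: AVAudioPCMBuffer, _ app: AVAudioPCMBuffer, into out: AVAudioPCMBuffer) {
    let frames = Int(min(mic.frameLength, app.frameLength))
    guard let micData = mic.floatChannelData?[0],
          let appData = app.floatChannelData?[0],
          let outData = out.floatChannelData?[0] else { return }
    for i in 0..<frames {
        outData[i] = max(-1.0, min(1.0, micData[i] + appData[i]))
    }
    out.frameLength = AVAudioFrameCount(frames)
}
```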
I think this is a great question for @piyushtank because he is working on a ReplayKit example:
https://github.com/twilio/video-quickstart-swift/pull/287
Let's keep the discussion going, Chris
Hi, do you have any updates on code samples for using ReplayKit with the TwilioVideo SDK? Thanks
@julien-l We are discussing prioritizing our ReplayKit sample code ticket and getting it into the next sprint. We have a work-in-progress PR available which demonstrates how to use ReplayKit with TwilioVideo - https://github.com/twilio/video-quickstart-swift/pull/287
We were sidetracked by other high-priority issues with our Voice and Video SDKs. Internally we are discussing working on the TODO items listed on the PR and getting it merged in the coming sprint. I will keep you posted.
@andyboyd “The key in the end was related to the buffer size being sent to TVIAudioDeviceWriteCaptureData(). I had been sending through the buffer size from the audio device's capture format every time, but I needed to send the smaller of that value and the mDataByteSize of the CMBlockBuffer contained within the sample buffer.”
Could you possibly share some example code? Thank you!
Hello folks,
We have released 2.6.0-preview1, with new Video Source APIs that significantly improve the performance of streaming ReplayKit content. You can try our updated example now.
Could you possibly share some example code? Thank you!
If the original poster would be willing to share, that is great! We do plan to demonstrate mixing in the example app, but haven't had a chance to update it yet. I'll circle back with the team on where we could fit this in.
Best, Chris
@ceaglest Great news on the 2.6.0 preview!
Regarding the audio: in our use case we are seeking just the app audio and do not need to mix in the microphone audio. We attempted capturing app audio without success using the ExampleReplayKitAudioCapturer. Looking through the issues and comments, I’m not sure that anybody has successfully captured the app audio. Have you attempted to capture app audio? Thank you!
Hi @etown,
We attempted capturing app audio without success using the ExampleReplayKitAudioCapturer. Looking through the issues and comments, I’m not sure that anybody has successfully captured the app audio. Have you attempted to capture app audio? Thank you!
I have not, but judging by the ticket that was filed this morning (https://github.com/twilio/video-quickstart-swift/issues/339) it should be possible to capture just app audio, without mixing, by making some small changes.
Best, Chris
Thanks Chris,
Any hints about where the changes need to be to go from recording mic audio to app audio?
The supplied sizeInBytes is invalid. The sizeInBytes must match with the size returned by TVIAudioFormat:bufferSize utility method.
Hi @etown,
The supplied sizeInBytes is invalid. The sizeInBytes must match with the size returned by TVIAudioFormat:bufferSize utility method.
I looked into this briefly and posted an update in: https://github.com/twilio/video-quickstart-swift/issues/339
Best, Chris
@ceaglest I wonder if there's any way to initiate the Twilio call from the main app and then start sharing the screen during the call. In the code example, when broadcasting the screen, the user is connected to the Twilio room from within the upload extension. What if someone connects to the room from the main app and then wants to initiate a screen broadcast?
I was playing around with passing PixelBuffers to the main app via shared memory and then pushing those to Twilio, but I couldn't make it work :( (something breaks when copying the buffers)
Hi Developers,
I believe all the original questions in this issue have been answered, so I'm closing it out. To summarize, our example has a broadcast extension which:
We will continue to improve on our example code in future iterations.
Best, Chris
Hi,
I'm trying to implement a Broadcast Upload Extension using Twilio, and I'm running into memory usage issues that cause the extension to crash due to memory pressure.
I've created a custom TVIVideoCapturer class based on your example, with the main difference being that, since this is an upload extension, I'm provided with sample buffers rather than having to grab frames off a view myself.
So, my (simplified) general approach is:
The conversion from CMSampleBuffer to CVImageBuffer seems to be working as expected, but after consuming a few frames my extension receives memory warnings and crashes. I've tried tweaking the resolution and frame rate of the supported format in my capturer, but it doesn't seem to make a difference.
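For concreteness, the core of my capturer looks roughly like the sketch below. It's simplified, and the protocol and initializer names follow my reading of the 2.0.0-preview headers, so treat the exact signatures as approximate:

```swift
import CoreMedia
import TwilioVideo

// Simplified capturer fed by RPBroadcastSampleHandler.processSampleBuffer(_:with:).
class ReplayKitVideoCapturer: NSObject, TVIVideoCapturer {
    weak var captureConsumer: TVIVideoCaptureConsumer?

    var isScreencast: Bool {
        return true
    }

    var supportedFormats: [TVIVideoFormat] {
        // Advertise the (reduced) dimensions we intend to deliver; placeholder values.
        let format = TVIVideoFormat()
        format.dimensions = CMVideoDimensions(width: 640, height: 360)
        format.frameRate = 30
        return [format]
    }

    func startCapture(_ format: TVIVideoFormat, consumer: TVIVideoCaptureConsumer) {
        captureConsumer = consumer
        consumer.captureDidStart(true)
    }

    func stopCapture() {
        captureConsumer = nil
    }

    // Wrap each ReplayKit video sample and hand it to the consumer.
    func consume(sampleBuffer: CMSampleBuffer) {
        guard let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        let timestamp = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
        let frame = TVIVideoFrame(timestamp: timestamp,
                                  buffer: imageBuffer,
                                  orientation: .up)
        captureConsumer?.consumeCapturedFrame(frame)
    }
}
```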
Do you have any ideas or advice?
Thanks