twilio / video-quickstart-ios

Twilio Video Quickstart for iOS
https://www.twilio.com/docs/api/video
MIT License

ISDK-2195: TVIScreenCapturer uses too much cpu in iOS 12 #303

Closed souvickcse closed 5 years ago

souvickcse commented 6 years ago

I am using TVIScreenCapturer for Twilio Video to share a custom camera view from my app: let capturer: TVIScreenCapturer = TVIScreenCapturer.init(view: sharedView). But after updating to iOS 12 it is slowing down the whole UI. In Instruments I am getting this:

[screenshot: Instruments trace, 2018-09-19]

I am assuming the screen rendering in iOS 12 has some issue, as previously it was working fine on iOS 11. Please let me know how to fix this. I am using 'TwilioVideo', '~> 2.3.0'.

souvickcse commented 6 years ago

@ceaglest Can you please help?

ceaglest commented 6 years ago

Hey @souvickcse,

I can certainly take a look at this, but there might not be that much we can do to optimize this path in iOS 12.0. We are currently working on a ReplayKit example app, since this is the preferred way to capture the screen in modern versions of iOS. Since iOS 11.0 it has been possible to capture the screen of your app with ReplayKit from within your process, using a lot less CPU than drawing the frame every time.
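
For reference, a minimal sketch of what that in-process path looks like (RPScreenRecorder.startCapture, iOS 11+), assuming a custom capturer that forwards frames through a TVIVideoCaptureConsumer as in our examples; the names here are illustrative, not final API:

import CoreMedia
import ReplayKit
import TwilioVideo

// Minimal sketch: capture this app's screen in-process with ReplayKit (iOS 11+)
// and forward each video sample buffer to the Twilio capture consumer.
// "consumer" is illustrative; in a real capturer it is the TVIVideoCaptureConsumer
// handed to your custom TVIVideoCapturer.
func startReplayKitCapture(consumer: TVIVideoCaptureConsumer) {
    let recorder = RPScreenRecorder.shared()
    recorder.isMicrophoneEnabled = false

    recorder.startCapture(handler: { (sampleBuffer, bufferType, error) in
        guard error == nil,
              bufferType == .video,
              let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
            return
        }
        // ReplayKit delivers NV12 buffers, which are well suited to video encoding.
        if let frame = TVIVideoFrame(timestamp: CMSampleBufferGetPresentationTimeStamp(sampleBuffer),
                                     buffer: imageBuffer,
                                     orientation: .up) {
            consumer.consumeCapturedFrame(frame)
        }
    }, completionHandler: { error in
        if let error = error {
            print("Failed to start ReplayKit capture: \(error)")
        }
    })
}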

My experience on iOS 11 is that modern 3x Retina devices have a lot of problems drawing the contents of a full-screen UIView within 1/60 of a second.

Best, Chris

ceaglest commented 6 years ago

For anyone who is relying on TVIScreenCapturer or using the UIView snapshotting technique, I apologize for this issue. I'll post an update here after analyzing the performance on iOS 12 locally. I have cut a bug ISDK-2195 for internal tracking.

rcodarini commented 6 years ago

Any news about this issue?

ceaglest commented 6 years ago

I'm currently measuring performance on iOS 11 and iOS 12. This issue is important and if I find any improvement that can be made, I'll post a PR to ExampleScreenCapturer in parallel to updating the SDK.

julien-l commented 6 years ago

@ceaglest if it can help: I've been hit by this problem when writing a custom screen capturer for my app to capture at 20 fps (TVIScreenCapturer was too slow for my use case). I based my implementation on the ExampleScreenCapturer project which uses Quartz and does all the rendering on CPU; I expect TVIScreenCapturer has a similar implementation. This is especially bad on iPhone X due to its huge resolution.

See details here: https://stackoverflow.com/q/52305504/143504

As a side note: I am very interested in making ReplayKit work for my app.

ceaglest commented 6 years ago

Hi,

On iOS 12, drawing the UIView incurs an expensive color conversion that did not occur in iOS 11. On my test iPhone X, the main thread spends almost all of its time just trying to handle 5 frames per second and is unresponsive to touch.

[screenshot: profiler trace on iPhone X, 2018-09-21]

I believe that it will be possible to avoid this by using the UIGraphicsImageRenderer APIs and performing a less expensive (or no) color space conversion. I'm exploring a fix using this approach now.
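
For anyone following along, the shape of what I'm trying is roughly this (an illustrative sketch, not the code on the branch; the interesting part is opting the renderer format out of extended/wide color). As the next update shows, it did not end up helping:

import UIKit

// Illustrative only: render the view with a UIGraphicsImageRenderer whose format
// opts out of extended/wide color, hoping to avoid the P3 color-matching step.
func captureImage(of view: UIView) -> UIImage {
    let format = UIGraphicsImageRendererFormat()
    format.scale = view.window?.screen.scale ?? UIScreen.main.scale
    format.opaque = true
    if #available(iOS 12.0, *) {
        format.preferredRange = .standard   // sRGB rather than extended range
    } else {
        format.prefersExtendedRange = false
    }

    let renderer = UIGraphicsImageRenderer(bounds: view.bounds, format: format)
    return renderer.image { _ in
        // Draw the view hierarchy into the renderer's context.
        view.drawHierarchy(in: view.bounds, afterScreenUpdates: false)
    }
}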

See details here: https://stackoverflow.com/q/52305504/143504

Thanks. I'm certainly happy to avoid memory copies, but I don't think that is what is making things slow at the moment.

Best, Chris

ceaglest commented 6 years ago

Hi folks,

I've got another update to share. I went down the path of trying UIGraphicsImageRenderer and it did not provide the results that I was hoping for. I've tested all the permutations of UIGraphicsImageRendererFormat I could think of but so far they all share the same poor performance characteristics. I have a branch with a modified version of ExampleScreenCapturer if you want to take a look at the code.

https://github.com/ceaglest/video-quickstart-swift/tree/bug/isdk-2195-uigraphicscontext

I'm going to keep digging on this one to see if there are any other possible solutions. The next thing I'm considering trying is managing CGContext manually, where I can assign the working color space.
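
Roughly, the idea is this (illustrative only; converting the CGImage into a CVPixelBuffer for the consumer is omitted):

import UIKit

// Sketch of the "manual CGContext" idea: create a bitmap context with an
// explicit sRGB color space and render the layer into it.
func renderLayer(_ layer: CALayer, scale: CGFloat) -> CGImage? {
    let width = Int(layer.bounds.width * scale)
    let height = Int(layer.bounds.height * scale)
    guard let colorSpace = CGColorSpace(name: CGColorSpace.sRGB),
          let context = CGContext(data: nil,
                                  width: width,
                                  height: height,
                                  bitsPerComponent: 8,
                                  bytesPerRow: width * 4,
                                  space: colorSpace,
                                  bitmapInfo: CGImageAlphaInfo.premultipliedFirst.rawValue |
                                              CGBitmapInfo.byteOrder32Little.rawValue) else {
        return nil
    }
    // Flip and scale so the layer renders right side up at device scale.
    context.translateBy(x: 0, y: CGFloat(height))
    context.scaleBy(x: scale, y: -scale)
    layer.render(in: context)
    return context.makeImage()
}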

Best, Chris

ceaglest commented 6 years ago

I pushed some more commits which try to create a CGContext using the sRGB color space and then use that for UIView drawing. It still looks like the very expensive copy_and_colormatch_CIF10_to_P3_vImage is being called. I don't fully understand why this is happening, considering that neither my context nor the CALayer wants P3 color. Of course, this is running on an iPhone X, which is capable of displaying P3.

I can see that the WKWebView, which forms the base of the view hierarchy in our ScreenCapturerExample, hints that its contents are sRGB:

Start capturing. UIView.layer.contentsFormat was RGBA8

ceaglest commented 6 years ago

Hi,

I've tried a number of generic approaches using the CGContext and UIGraphics APIs. None of them work very well on iOS 12. The "fastest" one so far for me has been to create my own contexts with CGColorSpace.genericRGBLinear. This is 18-20% faster in the sample app (which is rendering a WKWebView), but has linear gamma, which looks incorrect. Take a look at the code here, which I've updated to make each rendering technique distinct.

I'm looking into a couple of special cases, targeting WKWebView (snapshotting), MTKView (drawable), and CALayer (contents). If those don't pan out, then our next step is going to be to deprecate TVIScreenCapturer on iOS 12.0 and above. I should reiterate that, while ReplayKit can only capture the entire window, it does so with low overhead and CPU usage. You also get NV12 buffers, which are ideally suited for video, instead of BGRA buffers.

I am using TVIScreenCapturer for Twilio Video to share a custom camera view from my app

@souvickcse If you only want to capture from an MTKView (is that correct?), then maybe there is a faster path we could take to get at the content through the drawable. You don't necessarily need ReplayKit just to capture from an MTKView, but I still think this might be the best/easiest option. This is assuming, of course, that you would be okay with requiring users to accept ReplayKit permissions.

Regards, Chris

souvickcse commented 6 years ago

Hi @ceaglest, in our case the main problem with ReplayKit is that we don't want to share the whole screen with all the controls; we just want to share the camera preview. Is there any option to share the CMSampleBuffer from the camera feed (func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection)) directly with Twilio? If so, we don't have to use the screen share; we can directly send the camera data to Twilio. Thank you.

ceaglest commented 6 years ago

Hi @souvickcse,

If your custom capturer has direct access to AVCaptureSession, then just deliver us the CVPixelBuffer contained in the CMSampleBuffer. For example (pardon the Objective-C, and some missing methods for orientation).

- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection {
    if ( connection == _videoConnection ) {
        // It's up to you who handles rotation... AVCaptureSession can do it.
        TVIVideoOrientation orientation = TVIVideoOrientationUp;
        if (_configuration.outputRotation == TVIOutputRotationModeMetadata) {
            AVCaptureVideoOrientation videoOrientation = _orientationObserver.orientation;
            orientation = TVICaptureOrientationMetadataForVideoOrientationAndCameraPosition(videoOrientation,
                                                                                            _videoDevice.position);
        }

        // We support the video data output formats BGRA and NV12.
        TVIVideoFrame *frame = [[TVIVideoFrame alloc] initWithTimestamp:CMSampleBufferGetPresentationTimeStamp( sampleBuffer )
                                                                 buffer:CMSampleBufferGetImageBuffer( sampleBuffer )
                                                            orientation:orientation];

        [_captureConsumer consumeCapturedFrame:frame];
    }
}

If you need them after processing in Metal, then that's a different story.

Regards, Chris

souvickcse commented 6 years ago

Hi @ceaglest, okay, cool, I will try it. Thank you for your help.

ceaglest commented 6 years ago

I've updated my branch to demonstrate snapshotting of a WKWebView, which is quite fast and completely usable on iOS 12. One caveat is that the pixel format is incorrect at the moment, but this is an easy fix. Looking at some others now.
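
The WKWebView-specific path relies on WebKit's asynchronous snapshotting rather than CPU drawing. Roughly (the exact code on the branch may differ, and converting the returned UIImage to a CVPixelBuffer is omitted):

import WebKit

// Ask WebKit for a snapshot asynchronously (iOS 11+) instead of drawing the view.
func snapshotWebView(_ webView: WKWebView, completion: @escaping (UIImage?) -> Void) {
    let config = WKSnapshotConfiguration()
    config.rect = webView.bounds
    webView.takeSnapshot(with: config) { image, error in
        if let error = error {
            print("Snapshot failed: \(error)")
        }
        completion(image)
    }
}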

ceaglest commented 6 years ago

Hello,

I've added an example of snapshotting an MTKView. The performance is fine, but the approach of firing a CADisplayLink timer to start scraping is not the right one. Ideally, your capturer would be signaled each time the MTKView.currentDrawable is updated. This typically occurs within [MTKViewDelegate drawInMTKView:], a callback that a generic screen capturer does not have access to.
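
To illustrate the difference, here is a sketch of reading the drawable back from inside your own MTKViewDelegate callback rather than polling. It assumes view.framebufferOnly has been set to false, and deliverPixelBuffer is a hypothetical hook into your capturer:

import MetalKit
import CoreVideo

func captureDrawable(of view: MTKView, deliverPixelBuffer: (CVPixelBuffer) -> Void) {
    // The drawable's texture can only be read back if the view is not framebuffer-only.
    guard view.framebufferOnly == false,
          let texture = view.currentDrawable?.texture else { return }

    var pixelBuffer: CVPixelBuffer?
    let attributes = [kCVPixelBufferIOSurfacePropertiesKey as String: [String: Any]()] as CFDictionary
    guard CVPixelBufferCreate(kCFAllocatorDefault,
                              texture.width,
                              texture.height,
                              kCVPixelFormatType_32BGRA,
                              attributes,
                              &pixelBuffer) == kCVReturnSuccess,
          let buffer = pixelBuffer else { return }

    CVPixelBufferLockBaseAddress(buffer, [])
    if let baseAddress = CVPixelBufferGetBaseAddress(buffer) {
        // Copy the drawable's BGRA contents into the pixel buffer on the CPU.
        // The GPU work for this frame should be complete (e.g. via a command buffer
        // completion handler) before reading the texture back.
        texture.getBytes(baseAddress,
                         bytesPerRow: CVPixelBufferGetBytesPerRow(buffer),
                         from: MTLRegionMake2D(0, 0, texture.width, texture.height),
                         mipmapLevel: 0)
    }
    CVPixelBufferUnlockBaseAddress(buffer, [])

    deliverPixelBuffer(buffer)
}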

To summarize, I think we can say pretty clearly that a generic approach using UIView is just not going to work for all UIViews / iOS devices on iOS 12.0. I've demonstrated a few custom-tailored approaches on my branch, but which one applies to you depends on your use case. While it's great that these techniques are possible, we don't believe that making TVIScreenCapturer (and TwilioVideo.framework) aware of classes like WKWebView, MTKView, and SCNView makes sense. For this reason we are proceeding with deprecating TVIScreenCapturer on iOS 12.0, and will soon release version 2.5.0 with this change.

My next focus will be to complete the ReplayKit example since this is the best approach for most use cases.

What I can do is offer sample code for anyone who is looking to integrate content from particular UIView subclasses, and doesn't want to go through the ReplayKit permissions flow. My branch is a good start, but there would be more work needed to publish an official sample app update.

@julien-l Based upon your SO post, I think you would be okay with using ReplayKit for this purpose (several UI elements, and OpenGL-rendered content).

@rcodarini I'm not entirely clear on your use case. Could you explain what type of content you are trying to capture?

Once again, expect more updates to this issue as 2.5.0 rolls out and the ReplayKit example is merged.

Regards, Chris

julien-l commented 6 years ago

@ceaglest thanks for your work and recommendations.

My plan is to go for ReplayKit as you said. Ideally, I would like the ReplayKit example to show how to capture/stream the camera.

Thanks again, Julien

rcodarini commented 6 years ago

@ceaglest Thanks for your work.

We use the screen capturer to stream ARKit content.

Thanks Roberto

souvickcse commented 6 years ago

Hi @ceaglest, thank you for your work and the update. I have tried your solution (sending the CVPixelBuffer in [_captureConsumer consumeCapturedFrame:frame];) with ExampleScreenCapturer.swift for our app. Though it is better than TVIScreenCapturer, I am still finding high CPU usage in our app. In AVCaptureSession I am showing the camera feed with a 720p/1080p/4K preset at 30/60 fps, and I am using this code: let frame = TVIVideoFrame.init(timestamp: timestamp, buffer: finalVideoPixelBuffer, orientation: TVIVideoOrientation.down); if frame != nil { TwillioManager.sharedInstance.sendVideoFrame(frame: frame!) } in captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection).

We have checked Instruments for the 720p 30 fps camera; it shows a good amount of CPU usage in captureOutput and in a thread created by Twilio. But if we remove the Twilio part from captureOutput, there are no such high CPU spikes in the Instruments graph. Please find the log files here; it is happening on iOS 11 as well.

So here are some of our questions:

  1. Is there anything we can do to make this use a lower CPU profile?
  2. With this solution we found the picture quality is the same as the device camera preview. Doesn't Twilio scale down / compress the CVPixelBuffer depending on the bandwidth in this scenario? In our view there should be adaptive logic for throttled networks & devices.
  3. Is there any option to scale down / compress the CVPixelBuffer on Twilio's side? I have tried to scale down the CVPixelBuffer to 1/3 using this. We are reducing the frame height & width to reduce the load, but then the CPU shows 400% in Xcode for 4K video, and it is high enough for 720p & 1080p as well.
  4. Suppose we make this work, how will Twilio support us in the future for our implementation? Not sure if this API will be deprecated.

ceaglest commented 6 years ago

Hey @rcodarini,

@ceaglest Thanks for your work.

We use the screen capturer to stream ARKit content.

You are welcome, I'm just trying to find solutions that keep all of our apps working on iOS 12. If you are capturing ARKit content, I have some suggestions:

  1. If you are rendering AR content using Metal, you could consider copying the drawable's texture into an offscreen buffer each time you update it. I have a partial demonstration of this on my branch earlier in the thread.
  2. If you are using ARSCNView, then we have an example where you can use its snapshotting API to get at the content (a rough sketch follows below).
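
Here is a rough sketch of the ARSCNView path from item 2 (the class and the onSnapshot hook are illustrative, the capture rate is arbitrary, and converting the UIImage to a CVPixelBuffer is omitted):

import ARKit
import UIKit

final class ARSnapshotSource {
    private let sceneView: ARSCNView
    private var displayLink: CADisplayLink?

    // Hypothetical delivery hook; a real capturer would convert the UIImage to a
    // CVPixelBuffer and hand it to its TVIVideoCaptureConsumer.
    var onSnapshot: ((UIImage) -> Void)?

    init(sceneView: ARSCNView) {
        self.sceneView = sceneView
    }

    func start(framesPerSecond: Int = 15) {
        let link = CADisplayLink(target: self, selector: #selector(tick(_:)))
        link.preferredFramesPerSecond = framesPerSecond
        link.add(to: .main, forMode: .common)
        displayLink = link
    }

    func stop() {
        displayLink?.invalidate()
        displayLink = nil
    }

    @objc private func tick(_ link: CADisplayLink) {
        // SCNView.snapshot() returns a UIImage of the most recently rendered frame.
        onSnapshot?(sceneView.snapshot())
    }
}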

Regards, Chris

ceaglest commented 6 years ago

Hi @souvickcse,

Is there anything we can do to make this use a lower CPU profile?

As a rule of thumb, don't capture in higher resolution / frame rate than you actually need.

I'll take a look at your traces, but I would compare the usage of TVICameraCapturer with a 720p constraint vs. your custom capturer. Simply capturing 720p (without rendering) takes a few percent CPU in-process using AVCaptureSession. Your capturer should be doing a lot less work in general than the media engine is doing to encode / stream your video.

With this solution we found the picture quality is the same as the device camera preview. Doesn't Twilio scale down / compress the CVPixelBuffer depending on the bandwidth in this scenario? In our view there should be adaptive logic for throttled networks & devices.

We compress the CVPixelBuffers using whatever video codec you've chosen (VP8, H.264, VP9), assuming that the Room supports this codec. We do of course perform spatial and temporal downscaling based upon network conditions. You can see this by using Network Link Conditioner on either the sending or receiving end. If you attach a TVIVideoRenderer to your TVILocalVideoTrack, you can also see the downscaled content.
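
As a small illustration of that last tip (assuming a TVILocalVideoTrack named localVideoTrack, and using TVIVideoView from the SDK as the renderer):

import TwilioVideo
import UIKit

// Attach a renderer to the local track to observe what is actually being sent.
func attachLocalPreview(to localVideoTrack: TVILocalVideoTrack, in container: UIView) {
    let previewView = TVIVideoView(frame: container.bounds)
    previewView.contentMode = .scaleAspectFit
    container.addSubview(previewView)
    localVideoTrack.addRenderer(previewView)
}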

Is there any option to scale down / compress the CVPixelBuffer on Twilio's side? I have tried to scale down the CVPixelBuffer to 1/3 using this. We are reducing the frame height & width to reduce the load, but then the CPU shows 400% in Xcode for 4K video, and it is high enough for 720p & 1080p as well.

Don't capture in 4K and downscale in software. Capture just what you need.

I can't recommend streaming 4K video at this point. You can't encode it in hardware H.264 and the VP8/VP9 encoders are software. We need to perform a format conversion from NV12 -> I420 for the software codecs and this is quite expensive on 4K frames.

Suppose we make this work, how will Twilio support us in the future for our implementation? Not sure if this API will be deprecated.

We are planning on replacing or augmenting TVIVideoCapturer to allow you to specify cropping/scaling up front. This is meant only for situations where you can't capture exactly what you want because your source does not support it. Passing this information on to us means we can be more efficient and use fewer buffers. If H.264 is used, we can downscale in hardware.

Regards, Chris

rcodarini commented 5 years ago

Hey @ceaglest ,

  2. If you are using ARSCNView, then we have an example where you can use its snapshotting API to get at the content.

With the snapshotting API I get the same performance issues. Did you update the ARKitExample for iOS 12?

Thanks Roberto

souvickcse commented 5 years ago

Hi @ceaglest ,

I'll take a look at your traces

Did you find anything which we can improve from our side for better performance?

As a rule of thumb, don't capture in higher resolution / frame rate than you actually need. Don't capture in 4K and downscale in software. Capture just what you need

We absolutely do need to keep 4K & 1080p capturing for our application's use case: we are recording & saving the videos as well while we are streaming through Twilio, so users might want to record at 4K for best quality, but the feed for Twilio needs to be downscaled for obvious reasons before we send it to the capture consumer. If we do not downscale the feed, then we get a very low frame rate (frozen at times). If we downscale the frames, then the devices use too much processing power; while newer devices like the X can handle this, we are having a hard time with older devices & iPads. There might be a better approach than this.

We intend to save the video in whatever resolution the client chooses (max being 4K & min 720p), and for the stream we need it to work and be as real-time as possible; the stream can be at 480p or 360p. Note that we are recording, saving & streaming at the same time; it's absolutely crucial for our app to do all of this simultaneously. Can you please give us some advice on how to downscale the frames for Twilio? We were thinking of Metal filters so that we can harness the power of the GPU and offload some processing from the CPU. Can we use a Metal filter to downscale the frames, and will it be better than downscaling in software as we are currently doing?

ceaglest commented 5 years ago

Hi @souvickcse,

There has been a lot of discussion on this thread, and I still don't have a clear sense of your use case and exactly what processing steps you are taking, using which APIs. Can we perhaps hop on a live call to discuss this? I believe you already have a support ticket open with us where we can coordinate.

If you are already using Metal in your pipeline, then yes, I think it might make sense to share a pool of buffers (CVMetalTextureCache) and have the GPU perform the downscaling. This is a very expensive step on the CPU for 4K video.
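
To make the idea concrete, here is a sketch of downscaling between two BGRA CVPixelBuffers on the GPU. It assumes the buffers are IOSurface / Metal compatible, and that the texture cache, device, and command queue are created once and reused; error handling is trimmed:

import CoreVideo
import Metal
import MetalPerformanceShaders

// Sketch: resample a BGRA CVPixelBuffer into a smaller BGRA CVPixelBuffer on the GPU.
func downscaleOnGPU(_ source: CVPixelBuffer,
                    into destination: CVPixelBuffer,
                    textureCache: CVMetalTextureCache,
                    device: MTLDevice,
                    queue: MTLCommandQueue) {
    func makeTexture(from buffer: CVPixelBuffer) -> MTLTexture? {
        var cvTexture: CVMetalTexture?
        CVMetalTextureCacheCreateTextureFromImage(kCFAllocatorDefault,
                                                  textureCache,
                                                  buffer,
                                                  nil,
                                                  .bgra8Unorm,
                                                  CVPixelBufferGetWidth(buffer),
                                                  CVPixelBufferGetHeight(buffer),
                                                  0,
                                                  &cvTexture)
        return cvTexture.flatMap { CVMetalTextureGetTexture($0) }
    }

    guard let input = makeTexture(from: source),
          let output = makeTexture(from: destination),
          let commandBuffer = queue.makeCommandBuffer() else { return }

    // MPSImageBilinearScale resamples from the source size to the destination size
    // entirely on the GPU; the CPU never touches the pixel data.
    let scaler = MPSImageBilinearScale(device: device)
    scaler.encode(commandBuffer: commandBuffer, sourceTexture: input, destinationTexture: output)
    commandBuffer.commit()
    // Blocking here keeps the sketch simple; a real pipeline would use a completion handler.
    commandBuffer.waitUntilCompleted()
}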

Regards, Chris

ceaglest commented 5 years ago

Hi @rcodarini,

With the snapshotting API I get the same performance issues. Did you update the ARKitExample for iOS 12?

I did not update ARKitExample for iOS 12.0. However, I tried it and saw similar performance to iOS 11.x on an iPhone X / iOS 12.0. Are you experiencing something different? If so, please file a separate issue for this.

This debugger screenshot shows the impact of capturing the ARKit content from ARSCNView, while operating ARKit in iOS 12.0.

[debugger screenshot: CPU usage while capturing from ARSCNView, 2018-10-05]

Regards, Chris

rcodarini commented 5 years ago

Thanks @ceaglest for the support. I solved the issue. The problem was related to the scene configuration: we removed the "Procedural Sky" and everything started working again.

ceaglest commented 5 years ago

Hi folks,

Thanks @ceaglest for the support. I solved the issue. The problem was related to the scene configuration: we removed the "Procedural Sky" and everything started working again.

@rcodarini that's great to hear.

I'm closing out this issue now that we've merged our first iteration of the ReplayKit example and completed the deprecation of TVIScreenCapturer on iOS 12.0. If you have specific questions about other capture techniques discussed in this issue (that the WIP branch did not answer) feel free to open another issue.

Thank you again for your patience with this bug.

Regards, Chris

ArkadiYoskovitz commented 5 years ago

Hi, just wanted to say that I'm facing a similar issue in an app I'm writing for a client. According to Apple DTS, the high CPU is due to a color space conversion performed when the snapshot is drawn into the CGBitmapContext that I'm using. It turns out that the iPhone X device family on iOS 12 uses some kind of half-float pixel color format. If this gives you any additional leads, please let me know.

ceaglest commented 5 years ago

Hi @ArkadiYoskovitz,

I'm glad you've engaged Apple DTS, feel free to point them to this thread if you like! I'm sorry to say that I have not filed a Radar about this.

It would be great if DTS could work with the right team to remediate the issue (though we will probably never get iOS 11 performance using the A8 backing store). Something to consider is that reduced memory usage is another form of optimization. Memory is at a premium on iOS, so I understand why the tradeoff was made, just not the implementation that exists in iOS 12.

Regards, Chris

ArkadiYoskovitz commented 5 years ago

Hi @ceaglest, I'll probably add this thread to my DTS event on the next iteration. But meanwhile, a question: do you know how I can retrieve the display color space on iOS 12?

The usual method of using UIGraphicsGetCurrentContext doesn't work on iOS 12 due to a change in the way the OS is implemented (it returns a CGContext of type unknown, which doesn't hold the relevant info).

I had an idea to perform the screen capture into a temporary CGContext in the screen's display color space format, and then convert it into my video format on a background thread, but I can't seem to get the display color space format from the device correctly. Maybe you have some advice?

jvmontes commented 5 years ago

Hi @ceaglest,

I've run into similar performance issues as discussed in this thread. I know Twilio has offered a solution using ReplayKit, and I believe the use case in which we were using TVIScreenCapturer may be different from the ones discussed here.

We are creating a separate window, which is not the app's key window, where we create the UI for a participant experience. Do you know of any solutions or tricks to get this custom window to be used as the content to stream to our Twilio participants? Thanks so much for your support on this issue.