xamarin / xamarin-macios

.NET for iOS, Mac Catalyst, macOS, and tvOS provide open-source bindings of the Apple SDKs for use with .NET managed languages such as C#
Other
2.48k stars 514 forks source link

[net6/Xamarin.iOS] `NSObject.ConformsToProtocol` is very costly and invoked very often #14065

Closed jeromelaban closed 2 years ago

jeromelaban commented 2 years ago

Steps to Reproduce

No specific step to reproduce, but it seems that UIView.AddSubview / UIView.RemoveFromSuperview with custom types inheriting from UIView seems to trigger this from time to time.

Expected Behavior

ConformsToProtocol does not appear prominently in performance traces.

Actual Behavior

image

Most of the time is spent here: https://github.com/xamarin/xamarin-macios/blob/d78aa6712b15659b0fed259952364c060ffedcea/src/Foundation/NSObject2.cs#L407

A lot of the time is spent invoking this method at regular times, seemingly in batch, but there's no direct stack trace leading to invocation from managed code.

Environment

VS 2022 17.1 Preview 5

Build Logs

Example Project (If Possible)

I do not have an available project yet.

spouliot commented 2 years ago

I do not think this would be a specific issue to net6 - but I might have missed some changes so take this with a grain of salt.

but there's no direct stack trace leading to invocation from managed code.

That's Apple's ObjC code calling conformsToProtocol: [1] and since it happens when add/removing views there won't be an interesting managed stack trace.

[1] so there's

Caching would help some cases but it would help in every cases (at least versus its cost). I doubt it's a good thing to implement caching inside the platform assemblies.

Still you could cache the specific cases that affects you.

Or do your own hardcoded logic to return true (or false) without running the heavy weight reflection (and cache that or not).

jeromelaban commented 2 years ago

Thanks for the detailed update! Indeed, I do know which types are queried there, and caching should be simple enough for sure.

I wonder, is this a lookup that could be statically generated, or is the pointer passed opaque enough to be useless for caching?

spouliot commented 2 years ago

I wonder, is this a lookup that could be statically generated

It is, in some cases. However if your app/config is allowing dynamic registration [1] then the querying happens (even if a single GetCustomAttributes will be called anyway).

See src/Foundation/NSObject2.cs around line 477 for details.

[1] and that could be what differ from legacy and net6 profiles in your case.

rolfbjarne commented 2 years ago

This is certainly an area where we can optimize the code, but we'd probably need a real-world sample of where it's a problem before spending time trying to fix a particular piece of code that may or may not be the actual problem.

jeromelaban commented 2 years ago

It's actually very curious, and likely something that started happening with iOS 15 (seemingly in relation to _isTransparentFocusItem). Also, caching ConformsToProtocol only marginally improves the overall performance.

Here's a snippet from what I see in instruments:

image

This happens with very large UIView trees.

spouliot commented 2 years ago

It might not be the exact same case but inside Dope for net6.0-macos the time spent inside ConformsToProtocol (and reflection) happens when removing subviews [1]. That's 2.82 seconds out of a 2m05s benchmark run (so more than 2%).

Screen Shot 2022-06-05 at 2 59 16 PM

The static registrar is not enabled by default for net6.0-macos (which it should be for net6.0-ios applications) and that should make a cache helpful... but there might be a different way (for macOS) to handle this. I can file an issue on Uno for it - but it would best to understand how this affects iOS (with the static registrar) first, so that a fix is beneficial to both platforms.

[1] that subviews removal is somewhat (maybe not everywhere yet) optimized for macOS (using an API not available on iOS) so it could be worse on iOS (at least when the static registrar was not enabled).

spouliot commented 2 years ago

With a cache around ConformsToProtocol this call becomes fast enough not to be part of the trace data.

Here's a newer screenshot of the previous data, this time focusing on RemoveFromSubviews so it's possible to compare with newer data.

Screen Shot 2022-06-09 at 10 02 23 PM

ConformsToProtocol is 3.13 seconds out of 7.85 seconds for RemoveFromSubviews

With the cache enabled...

Screen Shot 2022-06-09 at 10 05 34 PM

well it's 0s (no data) out of 4.82 seconds for RemoveFromSubviews. Note: horizontal scale is automatic, don't trust your eyes 🙄

Now the situation for macOS (numbers above) is a bit different than iOS since removing the dynamic registrar is not very common (beside not being documented) and my attempt for net6 crashed (still need to look why 👀). This makes calls to the reflection parts of ConformsToProtocol much more common.

@Therzok does this slowdown also shows up inside VSfM ?

Benchmark

I made a benchmark to compare RemoveFromSuperview with the uses of a NSMutableArray to show the extra cost.

BenchmarkDotNet=v0.12.1.1528-nightly, OS=macOS 12.4 (21F79) [Darwin 21.5.0]
Apple M1, 1 CPU, 8 logical and 8 physical cores
  [Host] : Mono 6.12.0 (2020-02/dffa5ab9224), Arm64

Job=InProcess  Toolchain=InProcessEmitToolchain  IterationCount=3
LaunchCount=1  WarmupCount=3
Method Length Mean Error StdDev
NSView_AddSubview_RemoveFromSuperview 1 42.65 us 83.92 us 4.600 us
NSMutableArray_Add_Remove 1 35.96 us 34.83 us 1.909 us
NSView_AddSubview_RemoveFromSuperview 16 588.29 us 285.21 us 15.633 us
NSMutableArray_Add_Remove 16 608.13 us 233.19 us 12.782 us
NSView_AddSubview_RemoveFromSuperview 256 11,683.28 us 4,435.93 us 243.148 us
NSMutableArray_Add_Remove 256 12,012.51 us 2,989.70 us 163.876 us
NSView_AddSubview_RemoveFromSuperview 4096 336,094.00 us 94,665.01 us 5,188.908 us
NSMutableArray_Add_Remove 4096 213,205.46 us 51,970.81 us 2,848.695 us

However it's not clear it's being asked about protocols (any or as often) as a real, displayed view would 🧂. I have not adapted the cache to NSObject2.cs (yet) to see the impact.

Benchmark's Source Code

Numbers

Xamarin.Mac Legacy

macOS 12.4 / Legacy / arm64 Build Change Reuse Grid
Xamarin.Mac Legacy (454e16a) 1574.60 440982.29 1950.91 6602.72
Xamarin.Mac Legacy (CtP cache) 1773.51 520282.21 1909.14 6619.84
Impact 1.13x 1.18x 0.98 [1] 1.00x

[1] that's within the noise of the test, sporadic GC pauses

net6.0-macos

macOS 12.4 / net6.0-macos / arm64 Build Change
net-6.0 (w/pinvoke wrappers) 1467.18 385994.96
net-6.0 (plus CtP cache) 1522.31 510197.64
Impact 1.04x 1.32x

Conclusion

IMO that's impressive enough to add it (at least inside Uno's BindableNSView code). The jury is still out (another todo 😉) if it's worth enabling for iOS. If it is I think it's worth having this cache inside the SDK.

FWIW the cache related code could be added under a

if (Runtime.DynamicRegistrationSupported) {
}

so the linker/trimmer would remove it whenever the dynamic registrar is removed, making it a no-cost for most iOS apps.

Finally it's also worth nothing that the calls to ConformsToProtocol comes from the OS itself (and can vary by platforms and/or versions). IOW it's something that the developer has no control over (unless the developer controls the subclass that receive that selector and can override the method).

rolfbjarne commented 2 years ago

One point to have in mind is that it's possible to add protocols to an existing class at runtime... so we might only be able to cache parts of the ConformsToProtocol logic (the part where we use reflection to check if a managed class implements a certain interface / has a certain attribute).

spouliot commented 2 years ago

Good point, there's a class_addProtocol API, but AFAIK there's no matching removal API.

There's two parts that does reflection inside ConformsToProtocol

So if we can cache only the reflection part we still win back 2/3 of the time.

Or, since the protocols conformance can't be removed, use the cache and fall back to native/selectors when it returns false. That would cache the true that came from the selector call.

Therzok commented 2 years ago

@spouliot I remember the protocol checks pop up on mono/xammac, but not that often in net6 - but when using dotnet trace (on release builds) I don't see it. It's possible it's inlined?

I guess it's the frame right under RemoveFromSuperview - this is a short trace taken after playing around in the IDE for a bit,

image
Therzok commented 2 years ago

Is the managed RemoveFromSuperview method called for direct bindings as well?

spouliot commented 2 years ago

@Therzok thanks for looking at it 😄

It's possible it's inlined?

No, it could be removed (by the linker/trimmer) if the dynamic registrar is removed. I thought it was not for VSfM but I'm not 💯% sure about it...

Also it's possible that ConformsToProtocol might not be called from RemoveFromSuperview. It's not clear under which circumstances it is (or not) called and, even if we knew, it could change between OS versions.

Is the managed RemoveFromSuperview method called for direct bindings as well?

Yes, unless the linker/trimmer was able to remove it: like no override and only called from native code. Since we see it inside the trace then it will be called, even for direct bindings.

Therzok commented 2 years ago

I need to look into why I'm missing so many symbols, these stacks should all have had symbols in them.