microsoft / CsWinRT

C# language projection for the Windows Runtime

Performance when WinRT is called from WinUI vs UWP #1373

Open HEIC-to-JPEG-Dev opened 8 months ago

HEIC-to-JPEG-Dev commented 8 months ago

I have the same code running in UWP and WinUI, both from blank projects (debug and release).

As an example: enumerating my iCloud Photos folder of 60,000 files and getting the ItemDate of each file, using the indexer.

UWP takes 14 seconds. WinUI takes 6 minutes.

The expectation is that WinUI should perform the same as UWP. The cause may also be related to "await", as that is 10x faster on UWP than on WinUI. However, System.IO.WriteLinesAsync() is 10x slower than System.IO.WriteLines, and that's native to WinUI.

Same code on both platforms, same machine, etc.

```csharp
var ops = new QueryOptions();
ops.SetPropertyPrefetch(PropertyPrefetchOptions.BasicProperties, null);
ops.IndexerOption = IndexerOption.OnlyUseIndexerAndOptimizeForIndexedProperties;

var query = iCloudPhotosFolder.CreateFileQueryWithOptions(ops);

uint index = 0;
const uint stepSize = 100;

var files = await query.GetFilesAsync(index, stepSize);

while (files.Count != 0 || index < 10000)
{
    foreach (var file in files)
    {
        var basicProperties = await file.GetBasicPropertiesAsync();
        var dr = basicProperties.ItemDate;
    }

    index += stepSize;
    files = await query.GetFilesAsync(index, stepSize);
}
```

HEIC-to-JPEG-Dev commented 7 months ago

So, after many more tests: CsWinRT is an order of magnitude (10x) slower using the same code. It's not super fast in NativeAOT, but it is 10 times faster.

Is there any roadmap for improving performance? Or at least for making calls to Windows.Storage IO in WinRT multi-thread capable (currently apps crash if that's attempted)?

charlesroddie commented 6 months ago

@HEIC-to-JPEG-Dev is the 10x comparing NativeAOT WinUI to UWP .NET Native, NativeAOT WinUI to non-NativeAOT WinUI, or non-NativeAOT WinUI to UWP .NET Native?

HEIC-to-JPEG-Dev commented 6 months ago

The code runs fastest in a UWP project. In a WinUI Desktop project the same code runs 10 times slower.

HEIC-to-JPEG-Dev commented 5 months ago

This is getting worse. I just ran some code using a release build. UWP took 3 seconds to complete. The WinUI desktop app took 1 minute 49 seconds to complete.

The exact same code, on the same data set. Why???? The code, like above, enumerates an indexed location and pre-fetches properties from the indexer.

UWP is dead, the future is WinUI, but this is a massive downgrade.

charlesroddie commented 5 months ago

WinUI was launched several years prematurely before it had an AOT release mode. Maybe if you compare UWP debug mode to WinUI it would be more of an apples to apples comparison. Your timings don't make it clear what is debug and what is release.

@manodasanW and @Sergio0694 are getting NativeAOT support ready. I don't believe this is testable yet with WinUI.

Sergio0694 commented 5 months ago

It's worth noting that there might be multiple factors at play here, and it's not really possible to determine what the cause of that slowdown actually is just from that code snippet. The comparison is essentially invalid: you're comparing two test cases that differ in execution mode (UWP vs Win32), in marshalling infrastructure (.NET Core 5 UWP or .NET Native with MCG, vs CsWinRT), and in the .NET runtime itself. For starters, and to e.g. exclude the culprit being the storage broker or any of the related infrastructure, one would need to set up a test case between a UWP app and a Win32 app, both using CsWinRT. Of course, there's no official tooling to create such a UWP app, so that might be kinda tricky. But once you do have that, you can check whether the performance difference is still there. If it is, then the problem is not CsWinRT but something else (and as such the issue should be reported somewhere else). If not, then you can set up a proper benchmark, run a profiler, and try to investigate where the hot path causing the bottleneck is, assuming that it is in fact caused by the marshalling layer. It would also be worth testing with one of the preview builds of CsWinRT with the new AOT support, of course.

I guess TLDR: it's more complicated than just "this is slower in WinUI than UWP".

HEIC-to-JPEG-Dev commented 5 months ago

> WinUI was launched several years prematurely before it had an AOT release mode. Maybe if you compare UWP debug mode to WinUI it would be more of an apples to apples comparison. Your timings don't make it clear what is debug and what is release.
>
> @manodasanW and @Sergio0694 are getting NativeAOT support ready. I don't believe this is testable yet with WinUI.

Debug (UWP) performance is pretty much the same. i.e. 10x faster than WinUI debug/release.

dongle-the-gadget commented 5 months ago

> Debug (UWP) performance is pretty much the same. i.e. 10x faster than WinUI debug/release.

That alone doesn't exclude any of the variables Sergio mentioned: the runtime, execution context and marshalling layer are all different. A fairer test, as mentioned, would be to make a UWP app using CsWinRT and benchmark it against the WinUI version. Doing that will remove the marshalling layer and runtime as variables.
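As a rough sketch of how the two apps could be timed the same way, the helper below wraps any async workload in a `Stopwatch`. `Timing.MeasureAsync` is a hypothetical name, not part of any library, and the `Task.Delay` call is only a stand-in for the enumeration loop from the first comment, which would need to run inside each app being compared.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

static class Timing
{
    // Hypothetical helper: runs the given async workload once and
    // returns the wall-clock time it took.
    public static async Task<TimeSpan> MeasureAsync(Func<Task> work)
    {
        var stopwatch = Stopwatch.StartNew();
        await work();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}

static class Program
{
    static async Task Main()
    {
        // Placeholder workload. In a real comparison this lambda would
        // contain the file-query loop, with identical code in both apps.
        TimeSpan elapsed = await Timing.MeasureAsync(() => Task.Delay(100));
        Console.WriteLine($"Elapsed: {elapsed.TotalMilliseconds:F0} ms");
    }
}
```

A single wall-clock measurement like this only shows whether the gap exists in a given configuration. Finding the hot path would still need a profiler.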

manodasanW commented 5 months ago

When we looked into the performance of various storage APIs in the past, we found that there are differences in how the calls are brokered at the OS API level between UWP and Win32 apps. That might explain some of the perf difference. I have been holding off on looking into this issue because on the CsWinRT side we have a bunch of perf improvements in our AOT staging branch that would bring us more in line with UWP performance, and I want to do an analysis of this with that branch once it has stabilized a bit more.

HEIC-to-JPEG-Dev commented 5 months ago

I'm looking forward to it. You might also want to have a look at .NET's "await", as that's also slower than UWP's "await" calls. I'm assuming that's because UWP has a faster return type, but it isn't the main cause. In my tests, over 7,000 "await" calls to File.WriteTextAsync in .NET added a 1.5 second overhead compared to File.WriteText. Yes, I know there's more going on, but it was a simple test to find out if that was part of the problem. Again, thank you for responding.

BreeceW commented 4 months ago

I have tested the code snippet above in several configurations: as a .NET Native UWP app, a .NET 8 UWP app, a C++/WinRT UWP app, a .NET 8 WinUI 3 app, a .NET 8 console app, and a C++/WinRT console app. As an informal benchmark, it strongly suggests that this is merely a difference in how Windows.Storage APIs behave depending on whether they are used in an AppContainer, as suggested, and not an issue with C#/WinRT per se: the UWP variants all completed the snippet with ~7,000 photos in ~13 seconds, and the Win32 variants all finished in about 33 seconds, regardless of .NET runtime, lack thereof, or language projection.

HEIC-to-JPEG-Dev commented 4 months ago

How do you create a WinUI 3 .NET 8 app that isn't in an AppContainer?

dongle-the-gadget commented 4 months ago

> WinUI 3 .Net 8 app that isn't in an AppContainer

I'm confused as to what you mean here. Isn't that the default?

HEIC-to-JPEG-Dev commented 4 months ago

When you create a WinUI app it comes with a packager, which is, I assume, the AppContainer boundary. There is no option to create a WinUI desktop app without a packager.

dongle-the-gadget commented 4 months ago

The MSIX FullTrust virtualization layer isn't AppContainer.

HEIC-to-JPEG-Dev commented 4 months ago

Just to confirm: so you're saying a packaged WinUI 3 .NET 8 app doesn't use an AppContainer? It certainly has some "boundary"; for example, it redirects registry and other paths to different locations.

dongle-the-gadget commented 4 months ago

> So you're saying a packaged WinUI 3 .NET 8 doesn't use an AppContainer?

Yes. Filesystem and Registry virtualization isn't implemented through AppContainer.

HEIC-to-JPEG-Dev commented 4 months ago

So why does a WinUI 3 .NET 8 app run slowly on the WinRT code? "As an informal benchmark, it strongly suggests that this is merely a difference in how Windows.Storage APIs behave depending on whether they are used in AppContainer" means they are faster INSIDE an AppContainer.

dongle-the-gadget commented 4 months ago

> So why does a WinUI 3 .Net 8 app run slowly on the WinRT code

Assuming Kimbra's theory is correct, the code is slow on WinUI 3 precisely because it isn't running in an AppContainer. General rule of thumb: EntryPoint="Windows.FullTrustApplication" = No AppContainer.
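To check at runtime which side of that boundary a process is on, one option is to query the process token for `TokenIsAppContainer`. The following is a minimal sketch, not something taken from CsWinRT or the Windows App SDK: the `AppContainerCheck` class and `IsRunningInAppContainer` method are hypothetical names, and it assumes the Win32 `GetTokenInformation` function with information class 29 (`TokenIsAppContainer` in `winnt.h`).

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
using System.Security.Principal;

static class AppContainerCheck
{
    // TOKEN_INFORMATION_CLASS.TokenIsAppContainer from winnt.h.
    private const int TokenIsAppContainer = 29;

    [DllImport("advapi32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool GetTokenInformation(
        IntPtr tokenHandle,
        int tokenInformationClass,
        out uint tokenInformation,
        uint tokenInformationLength,
        out uint returnLength);

    // Hypothetical helper: true when the current process token is an
    // AppContainer token (partial trust), false for full trust.
    public static bool IsRunningInAppContainer()
    {
        using var identity = WindowsIdentity.GetCurrent();
        if (!GetTokenInformation(identity.Token, TokenIsAppContainer,
                out uint isAppContainer, sizeof(uint), out _))
        {
            throw new Win32Exception(Marshal.GetLastWin32Error());
        }
        return isAppContainer != 0;
    }
}

static class Program
{
    static void Main()
    {
        // Windows only: prints whether this process runs in an AppContainer.
        Console.WriteLine(
            $"AppContainer: {AppContainerCheck.IsRunningInAppContainer()}");
    }
}
```

Logging this value next to each timing would make it clear which trust level a given measurement was taken under.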

HEIC-to-JPEG-Dev commented 4 months ago

So again, to clarify: WinUI 3 .NET 8 apps will be 10x slower with the WinRT disk IO, and NativeAOT won't help. So I will need to use UWP to get the performance I need.

charlesroddie commented 4 months ago

> So you're saying a packaged WinUI 3 .NET 8 doesn't use an AppContainer?
>
> Yes. Filesystem and Registry virtualization isn't implemented through AppContainer.

This comment https://github.com/microsoft/WindowsAppSDK/discussions/1900#discussioncomment-1791826 from a WinUI maintainer says that "WinUI 3 apps can choose to use full trust (MediumIL) or partial trust (AppContainer).". So if @dongle-the-gadget's theory is right, then WinUI performance here should be OK just by choosing AppContainer.

HEIC-to-JPEG-Dev commented 4 months ago

Okay, that was it for the WinRT disk IO: it's the AppContainer. WinUI 3 .NET 8 with full trust is 10x slower than WinUI 3 .NET 8 with partial trust.

However, making the WinUI app run in partial trust immediately causes the problem that .NET 8 DirectoryInfo and FileInfo are not allowed, meaning all other disk IO becomes a pain because you have to resort to the Win32 APIs allowed in the AppContainer.

I'm not sure if this can be resolved, or more specifically, you can get the performance, but lose everything else.

HEIC-to-JPEG-Dev commented 3 months ago

I still find it odd that the same (WinRT) code in WinUI takes 10x longer when the AppContainer is set to (default) FullTrust. Basically, put a big container barrier and runtime broker around WinRT IO and it performs 10x faster than if we take the broker and container away. It seems the wrong way round, but it is what it is.

Do we know if this is likely to change?

MichalPetryka commented 3 months ago

> I'm not sure if this can be resolved, or more specifically, you can get the performance, but lose everything else.

As a workaround for now, I'd try looking into doing IPC between a parent full-trust process and a child partial-trust process that accesses the WinRT APIs.

MartyIX commented 2 months ago

Do I understand the discussion here correctly that .NET MAUI apps (https://github.com/dotnet/maui/blob/345bb16d806f463ad5b55f80601a929b18a3b2a6/src/Templates/src/templates/maui-mobile/Platforms/Windows/Package.appxmanifest#L43) run slower because they are (presumably) running with full trust instead of in an AppContainer?

I don't understand this stuff well, but .NET MAUI WinUI applications feel pretty slow, hence my attempt to understand it better.

HEIC-to-JPEG-Dev commented 2 months ago

> I'm not sure if this can be resolved, or more specifically, you can get the performance, but lose everything else.
>
> I'd try looking into doing IPC between parent full trust process and a child partial trust process that'd access the WinRT APIs as a workaround for now FYI.

If it's going to make my life as a developer hard, then I'll just not do that project and move on to something that WinUI 3 desktop apps can do easily. The dev environment is not there to make my life hard.