microsoft / cppwinrt

C++/WinRT
MIT License
1.64k stars 236 forks source link

Increase foldability of various templates #1338

Closed jonwis closed 1 year ago

jonwis commented 1 year ago

SizeBench revealed heavily non-foldability of certain C++/WinRT templates contributing to increased Windows binary sizes. Some small tweaks dropped the size of test.exe (release, x64) by ~132kb. (5180416 -> 5030400, or 2.8% ... which isn't a super lot, but it's nonzero.

Fixes #1334 and related

Changes

Enable nuspec iteration - Bump the default nuspec version to 3.0.0.0 so build_nuget.cmd and iterating with a local nuget source is easier.

Add solution files - Niceness for people who live in Visual Studio.

Fold delegate bases - Forced the layout of the delegate types to be regular by moving the reference count into a base before the non-ABI types. This folds nearly all the AddRef/QueryInterface instances into one. Releases are still unique because they call up into dtors.

Fold "report" in invoke - The invoke helper is a try/catch{report} but the body of "report" is identical on each invoke specialization. Move it to a unique non-inlineable method.

Fold event::add - Make the delegate first, then all the "expand the collection" instances are identical.

Further Work

SizeBench reports the following additional foldability wastage, quoted from the same test.exe:

Testing

Tested by build_and_test.cmd - all built, all passed

jonwis commented 1 year ago

For posterity, switching name_of<> and get_name<> to use std::wstring_view was basically a net wash. No test code compilation errors that would signal a breaking change, overall binary increased by about 512 bytes.

One area of interest is in GetRuntimeClassName. The inliner is super aggressive about splatting winrt::hstring(anything) inline (and ~hstring) rather than hoisting them out. In PGO-optimized binaries this ctor is normally de-inlined since it's replicated countless times throughout the code. Making a custom winrt::hstring get_runtime_class_name(std::wstring_view) would likely not really help, as PGO should automatically produce that that when used.

jonwis commented 1 year ago

It looks like I can get another 8kb by regularizing the tail of query_interface. Running total after that is ~140kb.

It also looks like the cluster around make_agile_delegate could be connected to a simplification on how delegates are generated. Delegate ctors have a lot of overloads - lambda, function, raw-pointer member, com_ptr member, weak_ref member, shared_ptr member, weak_ptr member. Under the covers they all route through the lambda form, doing various forms of "lock" and "call method." I wonder if there's a simplification of those to a two-form version. Unless we were willing to try some stronger type-erasure through a std::function-alike, I'm not sure this can be reduced further. It's probably another 3.2kb on this test binary, so under 0.5%.

jonwis commented 1 year ago

/azp run

azure-pipelines[bot] commented 1 year ago
No pipelines are associated with this pull request.
kennykerr commented 1 year ago

Overall looks like goodness. Just a few comments.

kennykerr commented 1 year ago

One area of interest is in GetRuntimeClassName. The inliner is super aggressive about splatting winrt::hstring(anything) inline (and ~hstring) rather than hoisting them out. In PGO-optimized binaries this ctor is normally de-inlined since it's replicated countless times throughout the code. Making a custom winrt::hstring get_runtime_class_name(std::wstring_view) would likely not really help, as PGO should automatically produce that that when used.

For windows-rs I managed to write a "constexpr" version of hstring so that it can be declared as a compile-time constant. This is mainly used for literals, but it also turns these GetRuntimeClassName into a cheap pointer copy. There are some limitations, so I'm not sure its super practical for cppwinrt to adopt though.

sylveon commented 1 year ago

Making the constructor of param::hstring constexpr seems pretty trivial, and it would enable constexpr fast-pass strings.

kennykerr commented 1 year ago

Making the constructor of param::hstring constexpr seems pretty trivial, and it would enable constexpr fast-pass strings.

I was referring to return values, which has a lot more optimizing value.