Closed jonwis closed 1 year ago
For posterity, switching name_of<>
and get_name<>
to use std::wstring_view
was basically a net wash. No test code compilation errors that would signal a breaking change, overall binary increased by about 512 bytes.
One area of interest is in GetRuntimeClassName
. The inliner is super aggressive about splatting winrt::hstring(anything)
inline (and ~hstring
) rather than hoisting them out. In PGO-optimized binaries this ctor is normally de-inlined since it's replicated countless times throughout the code. Making a custom winrt::hstring get_runtime_class_name(std::wstring_view)
would likely not really help, as PGO should automatically produce that that when used.
It looks like I can get another 8kb by regularizing the tail of query_interface
. Running total after that is ~140kb.
It also looks like the cluster around make_agile_delegate
could be connected to a simplification on how delegates are generated. Delegate ctors have a lot of overloads - lambda, function, raw-pointer member, com_ptr member, weak_ref member, shared_ptr member, weak_ptr member. Under the covers they all route through the lambda form, doing various forms of "lock" and "call method." I wonder if there's a simplification of those to a two-form version. Unless we were willing to try some stronger type-erasure through a std::function-alike, I'm not sure this can be reduced further. It's probably another 3.2kb on this test binary, so under 0.5%.
/azp run
Overall looks like goodness. Just a few comments.
One area of interest is in GetRuntimeClassName. The inliner is super aggressive about splatting winrt::hstring(anything) inline (and ~hstring) rather than hoisting them out. In PGO-optimized binaries this ctor is normally de-inlined since it's replicated countless times throughout the code. Making a custom winrt::hstring get_runtime_class_name(std::wstring_view) would likely not really help, as PGO should automatically produce that that when used.
For windows-rs I managed to write a "constexpr" version of hstring
so that it can be declared as a compile-time constant. This is mainly used for literals, but it also turns these GetRuntimeClassName
into a cheap pointer copy. There are some limitations, so I'm not sure its super practical for cppwinrt to adopt though.
Making the constructor of param::hstring
constexpr seems pretty trivial, and it would enable constexpr fast-pass strings.
Making the constructor of
param::hstring
constexpr seems pretty trivial, and it would enable constexpr fast-pass strings.
I was referring to return values, which has a lot more optimizing value.
SizeBench revealed heavily non-foldability of certain C++/WinRT templates contributing to increased Windows binary sizes. Some small tweaks dropped the size of
test.exe
(release, x64) by ~132kb. (5180416 -> 5030400, or 2.8% ... which isn't a super lot, but it's nonzero.Fixes #1334 and related
Changes
Enable nuspec iteration - Bump the default nuspec version to 3.0.0.0 so
build_nuget.cmd
and iterating with a local nuget source is easier.Add solution files - Niceness for people who live in Visual Studio.
Fold delegate bases - Forced the layout of the delegate types to be regular by moving the reference count into a base before the non-ABI types. This folds nearly all the AddRef/QueryInterface instances into one. Releases are still unique because they call up into dtors.
Fold "report" in invoke - The invoke helper is a try/catch{report} but the body of "report" is identical on each invoke specialization. Move it to a unique non-inlineable method.
Fold event::add - Make the delegate first, then all the "expand the collection" instances are identical.
Further Work
SizeBench reports the following additional foldability wastage, quoted from the same test.exe:
::Invoke
vtable and offset adjusters.make_agile_delegate
hstring
construction.Testing
Tested by
build_and_test.cmd
- all built, all passed