microsoft / cppwinrt

C++/WinRT
MIT License
1.64k stars 236 forks source link

Bug: implements_delegate non-fold-ability #1334

Closed jonwis closed 1 year ago

jonwis commented 1 year ago

Version

2.0.220131.2

Summary

Using SizeBench, some reasonable fraction of our DLL is taken up in winrt::impl::implements_delegate methods that aren't foldable for one of two reasons:

Reproducible example

No response

Expected behavior

It may be a linker/optimizer bug that the folder is not working "hard enough."

Actual behavior

Multiple nonfolded specializations of delegate types taking ~1% of binary space (~15k out of ~1500k).

Additional comments

In other places, I've used a "base" type to move as many things down to non-templated members as possible, with a templated caller sending in a parameters. For instance, in base.h, this appears to work and collapse all those methods into a small handful of specialized "push some parameters and call a helper" method.

Things that could use a similar treatment:

In my 1.5mb DLL, SizeBench claims those three waste another 1%.

namespace winrt::impl
{
#if defined(_MSC_VER)
#pragma warning(push)
#pragma warning(disable:4458) // declaration hides class member (okay because we do not use named members of base class)
#endif

    struct implements_delegate_base
    {
        WINRT_IMPL_NOINLINE uint32_t __stdcall AddRef() noexcept
        {
            return ++m_references;
        }

        WINRT_IMPL_NOINLINE uint32_t __stdcall Release() noexcept
        {
            return --m_references;
        }

        WINRT_IMPL_NOINLINE uint32_t QueryInterfaceCommon(guid const& id, void** result, unknown_abi* abi_t_ptr, guid const& delegate_id) noexcept
        {
            if ((delegate_id == id) || is_guid_of<Windows::Foundation::IUnknown>(id) || is_guid_of<IAgileObject>(id))
            {
                *result = abi_t_ptr;
                AddRef();
                return 0;
            }

            if (is_guid_of<IMarshal>(id))
            {
                return make_marshaler(abi_t_ptr, result);
            }

            *result = nullptr;
            return error_no_interface;
        }

    public:
        atomic_ref_count m_references{ 1 };
    };

    template <typename T, typename H>
    struct implements_delegate : implements_delegate_base, abi_t<T>, H, update_module_lock
    {
        implements_delegate(H&& handler) : H(std::forward<H>(handler))
        {
        }

        int32_t __stdcall QueryInterface(guid const& id, void** result) noexcept final { return implements_delegate_base::QueryInterfaceCommon(id, result, static_cast<abi_t<T>*>(this), guid_of<T>()); }

        uint32_t __stdcall AddRef() noexcept final { return implements_delegate_base::AddRef(); }

        uint32_t __stdcall Release() noexcept final
        {
            auto remaining = implements_delegate_base::Release();
            if (remaining == 0)
            {
                delete static_cast<delegate<T, H>*>(this);
            }
            return remaining;
        }
    };
kennykerr commented 1 year ago

Yep we did a lot of work manually folding the implements class template a few years back - which by comparison was a much larger offender - but implements_delegate seems worthwhile as well. The guid_of one is a longstanding problem that the C++ compiler's constexpr support could really help with if it got a bit better at folding those but I'm not sure we can fix that directly. I know that would make @dmachaj very happy. 😏

kennykerr commented 1 year ago

It's a bit of duct tape, but if you happen to know, for example, the top N specializations of guid_of for generic interfaces across the OS then you could take those and precompute them and append them to the end of base.h for razzle builds. That would then avoid all of the overhead of those specializations. Obviously, no need to do that for non-generic interfaces.

jonwis commented 1 year ago

Sizebench helped me find other things that could be inline constexpr as duplicate values, so that was nice. I'm not seeing duplicate foldable data-value comdats, just more "you need a specialization of this delegate for every delegate-typed thing you use" ... I had an experiment at one point that showed that basically all delegates of certain shapes (takes 1-2 iunknown-derived things) can be implemented by a type that cheats and does type-erasure while calling through the Invoke method. Once I get these reliably folding, the actual size of unique implements_delegate should be very low - just the size of each QueryInterface and the type's vtable. And it should be related to the # of unique combinations of Invoke(<pointer|scalar|struct>*) interfaces, which itself is relatively low.

dmachaj commented 1 year ago

The last time I looked for that kind of duplication with SizeBench was #921 (April 2021). There was some easy low-hanging fruit back then (small stuff that didn't add up to a ton, though).

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 10 days with no activity. Remove stale label or comment or this will be closed in 5 days.

jonwis commented 1 year ago

Geez bot I've got a bunch of things to do. This is on my list.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 10 days with no activity. Remove stale label or comment or this will be closed in 5 days.