llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Missed compiler optimization with member wise initialization #63535

Open martijnvels opened 1 year ago

martijnvels commented 1 year ago

The compiler does not optimize (coalesce) adjacent initializations when doing member-wise initialization.

See the example in https://gcc.godbolt.org/z/qnxxz8jh3

All three code alternatives implement copy construction of an instance. This example is trivial and the defaulted copy ctor would suffice, but in practice we often have cases where it does not.

The notable part is that the compiler does optimize the defaulted copy ctor into a full-instance memcpy, but does not do so for member-wise init. The only option left is hand-rolled initialization (ALT). Our practical use case is protocol buffer generated message code, where at scale the cost adds up: throughput for hot data copies is limited mostly by stores per cycle for large, mostly trivial message data (in our example, 1 store vs. 4 stores).
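For readers without access to the godbolt link, the three alternatives can be sketched roughly as follows. This is a hypothetical reconstruction: the struct `Fields`, its member names, and the class names `Defaulted`, `MemberWise`, and `HandRolled` are assumptions, not the code from the link. All three copy the same 16 bytes; the report is only about how many loads and stores each one compiles to.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the struct behind the godbolt link. The names,
// types, and count of fields are assumptions: four adjacent 4-byte members
// with no padding between them.
struct Fields {
    int32_t a;
    int32_t b;
    int32_t c;
    int32_t d;
};

// (1) Defaulted copy ctor: the case the thread reports as one full-instance copy.
struct Defaulted {
    Fields f;
    explicit Defaulted(const Fields& init) : f(init) {}
    Defaulted(const Defaulted&) = default;
};

// (2) Member-wise init: the case the thread reports as one store per member.
struct MemberWise {
    int32_t a, b, c, d;
    MemberWise(int32_t a_, int32_t b_, int32_t c_, int32_t d_)
        : a(a_), b(b_), c(c_), d(d_) {}
    MemberWise(const MemberWise& o) : a(o.a), b(o.b), c(o.c), d(o.d) {}
};

// (3) Hand-rolled alternative (ALT): memcpy over a trivially copyable sub-object.
struct HandRolled {
    Fields f;
    explicit HandRolled(const Fields& init) : f(init) {}
    HandRolled(const HandRolled& o) { std::memcpy(&f, &o.f, sizeof(f)); }
};

// All three copies must produce the same values; only the codegen differs.
inline void check_copies() {
    const Fields init{1, 2, 3, 4};
    Defaulted d1(init);
    Defaulted d2(d1);
    MemberWise m1(1, 2, 3, 4);
    MemberWise m2(m1);
    HandRolled h1(init);
    HandRolled h2(h1);
    assert(d2.f.a == 1 && d2.f.d == 4);
    assert(m2.a == 1 && m2.b == 2 && m2.c == 3 && m2.d == 4);
    assert(h2.f.a == 1 && h2.f.d == 4);
}
```

Grouping the trivial members into a nested struct keeps the `memcpy` in (3) well-defined, since it copies exactly one trivially copyable object.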

llvmbot commented 1 year ago

@llvm/issue-subscribers-c-1

llvmbot commented 1 year ago

@llvm/issue-subscribers-clang-codegen

topperc commented 1 year ago

One problem here is the 4-byte gap between the last bool field and the next int64_t field.
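A minimal sketch of how such a gap arises, assuming a layout similar in spirit to the one in the link (the struct `WithGap` and its members are hypothetical). The padding is computed with `offsetof` rather than hard-coded, since the exact value depends on the target ABI; on common 64-bit ABIs it is expected to be 4 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout: two int32_t, four bools, then an int64_t.
// The real struct is only visible behind the godbolt link.
struct WithGap {
    int32_t a;
    int32_t b;
    bool c;
    bool d;
    bool e;
    bool f;
    int64_t g;  // must be aligned, so padding may precede it
};

// First byte after the last bool field.
constexpr std::size_t kBoolsEnd = offsetof(WithGap, f) + sizeof(bool);

// Padding bytes between the last bool and the int64_t field.
constexpr std::size_t kGap = offsetof(WithGap, g) - kBoolsEnd;

// Member-wise copies never touch these bytes, while a full-instance memcpy
// copies them along with the fields.
inline std::size_t gap_bytes() { return kGap; }
```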

martijnvels commented 1 year ago

Even if we remove all the gaps (https://gcc.godbolt.org/z/EsEPWf5eb), we still get the inefficient copy.

I explicitly included the defaulted copy ctor to demonstrate that the compiler has no issue optimizing this to a full memcpy, i.e., the "legality" of coalescing the adjacent loads/stores should not be a concern. The problem is that once you are forced to write a constructor for some reason, you pay the price of adjacent trivial initializations not being coalesced.

It is also worth pointing out this is not a concern with (zero) initialization: https://gcc.godbolt.org/z/ccbvs5Yzv

But zero init also gets a bit weird if we allow a gap, including the defaulted ctor, which is inconsistent with the default copy ctor.... :) https://gcc.godbolt.org/z/18b3dEsYY

topperc commented 1 year ago

It looks like the defaulted copy constructor is being emitted as a memcpy directly by the frontend. I'm not sure whether LLVM IR retains enough information from the source language to know it is OK to copy the gaps when optimizing.

martijnvels commented 1 year ago

Since I am mostly ignorant of where the IR/frontend boundary should lie: is this something that should be fixed in the frontend, i.e., the frontend generates a memcpy for every trivial span where this has benefits? Or is it something to be optimized after the frontend, with the IR coalescing adjacent loads/stores? The latter obviously has higher complexity, as the IR may not have any info on sparse members, gaps, or the concept of 'adjacent member initialization' for the code at hand.
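To illustrate what "a memcpy for every trivial span" would amount to, here is a hand-written source-level equivalent. It is only a sketch of the idea, not how Clang or the protobuf code generator actually works; `Message`, `TrivialSpan`, and all member names are hypothetical. The non-trivial member is copied member-wise, while the contiguous run of trivial members is copied in one call.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical contiguous run of trivial members.
struct TrivialSpan {
    int32_t id;
    int32_t flags;
    int64_t timestamp;
};
static_assert(std::is_trivially_copyable<TrivialSpan>::value,
              "memcpy is only valid for trivially copyable spans");

// Hypothetical message class with one non-trivial member, which is why the
// defaulted copy ctor is assumed not to suffice in the real use case.
class Message {
public:
    Message(std::string name, int32_t id, int32_t flags, int64_t timestamp)
        : name_(std::move(name)), span_{id, flags, timestamp} {}

    // The non-trivial member is copied member-wise; the trivial span is
    // copied with a single memcpy instead of one assignment per field.
    Message(const Message& other) : name_(other.name_) {
        std::memcpy(&span_, &other.span_, sizeof(span_));
    }

    const std::string& name() const { return name_; }
    int32_t id() const { return span_.id; }
    int32_t flags() const { return span_.flags; }
    int64_t timestamp() const { return span_.timestamp; }

private:
    std::string name_;
    TrivialSpan span_;
};

inline void check_message_copy() {
    Message original("hello", 7, 1, 1234567890123LL);
    Message copy(original);
    assert(copy.name() == "hello");
    assert(copy.id() == 7 && copy.flags() == 1);
    assert(copy.timestamp() == 1234567890123LL);
}
```

Whether the frontend or an IR pass is the right place to do this automatically is exactly the open question above; the sketch only shows the transformation's intended effect.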