llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.96k stars 11.53k forks source link

FR: clang-tidy (or off-by-default warning) for unconditional copy of const ref param #94798

Open pkasting opened 3 months ago

pkasting commented 3 months ago

Code like the following is a common source of extra copies in Chromium:

struct S {
  S(const T& t) : t(t) {}
  T t;
};

The following is generally better, as callers with lvalues will perform the same, but callers with rvalues will benefit if T has a move constructor:

struct S {
  S(T t) : t(std::move(t)) {}
  T t;
};

The generalization of this is: I'd like to find functions that that a const T& param and unconditionally copy-construct a T from it.

One downside I can think of is if T's move constructor is more expensive than a pointer copy, and all callers of this function themselves already had refs. At that point, fixing everyone up the chain to do moves at each call has costs compared to continually passing along refs.

This is sort of akin to -Wpessimizing-move (and so might be appropriate for an off-by-default warning), but it feels less clear-cut than that. I suspect a clang-tidy pass might be better.

I don't know what an escape hatch would look like for this in code. Perhaps seeing some real use cases where the above pattern is intentional would help.

PiotrZSL commented 3 months ago

Actually I currently facing many performance issues with:

struct S {
  S(T t) : t(std::move(t)) {}
  T t;
};

Problem is that where T is big (500 bytes+) you got 2 copies, one into parameter, and one from parameter. For code that you showing actually something like:

struct S {
  S(T&& t) : t(std::move(t)) {}
  T t;
};

would be best...

pkasting commented 3 months ago

Two moves, you mean? Yes, that's true, which is why I noted the bit about this being imperfect if T's move constructor isn't cheap.

The problem with S(T&&) is that if T is not a template type, this isn't perfect-forwarding, so the author must now provide both lvalue and rvalue overloads (or templatize). That's frequently more trouble than it's worth.

And if this is templated, there are still problems, since passing a bitfield to a perfect-forwarding method breaks: https://godbolt.org/z/xqnsx49Tf

Using a single by-value argument avoids these downsides and is "good enough" as a default choice. That said, we don't need to agree on what the right default change is -- any of these changes would silence my proposed diagnostic.

nicovank commented 3 months ago

I agree that a warning could be useful here, in the common case this is less code for near-zero performance difference. Passing by value allows the caller to decide whether to copy or move.

The problem is amplified when there is more than one parameter and therefore exponential const& and && overloads needed to cover every case optimally. Stuff I had written on the topic some time ago

I know this pattern is also pretty common at Facebook. Maybe Infer can catch it but I don't remember.

In the common case:

struct S { S(const T& t) : t(t) {}; T t; };
S s1(t); // One copy.
S s2(std::move(t)); // One copy.

struct S { S(T t) : t(std::move(t)) {}; T t; };
S s1(t); // One copy, one move.
S s2(std::move(t)); // Two moves -> this is better.

The only time it is (significantly) worse is when the object being passed is not trivially / cheaply movable. I have one case in mind where the destructor was explicitly defaulted (not respecting rule of 5), something like this. I wouldn't say this is common.