Possible missed optimization when calling memcpy or memmove in a loop

llvm / llvm-project

Possible missed optimization when calling memcpy or memmove in a loop #117332

Open ldionne opened 3 days ago

ldionne commented 3 days ago

I noticed that the following code is not optimized to a single memcpy, contrary to what I would expect:

template <class T>
void relocate_1(T *first, T *last, T *dest) {
    for ( ; first != last; ++first, ++dest) {
        std::memcpy((void*)dest, first, sizeof(T));
    }
}

I would expect this to be equivalent to roughly:

template <class T>
void relocate_2(T *first, T *last, T *dest) {
    auto n = last - first;
    std::memcpy((void*)dest, first, n * sizeof(T));
}

Is this caused by, e.g., the compiler not knowing that the whole [first, last) range is valid? Note that both GCC and Clang fail to perform this optimization.

Godbolt: https://godbolt.org/z/zzdhcKPh4

keinflue commented 3 days ago

The behavior is not the same. Suppose, for example, that dest == first + 1: the loop then copies *first into every element of the destination range. A single std::memcpy over the whole range would be UB because the ranges overlap, and std::memmove would not produce the same result.
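A minimal sketch of the overlap case described above, assuming int elements and a five-element buffer. The names relocate_loop, relocate_bulk and demo_overlap are hypothetical and only serve the illustration. Each per-element memcpy in the loop copies between adjacent, non-overlapping objects, so the loop itself is well defined even though the two ranges overlap as a whole.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Element-by-element copy, same shape as relocate_1 in the original post.
template <class T>
void relocate_loop(T* first, T* last, T* dest) {
    for (; first != last; ++first, ++dest) {
        std::memcpy(static_cast<void*>(dest), first, sizeof(T));
    }
}

// Single bulk copy. memmove is used here because the ranges overlap,
// which would make a single memcpy undefined behavior.
template <class T>
void relocate_bulk(T* first, T* last, T* dest) {
    const auto n = static_cast<std::size_t>(last - first);
    std::memmove(static_cast<void*>(dest), first, n * sizeof(T));
}

// Hypothetical demonstration with dest == first + 1.
void demo_overlap() {
    std::array<int, 5> a{1, 2, 3, 4, 5};
    std::array<int, 5> b = a;

    relocate_loop(a.data(), a.data() + 4, a.data() + 1);
    relocate_bulk(b.data(), b.data() + 4, b.data() + 1);

    // The loop re-reads what it just wrote, so *first spreads everywhere.
    assert((a == std::array<int, 5>{1, 1, 1, 1, 1}));
    // memmove behaves as if it copied through a temporary buffer.
    assert((b == std::array<int, 5>{1, 1, 2, 3, 4}));
}
```

Because the two results differ, a compiler cannot replace the loop with one bulk copy unless it can prove that the ranges do not overlap.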

nikic commented 3 days ago

The memcpy calls do get combined if you restrict-qualify dest.

ldionne commented 3 days ago

Ah ah! Thanks both, that makes sense.

I can see that this gets optimized if I __restrict the destination: https://godbolt.org/z/heb71docW

However, if I switch to memmove, I don't get the same optimization (but GCC does it): https://godbolt.org/z/K8srPchTr Is that one a missed optimization?
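For readers without access to the Godbolt links, here is a rough sketch of what the two restrict-qualified variants might look like. It is not a copy of the linked code. The function names are hypothetical, and __restrict is a compiler extension (accepted by GCC, Clang and MSVC), not standard C++. Whether either loop is actually merged into one call depends on the compiler and version.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// memcpy in a loop, with the destination promised not to alias the source.
template <class T>
void relocate_memcpy_restrict(T* first, T* last, T* __restrict dest) {
    for (; first != last; ++first, ++dest) {
        std::memcpy(static_cast<void*>(dest), first, sizeof(T));
    }
}

// Same loop with memmove. Given the no-alias promise on dest, a single
// bulk copy would produce the same bytes as this loop.
template <class T>
void relocate_memmove_restrict(T* first, T* last, T* __restrict dest) {
    for (; first != last; ++first, ++dest) {
        std::memmove(static_cast<void*>(dest), first, sizeof(T));
    }
}

// Hypothetical demonstration on disjoint buffers, where restrict is honored.
void demo_disjoint() {
    int src[4] = {1, 2, 3, 4};
    int via_memcpy[4] = {};
    int via_memmove[4] = {};

    relocate_memcpy_restrict(src, src + 4, via_memcpy);
    relocate_memmove_restrict(src, src + 4, via_memmove);

    // Both variants leave an exact copy of the source in the destination.
    assert(std::memcmp(via_memcpy, src, sizeof src) == 0);
    assert(std::memcmp(via_memmove, src, sizeof src) == 0);
}
```

Calling either function with overlapping ranges would break the promise made by __restrict, so the overlap scenario raised earlier in the thread is excluded by construction.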

b1ackviking commented 3 days ago

For memcpy, overlapping ranges are UB, but memmove has to handle overlap:

The objects may overlap: copying takes place as if the characters were copied to a temporary character array and then the characters were copied from the array to dest.

I could not find whether this is stated in the C++ standard. The quote above is from https://en.cppreference.com/w/cpp/string/byte/memmove

In C99, however, the difference is clear from the signatures:

void* memcpy( void *restrict dest, const void *restrict src, size_t count );
void* memmove( void* dest, const void* src, size_t count );

keinflue commented 2 days ago

@b1ackviking I don't think that was unknown or up for discussion for anyone in this thread? Or I do not see how this is relevant to the remaining question?

b1ackviking commented 2 days ago

@keinflue I thought my comment could help explain why the compiler does not optimize memmove even in the second case, which is not obvious to me. Sorry if it was inappropriate; I didn't want to break in just to say something.