ldionne opened this issue 3 days ago
The behavior is not the same. Suppose, for example, dest == first + 1; then the first loop will copy *first to the whole range. std::memcpy over the whole range would be UB, and std::memmove would not have the same result.
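To make the aliasing case concrete, here is a hypothetical element-wise loop (the names and types are mine, not taken from the Godbolt links) run with dest == first + 1:

```cpp
#include <cstdio>

// Hypothetical element-wise copy loop (not the exact code from the issue).
// When dest == first + 1, each iteration reads the value the previous
// iteration just wrote, so the whole destination range ends up holding
// *first rather than a shifted copy of the source.
static void copy_loop(const int* first, const int* last, int* dest) {
    for (; first != last; ++first, ++dest)
        *dest = *first;
}

int main() {
    int buf[5] = {1, 2, 3, 4, 5};
    copy_loop(buf, buf + 4, buf + 1);  // dest == first + 1
    for (int v : buf)
        std::printf("%d ", v);         // prints: 1 1 1 1 1
    std::printf("\n");
}
```

A std::memmove over the same range would instead produce 1 1 2 3 4, and a std::memcpy would be UB, which is why the loop cannot simply be replaced by either call when the ranges may overlap.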
The memcpy calls do get combined if you restrict-qualify dest.
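A rough illustration of what "restrict-qualify dest" might look like (the exact code behind the Godbolt links is not reproduced in this thread, so this is only a sketch):

```cpp
#include <cstring>

// With __restrict the compiler may assume the destination does not alias
// [first, last), so the per-element copies can legally be merged into a
// single std::memcpy over the whole range.
void copy_elements(const char* first, const char* last, char* __restrict dest) {
    for (; first != last; ++first, ++dest)
        std::memcpy(dest, first, 1);  // per-element copy
}
```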
Aha! Thanks both, that makes sense.

I can see that this gets optimized if I __restrict the destination: https://godbolt.org/z/heb71docW

However, if I switch to memmove, I don't get the same optimization (but GCC does perform it): https://godbolt.org/z/K8srPchTr

Is that one a missed optimization?
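The Godbolt link is not reproduced here, so the following is only a guess at the shape of the memmove variant being asked about: the same __restrict-qualified loop, with the per-element std::memcpy replaced by std::memmove. Since __restrict rules out aliasing, the two calls are equivalent in this context, so one would expect the loop to fold into a single large copy just like the memcpy version.

```cpp
#include <cstring>

// Sketch of a per-element memmove loop with a non-aliasing destination;
// under the __restrict assumption this is semantically the same as the
// memcpy version above.
void copy_elements_memmove(const char* first, const char* last,
                           char* __restrict dest) {
    for (; first != last; ++first, ++dest)
        std::memmove(dest, first, 1);  // per-element copy, overlap-safe
}
```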
For memcpy, overlapping ranges are UB, but memmove has to handle overlap:

"The objects may overlap: copying takes place as if the characters were copied to a temporary character array and then the characters were copied from the array to dest."

I could not find whether this is stated in the C++ standard; the citation above is from https://en.cppreference.com/w/cpp/string/byte/memmove
In C99, however, the difference is pretty clear:

```c
void* memcpy( void *restrict dest, const void *restrict src, size_t count );
void* memmove( void* dest, const void* src, size_t count );
```
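A small demonstration of the overlap guarantee quoted above (the buffer and values are my own example, not from the thread): std::memmove on overlapping buffers behaves as if the data went through a temporary buffer, which is not what the element-wise loop does when dest == first + 1.

```cpp
#include <cstdio>
#include <cstring>

int main() {
    int buf[5] = {1, 2, 3, 4, 5};
    std::memmove(buf + 1, buf, 4 * sizeof(int));  // overlapping, well-defined
    for (int v : buf)
        std::printf("%d ", v);                    // prints: 1 1 2 3 4
    std::printf("\n");
}
```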
@b1ackviking I don't think that was unknown or up for discussion for anyone in this thread, or at least I don't see how it is relevant to the remaining question.
@keinflue I thought my comment could be helpful as a possible explanation of why the compiler does not optimize memmove even in the second case; that's not obvious to me. Sorry if it was inappropriate, I didn't mean to butt in just to say something.
I noticed that the following code does not get optimized to a single memcpy, as I would have expected:
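(The original snippet is not included in this excerpt; as a rough stand-in, assuming an element-wise copy over [first, last) into dest as discussed in the replies, it likely resembled something like this sketch.)

```cpp
#include <cstring>

// Sketch only, not the original code: an element-wise copy of trivially
// copyable data, one std::memcpy per element, which one might hope the
// optimizer turns into a single call over the whole range.
void copy_elements(const char* first, const char* last, char* dest) {
    for (; first != last; ++first, ++dest)
        std::memcpy(dest, first, 1);
}
```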
I would expect this to be roughly equivalent to:
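(Again a sketch rather than the original snippet: the expected lowering is one std::memcpy covering the whole range.)

```cpp
#include <cstring>

// The whole-range copy the loop above was expected to become.
void copy_elements(const char* first, const char* last, char* dest) {
    std::memcpy(dest, first, static_cast<std::size_t>(last - first));
}
```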
Is this a problem with, e.g., the lack of knowledge that the [first, last) range is all valid? Note that both GCC and Clang fail to perform this optimization. Godbolt: https://godbolt.org/z/zzdhcKPh4