Closed llvmbot closed 9 months ago
mentioned in issue llvm/llvm-bugzilla-archive#49324
Maybe I can change the __construct_at_end call to std::copy?
copy is for copying values over already-constructed objects; not for creating new objects.
for the second suggestion, you're adding an assumption that the source data is contiguous in memory. That's not always true.
I see, but __construct_range_forward
does not distinguish between contiguous memory and discrete memory copies for iterators, which causes a serious performance deterioration, can I use std::uninitialized_copy
. Or how can I optimize __construct_range_forward
?
Maybe I can change the __construct_at_end call to std::copy?
copy is for copying values over already-constructed objects; not for creating new objects.
for the second suggestion, you're adding an assumption that the source data is contiguous in memory. That's not always true.
I don't understand why __construct_at_end
is called. When (__new_size <= capacity()
, we use _VSTD::copy(__first, __mid, this->__begin_);
, which is better than __construct_at_end
.
Maybe I can change the __construct_at_end
call to std::copy
?
or: __construct_range_forward
has two reloaded versions. I hope that the second memcpy version can be invoked. However, the second version accepts only parameters of the pointer type and does not accept iterators
Can I modify the code as follows?
from
vector<_Tp, _Allocator>::__construct_at_end(_ForwardIterator __first, _ForwardIterator __last, size_type __n)
{
allocator_type& __a = this->__alloc();
__RAII_IncreaseAnnotator __annotator(*this, __n);
__alloc_traits::__construct_range_forward(__a, __first, __last, this->__end_);
__annotator.__done();
}
to
vector<_Tp, _Allocator>::__construct_at_end(_ForwardIterator __first, _ForwardIterator __last, size_type __n)
{
allocator_type& __a = this->__alloc();
__RAII_IncreaseAnnotator __annotator(*this, __n);
__alloc_traits::__construct_range_forward(__a, &*__first, &*__last, this->__end_);
__annotator.__done();
}
Created attachment 24423 [details] the testcase
For details about test cases, see the attachment.
./gccO0.out copy 1610957372.518706 copy1 1610957372.526591 7885
./clangO0.out copy 1610957377.298682 copy1 1610957378.455563 1156881
./gccO2.out copy 1610957396.558745 copy1 1610957396.566606 7861
./clangO2.out copy 1610957407.81578 copy1 1610957407.179067 97489
the test function
int main(int argc, char argv[]) { std::vector
data; data.resize(1024UL 102424); std::vectordata1; struct timeval tv,tv1; gettimeofday(&tv, NULL); data1.assign(data.begin(), data.end()); gettimeofday(&tv1, NULL); std::cout << "copy " << tv.tv_sec <<"." << tv.tv_usec << std::endl; std::cout << "copy1 " << tv1.tv_sec <<"." << tv1.tv_usec << std::endl; std::cout << tv1.tv_sec 1000000 + tv1.tv_usec - (tv.tv_sec*1000000 + tv.tv_usec) << std::endl; return 0; }The main function consumed is construct_range_forward. libcxx use the construct_range_forward Each element is constructed using the construct method, which is constructed using the placement new method. GCC uses std::uninitialized_copy to copy the memory. The performance is significantly better than that of libcxx.
The test is performed on the Aarch64 platform.
I've just tested the original benchmark on my system and I got 12230 for libstdc++ and 12172 for libc++. Given the data, this seems to be fixed.
Extended Description
For details about test cases, see the attachment.
the test function
The main function consumed is construct_range_forward. libcxx use the construct_range_forward Each element is constructed using the construct method, which is constructed using the placement new method. GCC uses std::uninitialized_copy to copy the memory. The performance is significantly better than that of libcxx.