llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.02k stars 11.96k forks source link

The performance of vector::assign is 100 times worse than that of libstdc++. #48237

Closed llvmbot closed 9 months ago

llvmbot commented 3 years ago
Bugzilla Link 48893
Version 8.0
OS Windows NT
Attachments the testcase
Reporter LLVM Bugzilla Contributor
CC @dwblaikie,@mclow,@yuanfang-chen

Extended Description

For details about test cases, see the attachment.

./gccO0.out
copy 1610957372.518706
copy1 1610957372.526591
7885

./clangO0.out
copy 1610957377.298682
copy1 1610957378.455563
1156881

./gccO2.out
copy 1610957396.558745
copy1 1610957396.566606
7861

./clangO2.out
copy 1610957407.81578
copy1 1610957407.179067
97489

the test function

int main(int argc, char* argv[])
{
    std::vector<uint8_t> data;
    data.resize(1024UL*1024*24);
    std::vector<uint8_t> data1;
    struct timeval tv,tv1;
    gettimeofday(&tv, NULL);
  data1.assign(data.begin(), data.end());
    gettimeofday(&tv1, NULL);
    std::cout << "copy " << tv.tv_sec <<"." << tv.tv_usec << std::endl;
    std::cout << "copy1 " << tv1.tv_sec <<"." << tv1.tv_usec << std::endl;
  std::cout << tv1.tv_sec * 1000000 + tv1.tv_usec - (tv.tv_sec*1000000 + tv.tv_usec) << std::endl;
    return 0;
}

The main function consumed is construct_range_forward. libcxx use the construct_range_forward Each element is constructed using the construct method, which is constructed using the placement new method. GCC uses std::uninitialized_copy to copy the memory. The performance is significantly better than that of libcxx.

llvmbot commented 2 years ago

mentioned in issue llvm/llvm-bugzilla-archive#49324

llvmbot commented 3 years ago

Maybe I can change the __construct_at_end call to std::copy?

copy is for copying values over already-constructed objects; not for creating new objects.

for the second suggestion, you're adding an assumption that the source data is contiguous in memory. That's not always true.

I see, but __construct_range_forward does not distinguish between contiguous memory and discrete memory copies for iterators, which causes a serious performance deterioration, can I use std::uninitialized_copy. Or how can I optimize __construct_range_forward?

mclow commented 3 years ago

Maybe I can change the __construct_at_end call to std::copy?

copy is for copying values over already-constructed objects; not for creating new objects.

for the second suggestion, you're adding an assumption that the source data is contiguous in memory. That's not always true.

llvmbot commented 3 years ago

I don't understand why __construct_at_end is called. When (__new_size <= capacity(), we use _VSTD::copy(__first, __mid, this->__begin_);, which is better than __construct_at_end. Maybe I can change the __construct_at_end call to std::copy?

or: __construct_range_forward has two reloaded versions. I hope that the second memcpy version can be invoked. However, the second version accepts only parameters of the pointer type and does not accept iterators

Can I modify the code as follows?

from

vector<_Tp, _Allocator>::__construct_at_end(_ForwardIterator __first, _ForwardIterator __last, size_type __n)
{
    allocator_type& __a = this->__alloc();
    __RAII_IncreaseAnnotator __annotator(*this, __n);
    __alloc_traits::__construct_range_forward(__a, __first, __last, this->__end_);
    __annotator.__done();
}

to

vector<_Tp, _Allocator>::__construct_at_end(_ForwardIterator __first, _ForwardIterator __last, size_type __n)
{
    allocator_type& __a = this->__alloc();
    __RAII_IncreaseAnnotator __annotator(*this, __n);
    __alloc_traits::__construct_range_forward(__a, &*__first, &*__last, this->__end_);
    __annotator.__done();
}
llvmbot commented 3 years ago

Created attachment 24423 [details] the testcase

For details about test cases, see the attachment.

./gccO0.out copy 1610957372.518706 copy1 1610957372.526591 7885

./clangO0.out copy 1610957377.298682 copy1 1610957378.455563 1156881

./gccO2.out copy 1610957396.558745 copy1 1610957396.566606 7861

./clangO2.out copy 1610957407.81578 copy1 1610957407.179067 97489

the test function

int main(int argc, char argv[]) { std::vector data; data.resize(1024UL102424); std::vector data1; struct timeval tv,tv1; gettimeofday(&tv, NULL); data1.assign(data.begin(), data.end()); gettimeofday(&tv1, NULL); std::cout << "copy " << tv.tv_sec <<"." << tv.tv_usec << std::endl; std::cout << "copy1 " << tv1.tv_sec <<"." << tv1.tv_usec << std::endl; std::cout << tv1.tv_sec 1000000 + tv1.tv_usec - (tv.tv_sec*1000000 + tv.tv_usec) << std::endl; return 0; }

The main function consumed is construct_range_forward. libcxx use the construct_range_forward Each element is constructed using the construct method, which is constructed using the placement new method. GCC uses std::uninitialized_copy to copy the memory. The performance is significantly better than that of libcxx.

The test is performed on the Aarch64 platform.

philnik777 commented 9 months ago

I've just tested the original benchmark on my system and I got 12230 for libstdc++ and 12172 for libc++. Given the data, this seems to be fixed.