tstenner closed this 2 years ago
Some real-world data: with the changes above, the compiler optimized and inlined so aggressively that a sampling profiler couldn't detect any of the calls between `stream_outlet_impl::enqueue` and the AVX-accelerated memmove instructions.
A sample has a `char*` array to store the values, but their size and type aren't known at compile time. The values are iterated over in a cast-happy C++98 for-loop: `for (std::string *p = (std::string *)&data_, *e = p + num_channels_; p < e; ++p)`. Also, sample data can be copied in and out in formats different from the native sample format.

In order to simplify the code, this PR adds a helper class and helper methods, so

`for (std::string *p = (std::string *)&data_, *e = p + num_channels_; p < e; ++p) { *p = …; }`

becomes

`for (auto &val : samplevals<std::string>(*this)) { val = …; }`
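
For illustration, a range helper of this kind could look roughly like the sketch below. This is not the code from the PR: `sample_stub`, its fixed-size buffer, and `fill_and_sum` are hypothetical stand-ins, and only the name `samplevals` is taken from the description above. The sketch is limited to trivially copyable value types; `std::string` values would additionally need proper construction and destruction inside the buffer.

```cpp
#include <cstddef>

// Hypothetical stand-in for the sample class: a raw byte buffer plus a channel count.
struct sample_stub {
    std::size_t num_channels_;
    alignas(8) char data_[64]; // assumption: room for up to 8 doubles
};

// Minimal range adapter: begin()/end() are all a range-based for-loop needs.
template <typename T>
class samplevals {
public:
    explicit samplevals(sample_stub &s)
        : begin_(reinterpret_cast<T *>(s.data_)), end_(begin_ + s.num_channels_) {}
    T *begin() const { return begin_; }
    T *end() const { return end_; }

private:
    T *begin_; // declared first, so it is initialized before end_
    T *end_;
};

// Hypothetical usage: write 0.5, 1.5, 2.5, ... into the channels, then sum them.
double fill_and_sum(sample_stub &s) {
    double next = 0.5;
    for (auto &val : samplevals<double>(s)) {
        val = next;
        next += 1.0;
    }
    double sum = 0.0;
    for (auto &val : samplevals<double>(s)) sum += val;
    return sum;
}
```

The cast is done once, inside the helper, so the loops at the call sites stay free of pointer arithmetic.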

It also adds two methods: `conv_from(T* src)` copies the values (converting types if necessary) into the sample data, and `conv_into(T* dst)` copies the sample data into `dst` (again, converting the types if necessary). The optimal copying / conversion strategy is found via template overloading.

Even though the implementation is fully in `sample.cpp` instead of `sample.h`, the throughput in a benchmark improved by ~3%.
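
As a rough sketch of how overloading can pick the copy strategy (an assumption about the general technique, not the PR's actual code): when source and destination share a trivially copyable type, the more specialized overload does one bulk `memcpy`; otherwise a generic overload converts element by element. `copy_or_convert` and `sample_stub` are hypothetical names, and the free-function form with an explicit `Native` template parameter is a simplification of the `conv_from` / `conv_into` methods described above.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Same trivially copyable type on both sides: a single bulk copy.
// Partial ordering prefers this overload over the generic one below.
template <typename T>
typename std::enable_if<std::is_trivially_copyable<T>::value>::type
copy_or_convert(T *dst, const T *src, std::size_t n) {
    std::memcpy(dst, src, n * sizeof(T));
}

// Any other combination: convert each element individually.
template <typename Dst, typename Src>
void copy_or_convert(Dst *dst, const Src *src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = static_cast<Dst>(src[i]);
}

// Hypothetical stand-in for the sample class (trivially copyable values only).
struct sample_stub {
    std::size_t num_channels_;
    alignas(8) char data_[64];
};

// Copy user data into the sample, converting from T to the native type if needed.
template <typename Native, typename T>
void conv_from(sample_stub &s, const T *src) {
    copy_or_convert(reinterpret_cast<Native *>(s.data_), src, s.num_channels_);
}

// Copy sample data out into dst, converting from the native type to T if needed.
template <typename Native, typename T>
void conv_into(const sample_stub &s, T *dst) {
    copy_or_convert(dst, reinterpret_cast<const Native *>(s.data_), s.num_channels_);
}
```

With a `float` native format, `conv_from<float>(s, int_ptr)` takes the converting loop, while `conv_into<float>(s, float_ptr)` resolves to the `memcpy` overload; the choice is made at compile time, so no branch is needed per sample.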