Open jonesmz opened 4 years ago
Hi, Michael!
Thanks for the excellent report. Yes, you right. It makes an additional memory allocation penalty while processing an error message from the error code. It should be the relatively unlikely case since, in general, error message handling is not the primary control flow of an application. But anyway the inefficiency of the operation takes place.
It may be improved with a very similar solution to the one by the link you'd provided:
namespace detail {
template <typename ...Ts>
inline std::string concat_str(const Ts& ...vs) {
const auto parts = hana::make_tuple(std::string_view{vs}...);
const auto size = hana::fold(hana::transform(parts,[](auto p) { return size(p);}), 0, hana::plus);
std::string out;
out.reserve(size);
hana::for_each(parts, [&out](auto p) { out.append(data(p), size(p)); });
return out;
}
} // namespace detail
//...
#define _OZO_SQLSTATE_NAME(value)\
case value: return detail::concat_str(#value, "(", detail::ltob36(value), ")");
My thoughts were about the compile-time version of detail::ltob36()
in combination with boost::hana::string
, so in that case all calculations should be in a compile-time, there will be only one allocation and copy of raw buffer of chars into std::string
. The only question is about compilation time with such a massive usage of boost::hana::string
.
namespace detail {
template <typename ...Ts>
constexpr auto concat_str(const Ts& ...vs) {
constexpr auto parts = hana::make_tuple(vs...);
return hana::fold(parts, BOOST_HANA_STRING(""), hana::plus);
}
} // namespace detail
//...
#define _OZO_SQLSTATE_NAME(value)\
case value:\
return hana::to<const char*>(detail::concat_str(BOOST_HANA_STRING(#value),\
BOOST_HANA_STRING("("), detail::ltob36(value), BOOST_HANA_STRING(")")));
Unfortunately, I have not enough time to make a research on the best solution. So if you are interested in an efficient solution to the problem it would be really helpful to have a PR from you.
Best regards, Sergei
For strings that are 100% known at compile time, something like this can be used, which avoids using Boost.Hana and all of the associated template machinery.
namespace detail
{
template<typename CHAR_T, CHAR_T ... CHARS>
struct BasicMetaString : public std::basic_string_view<CHAR_T>
{
constexpr BasicMetaString(void)
: std::basic_string_view<CHAR_T>(chars, sizeof...(CHARS))
{ }
~BasicMetaString(void) = default;
// Basically a no-op, since every instantiation of BasicMetaString is
// globally unique, since it's data is part of it's type.
// Needed to work around some weird situations that can cause a compiler error.
// May not be needed in OZO.
BasicMetaString(BasicMetaString const&) = default;
BasicMetaString(BasicMetaString &&) = delete;
BasicMetaString& operator=(BasicMetaString &&) = delete;
BasicMetaString& operator=(BasicMetaString const&) = delete;
private:
static constexpr CHAR_T chars[] = { CHARS..., std::char_traits<CHAR_T>::to_char_type(0) };
}; // struct BasicMetaString
//-----------------------------------------------------------------------------
template<char ... CHARS>
using MetaString = BasicMetaString<char, CHARS...>;
//-----------------------------------------------------------------------------
template<typename CHAR_T, CHAR_T ... LEFT, CHAR_T ... RIGHT>
constexpr BasicMetaString<CHAR_T, LEFT..., RIGHT...> operator+(BasicMetaString<CHAR_T, LEFT...>, BasicMetaString<CHAR_T, RIGHT...>)
{
return {};
} // operator+()
//-----------------------------------------------------------------------------
inline namespace literals
{
/*
* Be Warned!
*
* This is only supported as an extension to the C++ language by the GCC compiler (also supported by clang)!
*
* Switching compilers, will necessitate the use of an alternative mechanism.
*/
template<typename CHAR_T, CHAR_T... CHARS>
constexpr BasicMetaString<CHAR_T, CHARS...> operator "" _metastr()
{
return {};
} // operator ""()
} // inline namespace literals
decltype(auto) compile_time_ltob36(long i)
{
// TODO: return some algorithm to convert long to string at compile time.
}
} // namespace detail
#define _OZO_SQLSTATE_NAME(value) \
case value: return #value_metastr + "("_metastr + compile_time_ltob36(value) + ")"_metastr;
FWIW I have a detail
implementation of a string_concat
that joins string-like objects in a single allocation.
Implementation: https://github.com/ricejasonf/nbdl/blob/de5d481b1ae5e0ec1a8e340ef96bd481205753be/include/nbdl/detail/string_concat.hpp
EDIT: It could probably be improved to use a fold expression to avoid creating a tuple for the input.
There's also this, though it's not really something that can be used at this time.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1228r1.html
@jonesmz Michael, there is no need to avoid Boost.Hana, because the library already uses it. The solution you quoted is similar to hana::string
, so using UDL and throwing away the folding algorithm Hana-based solution will be:
using namespace hana::literals;
#define _OZO_SQLSTATE_NAME(value) \
case value: return hana::to<const char*>(#value##_s + "("_s + compile_time_ltob36(value) + ")"_s);
The main problem here is compiler and how it handles such amount of compile-time strings. Compile-time strings are the right things, but they consume compiler memory much more than they seem to do. There was a talk at CppCon about UDL and constexpr std::string_view
, and the numbers were unexpected (at least for me). And, as far as I understand, the worst thing is that each time when the compiler process this header, it creates all the specializations in the AST that consume a lot of memory. So this solution is attractive like a compile-time computation, but I'm worrying about side-effects on compilation time and compiler memory consumption. We have such aggressive usage of compile-time strings in definitions.h, and it seems like this thing consumes a lot of memory during compilation (but I'm not sure since made no measurement). So compile-time strings solution is good but it needs a measurement of compilation time and memory consumption.
@ricejasonf Jason thanks a lot for the participation! What are benefits to use std::array
instead of boost::hana::tuple
and fold expression instead of boost::hana::fold
?
@thed636 Sure, all that sounds good. I was only pointing out alternative approaches.
@ricejasonf Jason thanks a lot for the participation! What are benefits to use
std::array
instead ofboost::hana::tuple
and fold expression instead ofboost::hana::fold
?
Generally speaking, std::array
is preferred when possible to avoid large complex types, and they work well with homogeneously typed algorithms which compile faster.
Fold expressions can avoid creating intermediate tuples which can involve copy/move operations on each element.
boost::hana::fold
is sometimes required, but it is very slow to compile because it creates a lot of type garbage, instantiating a function template on each element with an intermediate state object which likely has a different type on each iteration. Prefer unpack
or for_each
when possible.
Well, at the very least, we can change the line from:
#define __OZO_SQLSTATE_NAME(value) case value: return std::string(#value) + "(" + detail::ltob36(value) + ")";
To
#define OZO_SQLSTATE_NAME(value) case value: return std::string(#value "(") + detail::ltob36(value) + ")";
Which will reduce the number of strings being concatenated by one.
Similarly, the boost hana version would look like:
using namespace hana::literals;
#define _OZO_SQLSTATE_NAME(value) \
case value: return hana::to<const char*>(#value "("_s + compile_time_ltob36(value) + ")"_s);
The C++ language performs this string concatenation like this:
Meaning that for every
operator+
call, an allocation and a copy of the previous string data, plus the new string data, occurs.In this case, we experience at least 3 allocations (one each for value1, value2, and value3), and three copies of
#value
See this file: https://github.com/splinter-build/splinter/blob/10-string_concat/src/string_piece_util.h
The
string_concat
(Very bottom of file) function will accomplish this string concatenation operation with a single allocation, and a single copy of each byte.All of the functions in that file starting with
pairwise_for_each
and below (The part of the file abovepairwise_for_each
are not mine), are copyright myself, and are available to OZO under the PostgreSQL license.