scylladb / seastar

High performance server-side application framework
http://seastar.io
Apache License 2.0
8.33k stars 1.54k forks source link

sstring causes interoperability problems with other libraries #634

Open nyh opened 5 years ago

nyh commented 5 years ago

The reason why seastar has its own "sstring" type instead of using std::string is historical, due to C++ library implementation when the Seastar projects started: In gcc before version 5, std::string used to use atomic operations which are bad for many-core applications like Seastar. As the nice blog post https://shaharmike.com/cpp/std-string/ explains why this was the case in pre-C++-11 and why gcc for ABI backward-compatibility reasons left this implementation long after 2011. But since gcc 5, this old implementation is gone. Today's std::string implementations in gcc and clang use exactly the same techniques as seastar::sstring does: no atomic operations, and in-structure storage for small strings (a.k.a. "small-string optimization") - the aforementioned blog points out that clang's implementation is particularly clever.

For this reason, seastar::sstring is no longer needed and we can switch to using std::string.

Since Seastar does not care about ABI compatibility (we do not yet support a Seastar shared library - see issue #467), and only about source-code compatibility, it should be easy to change Seastar so that seastar::sstring is merely an alias to std::string, without breaking compatibility for existing applications. Applications can then start gradually changing their own code to use std::string instead of seastar::sstring.

nyh commented 5 years ago

@avikivity pointed out in the mailing list, "I don't think it's simple. The interfaces are not identical, sstring has several features beyond std::string.".

One example I found is the null-termination template parameter introduced in commit 7328d17337523ae1e64dbd07cc5aa14628a022f0 which std::basic_string doesn't support, but seastar::basic_sstring does.

nyh commented 5 years ago

Another difference is that seastar::sstring has a constructor taking sstring::initialized_later() and a size, which std::string doesn't have. This constructor allows creating an uninitialized string of a certain length, and only then fill its data. We also have a convenience function make_sstring() to take a bunch of arguments and concatenate them into one new string. The closest constructor that std::string has is one taking an iterator pair. It's not as general as make_sstring() allowing parameters of different type, but I don't think this is a critical feature (of course there's the issue of backward compatibility, though....).

Yet another feature we have and std::string doesn't is sstring::release(), introduced in commit a2ca5568369c2f782477de890b468b0650ac2a33. This allows taking out the internal buffer held by sstring so it can be held by another owner without copying.

nyh commented 5 years ago

Another gratuitous difference is that seastar::sstring::begin() returns a bare character pointer, while std::string::begin() returns an std::string::iterator. This breaks for example a call like seastar::output_stream::write(s.begin(), s.size()) because s.begin() is not a char* as write() expects.

avikivity commented 5 years ago

That's really a problem in output_stream::write(), it should support iterator+len and iterator pair.

nyh commented 1 year ago

In commit fc0750bf5c0e16abc8423f38a88118c3640ca75d, a compile-time flag SEASTAR_SSTRING was added. Our cmake sets it by default, but if not set, seastar::sstring becomes just an alias for std::string. This doesn't solve any of the above differences in the APIs, but should make it easier to check what happens if we were to do that conversion.

nyh commented 8 months ago

Another difference is that seastar::sstring has a constructor taking sstring::initialized_later() and a size, which std::string doesn't have. This constructor allows creating an uninitialized string of a certain length, and only then fill its data.

I just noticed that although std::string doesn't have a way to create an uninitialized string with a certain length and fill it later, it does a push_back() method (which we are lacking, see https://github.com/scylladb/seastar/issues/2104) which is guaranteed to have (amortized) complexity O(1), so one can create a string of length N in O(N) by incrementally calling push_back().

Even better, std::string also has a reserve() method (which I believe we're also lacking!) which will make such a loop even more efficient. Even more efficient can be to use append() or the new C++23 append_range() after a reserve() instead of doing push_back() one character at a time.

I don't think that sstring::initialized_later() gives us anything in features or performance that these standard solutions can't.

dawmd commented 1 day ago

If some of std::string's member functions encourage bad performance (see https://github.com/scylladb/seastar/issues/2104#issuecomment-1951350042), how about protected-inheriting from std::string and only exposing the functions we want to expose? This way we wouldn't have to implement most things on our own. We'd also be sure that the semantics of seastar::sstring aligns with the semantics of std::string. I'm not really aware of all of the problems that making seastar::sstring an alias for std::string causes, but just throwing a casual idea.