martinmoene / string-view-lite

string_view lite - A C++17-like string_view for C++98, C++11 and later in a single-file header-only library
Boost Software License 1.0
421 stars 44 forks source link

Non constexpr operator== #34

Closed mcskatkat closed 4 years ago

mcskatkat commented 4 years ago

First of all, great work. This is probably the most thorough and streamlined backport of string_view publicly available, and I thank you for making it such.

I'm using it to replace Google's Abseil implementation, which still has some rough edges, and certainly drags in a lot of baggage for someone who just needs string_view.

One of my biggest motivations for using string_view is to facilitate string keyed associative containers with compile-time lookup, which opens the door to many useful things with zero runtime overhead. However, this requires constexpr comparison to be available... This string_view delegates operator== to char_traits::compare() which is part of std, and as such is required NOT to be marked constexpr in pre-C++17 compilation. This is despite the compiler being perfectly able to perform constexpr comparisons.

In my case then, there is little value in backporting string_view if it doesn't backport its benefits too.

My workaround solution for this was deriving my own char_traits to implement a constexpr compare(), and then deriving my own string_view, as shown below. Perhaps you can think of a better way to backport constexpr comparison.

Thx

namespace my_own {
    namespace _detail {
        struct char_traits : public std::char_traits<char> {
            static constexpr int compare( const char_type* s1, const char_type* s2, std::size_t count ) {
                ... compiler specific ...
            }
        };
        using plain_string_view = nonstd::basic_string_view<char>;
    }
    class string_view : public nonstd::basic_string_view<char, _detail::char_traits> {
        using Base = nonstd::basic_string_view<char, _detail::char_traits>;
    public:
        using Base::Base;
        // need to provide own implicit conversions with std::string, since it uses the default char_traits
        operator string() const noexcept { return string(data(), size()); }
        string_view(string const & s) noexcept : Base(s.data(), s.size()) {}
        // provide implicit conversions with a string view that uses standard char_traits
        constexpr operator _detail::plain_string_view() const {return {data(), size()};}
        constexpr string_view(_detail::plain_string_view p) : Base(p.data(), p.size()) {}
    };
    inline std::ostream & operator<<(std::ostream & os, string_view sv) { return os << _detail::plain_string_view(sv); }
} // namespace
martinmoene commented 4 years ago

Hi @mcskatkat Thanks for your kind words. Also thanks for the interesting suggestion, i'll look into it.

martinmoene commented 4 years ago

Hi @mcskatkat can you have a look at above commit?

mcskatkat commented 4 years ago

Hi. Thanks for the quick response. This might work but there are two things about it:

  1. You definitely want to use compiler builtin function to implement the comparison itself. It's very easy to get to thousands of comparisons made at compile time and strings may be long. Compilation might take an unnecessarily long time if implemented character by character. Note that compile time code execution is usually way slower than that of equivalent compiled code at run time. To that end, GCC provides __builtin_memcmp, and AFAIK MSVC and clang implement plain memcmp() calls as intrinsics, and all of them are constexpr. You can look at how Abseil handled this, or how the std lib sources handle it for C++17/20. BTW the same applies for strlen, which in your code is currently implemented character by character. I think it's best to leave the character by character implementation for cases where the compiler isn't identified / known.

  2. I don't see the benefit of deriving CTraits from Traits if its only use is a single static function that's being called directly by fully qualified name. Such a function could just as well be a protected static member of string_view. The derivation was helpful for me trying to work around this from outside the class without modifying it, and it gave me an extra headache to reconcile the specialized string_view with the classes that use the default traits.

mcskatkat commented 4 years ago

Afterthought:

All of the above only applies when the element type is char, however that is by far the most common case. So I would consider creating a protected static compare() in string_view with a specialization for char that delegates to compiler intrinsics if available. In all other cases fall back to the current element-by-element generic implementation.

martinmoene commented 4 years ago

Thanks for your notes, it is very helpful to get this steering based on actual use cases, drawing attention to weak spots and offering pointers to possible improvements.

I'll take it further with your suggestions.

mcskatkat commented 4 years ago

I edited my previous comment and removed incorrect items that resulted from me being lazy and only looking at the commit diff, missing the rest of the code.

martinmoene commented 4 years ago

I have the impression that builtins such as memcmp() and __builtin_memcmp(), __builtin_strlen() cannot be used in a constexpr context with g++ or msvc, like:

constexpr char hello[] = "hello";
constexpr char world[] = "world";

static_assert( string_view( hello ).compare( string_view( world ) ) == 0, "" );

See e.g. SO question Why does constexpr context make the compiler fail, while w/o it optimizes perfectly?.

On godbolt:

martinmoene commented 4 years ago

@mcskatkat I've merged above to master. I'm curious what you think of it, if there's more to it.

mcskatkat commented 4 years ago

Yes, it looks good. There's a large matrix of compiler-type / compiler-version / C++-standard that needs to be tested, it's hard to tell from looking at the code whether the maze of #if and macro expansions cover it correctly.

I tried this with my code after removing my previous workaround. It compiles ok, but I'm not getting __builtins (it's using the generic code). I am using GCC 8 with C++14 and I know that the Abseil code I was using previously had __builtins and worked fine. I'm also certain that GCC 7 has them and probably earlier versions too, but I'm not sure when they started being constexpr. BTW I'm quite certain that any compiler that has __builtin_memcmp also has such strlen. So until one that doesn't is encountered, I think it's safe to define one nssv_HAVE_BUILTIN in terms of the other (but keep separate macros).

Unrelated, I think it's better if you merge the two identical, generic length() implementations (constexpr and non-), using macro expansion instead of #if. This will avoid code duplication.

martinmoene commented 4 years ago

I tried this with my code after removing my previous workaround. It compiles ok, but I'm not getting builtins (it's using the generic code). I am using GCC 8 with C++14 and I know that the Abseil code I was using previously had builtins and worked fine.

Can you show your try-code?

As far as I can tell, the following does not compile constexpr-ly with Abseil under gcc, or with Abseil under msvc:

constexpr char hello[] = "hello";
constexpr char world[] = "world";

static_assert( absl::string_view( hello ).compare( absl::string_view( world ) ) < 0, "" );

What am i missing?

mcskatkat commented 4 years ago
> ls -l
total 0
> g++ --version
g++ (Ubuntu 8.4.0-1ubuntu1~18.04) 8.4.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

> cat | g++ -I$ABSL_PATH -std=gnu++14 -x c++ -c -
#include <absl/strings/string_view.h>

constexpr char hello[] = "hello";
constexpr char world[] = "world";

static_assert( absl::string_view( hello ).compare( absl::string_view( world ) ) < 0, "" );

> ls -l
total 4
-rw-rw-r-- 1 ubuntu ubuntu 1152 Oct  2 22:34 -.o
> 

Perhaps you are using an older version of absl. I know for sure that past versions were missing several constexpr annotations.

mcskatkat commented 4 years ago

Here are some excerpts from absl/strings/string_view.h and absl/base/config.h that I'm using (AFAIK they don't really do versioning):

constexpr bool operator==(string_view x, string_view y) noexcept {
  return x.size() == y.size() &&
         (x.empty() ||
          ABSL_INTERNAL_STRING_VIEW_MEMCMP(x.data(), y.data(), x.size()) == 0);
}
#if ABSL_HAVE_BUILTIN(__builtin_memcmp) || \
    (defined(__GNUC__) && !defined(__clang__))
#define ABSL_INTERNAL_STRING_VIEW_MEMCMP __builtin_memcmp
#else  // ABSL_HAVE_BUILTIN(__builtin_memcmp)
#define ABSL_INTERNAL_STRING_VIEW_MEMCMP memcmp
#endif  // ABSL_HAVE_BUILTIN(__builtin_memcmp)
#ifdef __has_builtin
#define ABSL_HAVE_BUILTIN(x) __has_builtin(x)
#else
#define ABSL_HAVE_BUILTIN(x) 0
#endif
mcskatkat commented 4 years ago

This is --version on godbolt (8.3 is the highest 8.x they have):

g++ (Compiler-Explorer-Build) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE

Seems that they have their own build of gcc. Maybe that's the difference... BTW I tested with gcc 7.5.0 as well, same results, works on Ubuntu, fails on godbolt.

martinmoene commented 4 years ago

On Windows: g++ 9.2.0, c++14:

error: '__builtin_strlen(((const char*)(& hello)))' is not a constant expression
mcskatkat commented 4 years ago

Ok, so the problem wasn't the builtins, it's what we've been passing into them. If you look more closely at the code you sent me and the code I compiled on my own, my char arrays are global while yours are local on the stack. I have a vague idea why this makes them non constexpr, but it's not really interesting in our context here. Just move the arrays out of main().

Also, the pattern of declaring a char array and then constructing a string view from it is not very common or practically useful. Mostly it would be string_view("hello").compare(string_view("world")), which tests ok because the char arrays are implicitly global constants.

martinmoene commented 4 years ago

Ok, so the problem wasn't the builtins, it's what we've been passing into them. ... Also, the pattern of declaring a char array and then constructing a string view from it is not very common or practically useful. Mostly it would be string_view("hello").compare(string_view("world")), which tests ok because the char arrays are implicitly global constants.

Yes :) Thanks a lot!

mcskatkat commented 4 years ago

Great. I will pull this and test when I get back to doing some work.

BTW, this compiles:

int main()
{
    constexpr char const * hello = "hello";
    constexpr char const * world = "world";
    static_assert( absl::string_view( hello ).compare( absl::string_view( world ) ) < 0, "" );
}

So it's really a very specific code form that triggered the problem. I think with local char x[] = "something" the compiler feels compelled to place an actual array on the stack and initialize the string upon entry. Not sure this is required by the language. I tried returning the compare() result, the code generated is terrible. It's probably a bad idea to have a locally initialized array on the stack.

martinmoene commented 4 years ago

I agree that my initial constexpr char hello[] = "hello"; should not prevent the application of __buitin_....

Will address that next.

mcskatkat commented 4 years ago

How are you going to address that? I think it's a compiler issue.

In my view, if gcc thinks that constexpr char hello[] = "hello", is not constexpr enough to be fed into constexpr __builtin...() and produce a constexpr, then it should have protested at the declaration, saying that it cannot apply constexpr to hello.

As a programmer, I am disappointed in gcc for quietly letting me declare something constexpr and then refusing to use it as such. I could be naively writing code assuming that hello is constexpr and would resolve comparisons at compile time, while in fact gcc would be silently producing run-time code to do that. It's not reasonable to expect me to static_assert every other line just to verify that things get resolved the way I intended them to be.

martinmoene commented 4 years ago

How are you going to address that? I think it's a compiler issue.

What I meant is that failing the particular edge case of a function-local constexpr char hello[] = "hello"; etc. as in my example, should not withhold us from using __buitin_...().

mcskatkat commented 4 years ago

As per Release 1.5.0, still not getting builtins with GCC due to precompiler symbol mixup: This is never defined, nor is it found anywhere else. Neither is this.

martinmoene commented 4 years ago

Well, that's embarrassing. I'll correct that now.

(How to prevent something like this happening?)

martinmoene commented 4 years ago

@mcskatkat Made another 'attempt' (#38)...

mcskatkat commented 4 years ago

Works for me with Release 1.5.1. From my perspective, I think the issue can be closed. I appreciate your diligent work on this 🙏

martinmoene commented 4 years ago

You're welcome, also thanks for your perseverance, help and patience :)