snitch-org / snitch

Lightweight C++20 testing framework.
Boost Software License 1.0
257 stars 7 forks source link

Failure on nullptr char* #107

Closed vid512 closed 1 year ago

vid512 commented 1 year ago

CHECK or REQUIRE macros fail on nullptr char or const char value.

#define SNITCH_IMPLEMENTATION
#include "snitch_all.hpp"

TEST_CASE("nullptr", "") {
    CHECK( (const char*)"abc" == (const char*)nullptr );
}

Compilation fails because snitch is trying to convert nullptr into string_view:

template<typename T>
[[nodiscard]] constexpr bool append(small_string_span ss, T* ptr) noexcept {
    if constexpr (std::is_same_v<std::remove_cv_t<T>, char>) {
        return append(ss, std::string_view(ptr));                             <------- 
    } else if constexpr (std::is_function_v<T>) {
        if (ptr != nullptr) {
            constexpr std::string_view function_ptr_str = "0x????????";
            return append(ss, function_ptr_str);
        } else {
            return append(ss, nullptr);
        }
    } else {
        return append(ss, static_cast<const void*>(ptr));
    }
}

It appears snitch assumes every char points to string. Comparison of char against nullptr is common enough, especially with legacy code / libraries.

cschreib commented 1 year ago

Hi and thank you for reporting this issue. Good spot! My initial thought on seeing this was that I should move the check for nullptr to the top level, so that all null pointers are treated the same way. The downside is you can't tell the difference between a null const char* and a const char* pointing to "nullptr".

Perhaps it would be best to not consider const char* as a string at all, and let users cast it to std::string_view if they want to treat it as such?

vid512 commented 1 year ago

Regarding the nullptr ambiguity, my first intuition would be that string should be serialized including quotation marks. So for example, CHECK((char*)"xyz" == (char*)"xyz != xyz"); wouldn't be displayed as xyz != xyz != xyz, but rather as "xyz" != "xyz != xyz". That would also solve the nullptr ambiguity.

Of course, then there will be question of displaying string containing the " character, etc. That's would just extend a problem which already exists with \n or \t characters, which are not escaped as well. Not a big issue for me. Enclosing serialized strings in quotation marks is the more clear option, in my opinion.

Not treating char as string by default at first it hit me like something users would not expect. But after giving it some thought, it does seem to make sense. In practice, pointer to type doesn't necessarily have to point to anything meaningful, it can be used for arithmetics only. For example a pointer which functions like end() iterator of containers. I've seen char used as "byte pointer" in some lazy code. So this might be the most correct approach technically.

Some users might find it hard having to implement append() themselves. It is a bit complicated for newcomers, who only want to consume the library in simplest way possible. If you go this route, there should be at least a paragraph on this issue in docs, or an example how to implement append() for char / const char.

However, I can imagine it would be confusing why "text" isn't treated as a string by the library. Especially in some very common cases, like comparison std::string against const char*. If I understand it correctly, would it mean that following code will not serialize correctly?

  std::string s = read_a_string();
  CHECK( s == "abc");

That alone might be reason enough not to go this way.

cschreib commented 1 year ago

I think we can support the latter with append(const char[N]), since a string litteral isn't const char* from the start. But it may have decayed to const char* by the time it reaches append(), I'll have to test that.

Thinking about this some more, it is actually dangerous to treat all const char* as null-terminated strings by default, because indeed people can use a buffer of char to store arbitrary binary data. There may not be a null terminator at all. Seems to me that, although annoying, this has to be opt-in.

cschreib commented 1 year ago

I think we can support the latter with append(const char[N]), since a string litteral isn't const char* from the start. But it may have decayed to const char* by the time it reaches append(), I'll have to test that.

The short answer: it depends, so I'm not sure we can rely on that.

The long answer now. With the right overloads of append(), we can indeed get append(s, "abc") to work without defining an overload to const char*. The reason is that C++ delays array-to-pointer decay if the object is converted to a reference. So, defining the following type alias and concept:

template<std::size_t N>
using char_array = char[N];

template<typename T>
struct is_raw_string : std::false_type {};
template<std::size_t N>
struct is_raw_string<char_array<N>> : std::true_type {}; // needed for matching to `const T&`
template<std::size_t N>
struct is_raw_string<const char_array<N>&> : std::true_type {}; // needing for matching to `T&&`

template<typename T>
concept raw_string = is_raw_string<T>::value;

we can support append(s, "abc") with:

template<raw_string T>
[[nodiscard]] constexpr bool append(small_string_span ss, const T& value) noexcept {
    return append(ss, std::string_view(value));
}

There is some complication however, because the minute we add the overload for pointers:

template<typename T>
concept pointer = std::is_pointer_v<T>;

template<pointer T>
[[nodiscard]] constexpr bool append(small_string_span ss, T ptr) noexcept {
    if (value == nullptr) {
        return append(ss, nullptr);
    }

    if constexpr (std::is_function_v<T>) {
        constexpr std::string_view function_ptr_str = "0x????????";
        return append(ss, function_ptr_str);
    } else {
        return append(ss, static_cast<const void*>(value));
    }
}

... the call to append(s, "abc") becomes ambiguous :( I guess that is part of the C++ overload resolution scheme, to try decaying parameters and see if that also gives a match. So we have to merge them into a single function to delay the decay even further, and resolve the ambiguity ourselves:

template<std::size_t N>
using char_array = char[N];

template<typename T>
struct is_raw_string : std::false_type {};
template<std::size_t N>
struct is_raw_string<char_array<N>> : std::true_type {}; // needed for matching to `const T&`
template<std::size_t N>
struct is_raw_string<const char_array<N>&> : std::true_type {}; // needing for matching to `T&&`

template<typename T>
concept raw_string = is_raw_string<T>::value;

template<typename T>
concept pointer = std::is_pointer_v<T>;

template<typename T>
struct is_function_pointer : std::false_type {};
template<typename T>
struct is_function_pointer<T*> : std::is_function<T> {};

template<typename T>
constexpr bool is_function_pointer_v = is_function_pointer<T>::value;

template<typename T>
    requires(raw_string<T> || pointer<T>)
[[nodiscard]] constexpr bool append(small_string_span ss, T&& value) noexcept {
    if constexpr (std::is_pointer_v<T>) {
        if (value == nullptr) {
            return append(ss, nullptr);
        }

        if constexpr (is_function_pointer_v<T>) {
            constexpr std::string_view function_ptr_str = "0x????????";
            return append(ss, function_ptr_str);
        } else {
            return append(ss, static_cast<const void*>(value));
        }
    } else {
        return append(ss, std::string_view(value));
    }
}

Oh my... But now append(s, "abc") appends the characters "abc", and append(s, (const char*)"abc") appends the address of the pointer. Yay!

But unfortunately, C++ decay rules are annoying. We can't stop all the decays, so append(s, pick_left ? "left" : "right") will decay the "left" and "right" literals to const char* before we even get to append(), and we can't do anything about it. This would end up printing the addresses, which is counter-intuitive, unless the user explicitly asks for std::string_view using append(s, pick_left ? "left"sv : "right"sv).

I'll think about this some more, but my current preference is to keep treating const char* as a string-like type, with a check for nullptr.