gul-cpp / gul14

General Utility Library for C++14
https://gul14.info/
GNU Lesser General Public License v2.1
2 stars 1 forks source link

to_number: Allow basic usage of bool #92

Closed soerengrunewald closed 1 month ago

soerengrunewald commented 4 months ago

Why
The DOOCS Serverlib uses to_number when resorting values in D_array. But it breaks if we wanna instantiate an array of booleans.

This commit will add basic support to allow to_number been used with bool.

How
First make sure the other implementations wont trigger, since bool is arithmetic, integral and unsigned. Then add a very simple implementation which allows only 0|1 and true|false. And to round things up, add some very basic tests.

Finii commented 4 months ago

I believe we did model this after, or to mimic, some newer C++ library function as kind of backport.

This seems to be not documented, but I guess it was std::from_chars(). At least it shares a lot common features, like being able to work with non-terminated strings, explicit error messaging, and so on.

The Doxygen comment mentions this

 * Convert an ASCII string_view into a number.
 * This means to_number() accepts a subset of C++17's from_chars() input format; where it
 * supports the input format it is modeled close to from_chars().

The from_chars() (and also to_chars()) do not support bool; and using from_chars() to fill a bool variable generated the same unreadable error output as does to_number<bool>() ;-D


The usecase, a D_array<bool>, is that even possible? And then, it seems it uses get_value_from_string<>() which does a) not support bool and b) does not use to_number()?

I'm confused. Is there a PR for the server library?

soerengrunewald commented 4 months ago

I believe we did model this after, or to mimic, some newer C++ library function as kind of backport.

This seems to be not documented, but I guess it was std::from_chars(). At least it shares a lot common features, like being able to work with non-terminated strings, explicit error messaging, and so on.

The Doxygen comment mentions this

 * Convert an ASCII string_view into a number.
 * This means to_number() accepts a subset of C++17's from_chars() input format; where it
 * supports the input format it is modeled close to from_chars().

The from_chars() (and also to_chars()) do not support bool; and using from_chars() to fill a bool variable generated the same unreadable error output as does to_number<bool>() ;-D

Why should the following not be allowed?:

#include <charconv>
#include <limits>
#include <string_view>
#include <type_traits>

int main()
{
    std::string_view const str = "1";
    bool res{};

    static_assert(std::numeric_limits<bool>::is_integer);
    static_assert(std::is_integral_v<bool> == true);

    // std::from_chars_result
    //     from_chars( const char* first, const char* last,
    //                 /* integer-type */& value, int base = 10 );
    auto r = std::from_chars(str.data(), str.data() + str.size(), res, 2 /*base*/);

    return 0;
}

The usecase, a D_array<bool>, is that even possible? And then, it seems it uses get_value_from_string<>() which does a) not support bool

Therefor we have clientlib PR 200

and b) does not use to_number()?

I'm confused. Is there a PR for the server library?

See serverlib PR 223

alt-graph commented 1 month ago

@Finii, @soerengrunewald: This PR has been idle since July, but I think it is worth pursuing. Any more comments/commits?

Finii commented 1 month ago

I reopened this PR in the repo and just looked at to_number.h, and immediately this looks not ready: The documentation does not mention bool at all.

image

Further a still believe, albeit you both think this is a 'nice addon', it is even a misnomer, because to_number<bool> does NOT convert anything to a number (this is not C where the 'bool' is an integer with value 0 or not-0).

The only driving force here is convenience for some template work in the server library. With almost the same reasoning you could also incorporate to_number<string>() or to_number<complex<double>>() ("I want to conveniently read some string into a value").

We should not dilute to_number() with featureism.

With other words, I am SA this addition.

alt-graph commented 1 month ago

Further a still believe, albeit you both think this is a 'nice addon', it is even a misnomer, because to_number<bool> does NOT convert anything to a number (this is not C where the 'bool' is an integer with value 0 or not-0).

The only driving force here is convenience for some template work in the server library. With almost the same reasoning you could also incorporate to_number<string>() or to_number<complex<double>>() ("I want to conveniently read some string into a value").

We should not dilute to_number() with featureism.

With other words, I am SA this addition.

I do not think it is a misnomer. In C++, is_integral<bool> is true, so in some way bool is the shortest available integer. Doubles have string-y representations like "nan" and "inf", too. That does not strike me as a bad mismatch with 0/1/true/false.

Also, I agree with @soerengrunewald that this extension would be useful: We have just hand-coded the exact same thing twice in taskolib&friends.

That said, I'll give this a WF.

Finii commented 1 month ago

Hmm, why is 0 and 1 handled at all (and not 000), the stdlib does not:

Screenshot 2024-10-01 at 10 16 28

_Code inspired by https://github.com/taskolib/taskolib/blob/main/src/deserialize_sequence.h_

Edit:

For me "true" and "false" is the representation of a bool, but 1 and 0 are representations of the integers 1 and 0. What is done here is to automagically convert the integer values to bool (in a C fashion), but with omitting the expectation that "2" is also true-ish.

Finii commented 1 month ago

Regarding the one and zero...

Example

auto x = static_cast<bool>(2); // works
auto x = *to_number("2"); // does not

The feeling is strange, I would either omit numbers as valid inputs (bools are in fact neither 1 nor 0); or handle it in the usual way where just zero is false and everything else (which is a number?) is true.

soerengrunewald commented 1 month ago

So what is missing to go forward?

alt-graph commented 1 month ago

So what is missing to go forward?

Well, I think the performance aspect (If we throw a long string at the function, it is converted to lowercase twice before we realize that we cannot do anything with it) should be addressed. And according to Fini we should maybe support representations like 000 and 0001? :shrug:

soerengrunewald commented 1 month ago

How about:

    if (str.length() == 1) // constexpr
    {
        if (str.front() == '0')
            return false;
        if (str.front() == '1')
            return true;
    }
    else
    {
        if (gul14::equals_nocase(str, "true"))  // constexpr
            return true;

        if (gul14::equals_nocase(str, "false")) // constexpr
            return false;

        auto const number = to_number<int>(str); // constexpr
        if (number.has_value())
        {
            if (number.value() == 0)
                return false;
            if (number.value() == 1)
                return true;
        }
    }
Finii commented 1 month ago

How about:

That does look nice.

Still the question for me is

I believe (if we introduce this), we should either stick to the real 'bool' representations "True"/"False"; or if we simulate the C integer-to-bool understanding that 0 is false and everything else is true, well, we should in fact do THAT and not limit it to 1.

What is the reason for this "only 1 is true" idea that you both seem to share?

Finii commented 1 month ago

auto const number = to_number(str); // constexpr

The comment "constexpr" is ... why?

Whatever, of course, we could / should in fact roll a special version to check if

I guess that is even faster than a full blown to_number<int>().

Finii commented 1 month ago

The DOOCS Serverlib uses to_number when resorting values in D_array

How does the library serialize a D_array<bool> ? I guess internally it is not a vector-of-bool but a vector of integers, and so it is integers in the save-file? Maybe that is the reason why you solely concentrate on "1"?

soerengrunewald commented 1 month ago

auto const number = to_number(str); // constexpr

The comment "constexpr" is ... why?

To indicate that these calls are constexpr capable.

Whatever, of course, we could / should in fact roll a special version to check if

* The input string contains only `'0'` characters (False)

* The input string contains only digits `"0123456789"` (True)

* The input is invalid

I'm open to any suggestions.

I guess that is even faster than a full blown to_number<int>().

Is performance really an issue?

Finii commented 1 month ago

I guess that is even faster than a full blown to_number<int>().

Is performance really an issue?

I believe no, but

Well, I think the performance aspect

And I just wanted to point out that the 'real C' sense of boolean number interpretations is even faster than the "correct" one that allows only one as True.

I'm open to any suggestions.

And still you do not comment on why "2" is invalid and/or if that would be good or bad or whatever ;)

soerengrunewald commented 1 month ago

How about:

That does look nice.

Still the question for me is

* Is `"2"` not also _**True**_?

* If `"2"` is not a valid input, why is `"1"` valid?

I believe (if we introduce this), we should either stick to the real 'bool' representations "True"/"False"; or if we simulate the C integer-to-bool understanding that 0 is false and everything else is true, well, we should in fact do THAT and not limit it to 1.

What is the reason for this "only 1 is true" idea that you both seem to share?

The main reason is simplicity (or the tighter the boundaries the simpler the code). Anyhow how about this:

    if (str.length() == 1 and isdigit(str.front()))
    {
        if (str.front() == '0')
            return false;
        return true;
    }

    if (gul14::equals_nocase(str, "true"))  // constexpr
        return true;
    if (gul14::equals_nocase(str, "false")) // constexpr
        return false;

    auto const number = to_number<int>(str); // constexpr
    if (number.has_value())
    {
        if (number.value() == 0)
            return false;
        return true;
    }

    return {};
Finii commented 1 month ago
gul14::optional<bool> to_bool(gul14::string_view input) {
    bool ret{ };
    size_t i{ };
    for (; i < input.length(); ++i) {
        if (not isdigit(input[i]))
            break;
        if (input[i] != '0')
            ret = true;
    }
    if (i == input.length())
        return ret;
    if (gul14::equals_nocase(input, "true"))  // constexpr
        return true;
    if (gul14::equals_nocase(input, "false")) // constexpr
        return false;
    return gul14::nullopt;
}

TEST_CASE("to_number(): bool", "[to_number]")
{
    auto a = to_bool("1");
    REQUIRE(*a == true);
    a = to_bool("002");
    REQUIRE(*a == true);
    a = to_bool("000");
    REQUIRE(*a == false);
    a = to_bool("falSe");
    REQUIRE(*a == false);
    a = to_bool("TruE");
    REQUIRE(*a == true);
    a = to_bool("TruhE");
    REQUIRE(a.has_value() == false);
    a = to_bool("00012345h");
    REQUIRE(a.has_value() == false);
}

Example code but no template

I believe the special case with length == 1 is not much better than that, and it is faster in other numbers; not that I believe that performance matters much here.

Codegen is nice:

image

Edit: Remove debug code :-D Edit: Improve loop Edit: Add codegen

soerengrunewald commented 1 month ago

Do we wanna allow 0x0001 or 0b0001 as well?

Finii commented 1 month ago

Do we wanna allow 0x0001 or 0b0001 as well?

  REQUIRE( to_number<int>("0x42").value() == 42 )
due to unexpected exception with message:
  bad optional access

_to_number() can only work with decimal (base 10)_

Edit: Change example

alt-graph commented 1 month ago

I believe (if we introduce this), we should either stick to the real 'bool' representations "True"/"False"; or if we simulate the C integer-to-bool understanding that 0 is false and everything else is true, well, we should in fact do THAT and not limit it to 1.

What is the reason for this "only 1 is true" idea that you both seem to share?

IMO it is sufficient to think of this as the counterpart of some reasonable serializer. For a serializer, reasonable output for a boolean would be

Finii commented 1 month ago

I must admit Lars' last comment (well now not the last comment anymore, but you know probably what I mean) convinced me, and I now would say just true false 1 and 0 is ok ;-) Sorry, but I think it has deserved some discussion.

soerengrunewald commented 1 month ago

Damm I have to leave now...but str == "0|1" is not constexpr :'(

Finii commented 1 month ago

Fixed constexpr problem.

I believe there is some docu missing still, will add that if needed, lets see