Closed soerengrunewald closed 1 month ago
I believe we did model this after, or to mimic, some newer C++ library function as kind of backport.
This seems to be not documented, but I guess it was std::from_chars()
. At least it shares a lot common features, like being able to work with non-terminated strings, explicit error messaging, and so on.
The Doxygen comment mentions this
* Convert an ASCII string_view into a number.
* This means to_number() accepts a subset of C++17's from_chars() input format; where it
* supports the input format it is modeled close to from_chars().
The from_chars()
(and also to_chars()
) do not support bool
; and using from_chars()
to fill a bool
variable generated the same unreadable error output as does to_number<bool>()
;-D
The usecase, a D_array<bool>
, is that even possible?
And then, it seems it uses get_value_from_string<>()
which does a) not support bool
and b) does not use to_number()
?
I'm confused. Is there a PR for the server library?
I believe we did model this after, or to mimic, some newer C++ library function as kind of backport.
This seems to be not documented, but I guess it was
std::from_chars()
. At least it shares a lot common features, like being able to work with non-terminated strings, explicit error messaging, and so on.The Doxygen comment mentions this
* Convert an ASCII string_view into a number.
* This means to_number() accepts a subset of C++17's from_chars() input format; where it * supports the input format it is modeled close to from_chars().
The
from_chars()
(and alsoto_chars()
) do not supportbool
; and usingfrom_chars()
to fill abool
variable generated the same unreadable error output as doesto_number<bool>()
;-D
Why should the following not be allowed?:
#include <charconv>
#include <limits>
#include <string_view>
#include <type_traits>
int main()
{
std::string_view const str = "1";
bool res{};
static_assert(std::numeric_limits<bool>::is_integer);
static_assert(std::is_integral_v<bool> == true);
// std::from_chars_result
// from_chars( const char* first, const char* last,
// /* integer-type */& value, int base = 10 );
auto r = std::from_chars(str.data(), str.data() + str.size(), res, 2 /*base*/);
return 0;
}
The usecase, a
D_array<bool>
, is that even possible? And then, it seems it usesget_value_from_string<>()
which does a) not supportbool
Therefor we have clientlib PR 200
and b) does not use
to_number()
?I'm confused. Is there a PR for the server library?
See serverlib PR 223
@Finii, @soerengrunewald: This PR has been idle since July, but I think it is worth pursuing. Any more comments/commits?
I reopened this PR in the repo and just looked at to_number.h
, and immediately this looks not ready:
The documentation does not mention bool
at all.
Further a still believe, albeit you both think this is a 'nice addon', it is even a misnomer, because to_number<bool>
does NOT convert anything to a number (this is not C where the 'bool' is an integer with value 0 or not-0).
The only driving force here is convenience for some template work in the server library. With almost the same reasoning you could also incorporate to_number<string>()
or to_number<complex<double>>()
("I want to conveniently read some string into a value").
We should not dilute to_number()
with featureism.
With other words, I am SA this addition.
Further a still believe, albeit you both think this is a 'nice addon', it is even a misnomer, because
to_number<bool>
does NOT convert anything to a number (this is not C where the 'bool' is an integer with value 0 or not-0).The only driving force here is convenience for some template work in the server library. With almost the same reasoning you could also incorporate
to_number<string>()
orto_number<complex<double>>()
("I want to conveniently read some string into a value").We should not dilute
to_number()
with featureism.With other words, I am SA this addition.
I do not think it is a misnomer. In C++, is_integral<bool>
is true
, so in some way bool
is the shortest available integer. Doubles have string-y representations like "nan" and "inf", too. That does not strike me as a bad mismatch with 0/1/true/false.
Also, I agree with @soerengrunewald that this extension would be useful: We have just hand-coded the exact same thing twice in taskolib&friends.
That said, I'll give this a WF.
Hmm, why is 0
and 1
handled at all (and not 000
), the stdlib does not:
_Code inspired by https://github.com/taskolib/taskolib/blob/main/src/deserialize_sequence.h_
Edit:
For me "true"
and "false"
is the representation of a bool
, but 1
and 0
are representations of the integers 1 and 0. What is done here is to automagically convert the integer values to bool (in a C fashion), but with omitting the expectation that "2"
is also true
-ish.
Regarding the one and zero...
Example
auto x = static_cast<bool>(2); // works
auto x = *to_number("2"); // does not
The feeling is strange, I would either omit numbers as valid inputs (bools are in fact neither 1 nor 0); or handle it in the usual way where just zero is false and everything else (which is a number?) is true.
So what is missing to go forward?
So what is missing to go forward?
Well, I think the performance aspect (If we throw a long string at the function, it is converted to lowercase twice before we realize that we cannot do anything with it
) should be addressed. And according to Fini we should maybe support representations like 000
and 0001
? :shrug:
How about:
if (str.length() == 1) // constexpr
{
if (str.front() == '0')
return false;
if (str.front() == '1')
return true;
}
else
{
if (gul14::equals_nocase(str, "true")) // constexpr
return true;
if (gul14::equals_nocase(str, "false")) // constexpr
return false;
auto const number = to_number<int>(str); // constexpr
if (number.has_value())
{
if (number.value() == 0)
return false;
if (number.value() == 1)
return true;
}
}
How about:
That does look nice.
Still the question for me is
"2"
not also True?"2"
is not a valid input, why is "1"
valid?I believe (if we introduce this), we should either stick to the real 'bool' representations "True"/"False"; or if we simulate the C integer-to-bool understanding that 0 is false and everything else is true, well, we should in fact do THAT and not limit it to 1.
What is the reason for this "only 1 is true" idea that you both seem to share?
auto const number = to_number
(str); // constexpr
The comment "constexpr" is ... why?
Whatever, of course, we could / should in fact roll a special version to check if
'0'
characters (False)"0123456789"
(True)I guess that is even faster than a full blown to_number<int>()
.
The DOOCS Serverlib uses to_number when resorting values in D_array
How does the library serialize a D_array<bool>
? I guess internally it is not a vector-of-bool but a vector of integers, and so it is integers in the save-file? Maybe that is the reason why you solely concentrate on "1"
?
auto const number = to_number(str); // constexpr
The comment "constexpr" is ... why?
To indicate that these calls are constexpr capable.
Whatever, of course, we could / should in fact roll a special version to check if
* The input string contains only `'0'` characters (False) * The input string contains only digits `"0123456789"` (True) * The input is invalid
I'm open to any suggestions.
I guess that is even faster than a full blown
to_number<int>()
.
Is performance really an issue?
I guess that is even faster than a full blown
to_number<int>()
.Is performance really an issue?
I believe no, but
Well, I think the performance aspect
And I just wanted to point out that the 'real C' sense of boolean number interpretations is even faster than the "correct" one that allows only one as True.
I'm open to any suggestions.
And still you do not comment on why "2" is invalid and/or if that would be good or bad or whatever ;)
How about:
That does look nice.
Still the question for me is
* Is `"2"` not also _**True**_? * If `"2"` is not a valid input, why is `"1"` valid?
I believe (if we introduce this), we should either stick to the real 'bool' representations "True"/"False"; or if we simulate the C integer-to-bool understanding that 0 is false and everything else is true, well, we should in fact do THAT and not limit it to 1.
What is the reason for this "only 1 is true" idea that you both seem to share?
The main reason is simplicity (or the tighter the boundaries the simpler the code). Anyhow how about this:
if (str.length() == 1 and isdigit(str.front()))
{
if (str.front() == '0')
return false;
return true;
}
if (gul14::equals_nocase(str, "true")) // constexpr
return true;
if (gul14::equals_nocase(str, "false")) // constexpr
return false;
auto const number = to_number<int>(str); // constexpr
if (number.has_value())
{
if (number.value() == 0)
return false;
return true;
}
return {};
gul14::optional<bool> to_bool(gul14::string_view input) {
bool ret{ };
size_t i{ };
for (; i < input.length(); ++i) {
if (not isdigit(input[i]))
break;
if (input[i] != '0')
ret = true;
}
if (i == input.length())
return ret;
if (gul14::equals_nocase(input, "true")) // constexpr
return true;
if (gul14::equals_nocase(input, "false")) // constexpr
return false;
return gul14::nullopt;
}
TEST_CASE("to_number(): bool", "[to_number]")
{
auto a = to_bool("1");
REQUIRE(*a == true);
a = to_bool("002");
REQUIRE(*a == true);
a = to_bool("000");
REQUIRE(*a == false);
a = to_bool("falSe");
REQUIRE(*a == false);
a = to_bool("TruE");
REQUIRE(*a == true);
a = to_bool("TruhE");
REQUIRE(a.has_value() == false);
a = to_bool("00012345h");
REQUIRE(a.has_value() == false);
}
Example code but no template
I believe the special case with length == 1
is not much better than that, and it is faster in other numbers; not that I believe that performance matters much here.
Codegen is nice:
Edit: Remove debug code :-D Edit: Improve loop Edit: Add codegen
Do we wanna allow 0x0001
or 0b0001
as well?
Do we wanna allow
0x0001
or0b0001
as well?
REQUIRE( to_number<int>("0x42").value() == 42 )
due to unexpected exception with message:
bad optional access
_to_number()
can only work with decimal (base 10)_
Edit: Change example
I believe (if we introduce this), we should either stick to the real 'bool' representations "True"/"False"; or if we simulate the C integer-to-bool understanding that 0 is false and everything else is true, well, we should in fact do THAT and not limit it to 1.
What is the reason for this "only 1 is true" idea that you both seem to share?
IMO it is sufficient to think of this as the counterpart of some reasonable serializer. For a serializer, reasonable output for a boolean would be
true
(+case variations)false
(+case variations)0
1
.
Everything else like 001
or 2
strikes me as an oddity. Granted, one could argue for the 001
case because all other unsigned integers allow leading zeros as well. But 2
is strictly outside of the conversion domain (i.e. of the integer range). We also do not allow to_number<unsigned short>(200'000)
. The same goes for -1
, which is also outside of the range.I must admit Lars' last comment (well now not the last comment anymore, but you know probably what I mean) convinced me, and I now would say just true
false
1
and 0
is ok ;-)
Sorry, but I think it has deserved some discussion.
Damm I have to leave now...but str == "0|1"
is not constexpr :'(
Fixed constexpr problem.
I believe there is some docu missing still, will add that if needed, lets see
Why
The DOOCS Serverlib uses
to_number
when resorting values inD_array
. But it breaks if we wanna instantiate an array of booleans.This commit will add basic support to allow
to_number
been used withbool
.How
First make sure the other implementations wont trigger, since
bool
is arithmetic, integral and unsigned. Then add a very simple implementation which allows only0|1
andtrue|false
. And to round things up, add some very basic tests.