Parser and constructor resolve integer types differently

andkerr commented 7 months ago

Description

A JSON value constructed from a non-negative C++ int has a different type than one constructed by parsing the string encoding of the same int. Specifically, JSON values initialized directly are assigned the json::value_t::number_integer type, and those initialized by parsing are assigned the json::value_t::number_unsigned type.

I found this GitHub discussion, which mentions something similar. It looks like it's still unresolved, so I'm happy to continue the discussion there if that's more convenient.

Reproduction steps

In client code:

1. Construct a JSON value from a non-negative int (e.g. json(1)).
2. Construct a JSON value by parsing the same integer encoded in a string (e.g. json::parse("1")).
3. Compare the values returned by the .type() member of each JSON value.

The code snippet below gives an example of this approach.

Expected vs. actual results

I would expect that JSON values constructed from roughly "equivalent" (I know that's a bit of a tricky word) representations would have equal value types.

Fortunately, the other type inspection functions are mostly consistent in spite of this difference: is_number() and is_number_integer() return true in both cases. Only is_number_unsigned() differs: it returns true just for the "parsed" JSON value, whose internal type is unsigned.

Minimal code example

#include <iostream>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

void query_types(const json& j) {
    std::cout << "j.dump(): " << j.dump() << '\n';

    std::cout << std::boolalpha
              << "j.is_number(): "
              << (j.is_number()) << '\n'
              << "j.type() == json::value_t::number_integer: "
              << (j.type() == json::value_t::number_integer) << '\n'
              << "j.is_number_integer(): "
              << (j.is_number_integer()) << '\n'
              << "j.type() == json::value_t::number_unsigned: "
              << (j.type() == json::value_t::number_unsigned) << '\n'
              << "j.is_number_unsigned(): "
              << (j.is_number_unsigned()) << '\n'
              << "j.type() == json::value_t::number_float: "
              << (j.type() == json::value_t::number_float) << '\n'
              << "j.is_number_float(): "
              << (j.is_number_float()) << '\n';
}

int main() {
    auto j1 = json(1);
    query_types(j1);

    std::cout << '\n';

    auto j2 = json::parse("1");
    query_types(j2);
}

Error messages

None

Compiler and operating system

Tested with g++ 7.5.0 on Ubuntu 18.04 LTS (bionic) and g++ 11.4.0 on Ubuntu 22.04 LTS (jammy)

Library version

Tested with v3.7.3, v3.11.2, and the latest develop (6eab7a2b187b10b2494e39c1961750bfd1bda500)

Validation

[X] The bug also occurs if the latest version from the develop branch is used.
[X] I can successfully compile and run the unit tests.

gregmarr commented 7 months ago

Is there actually an issue here, or just a difference that you've noticed? Constructing from a known type uses the signedness of that type. A number parsed from text is stored as unsigned unless it has a minus sign. What change are you asking for?

andkerr commented 7 months ago

Interesting, I didn't know that's the expected behaviour when numbers are parsed from text. I think my own ignorance on the subject is the issue here then. If the difference is simply due to C++ and JSON having different concepts of signedness I'm not sure anything should change.

I'll close this issue, thank you for the quick reply to clarify this.

gregmarr commented 7 months ago

JSON itself doesn't really have a concept of signedness, or even integer vs floating point, it is just numbers.

This library separates numbers into three categories, signed, unsigned, or floating point. This is because it has to store values in concrete data types that have defined ranges. Using three representations with partially overlapping ranges, it can store a larger range of values than if it only used one storage type. It can support the full range of integers from minimum signed value to maximum unsigned value, which is 50% more than the range of either signed or unsigned alone, and also the full range of floating point values supported by double.
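As a rough illustration of the "50% more" figure, here is a small sketch that assumes 64-bit signed and unsigned storage types. The function names are made up for this example and are not part of the library.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Assumption: the integer storage types are 64 bits wide.
static_assert(std::numeric_limits<std::int64_t>::min() < 0,
              "the signed type covers negative values");
static_assert(std::numeric_limits<std::uint64_t>::max() >
                  static_cast<std::uint64_t>(std::numeric_limits<std::int64_t>::max()),
              "the unsigned type reaches past the signed maximum");

// Counts are held in doubles; the values involved are small multiples of
// powers of two, so they are represented exactly.

// Distinct integers in one 64-bit type alone: 2^64.
double single_type_count() { return std::ldexp(1.0, 64); }

// Distinct integers from signed min (-2^63) to unsigned max (2^64 - 1):
// 2^63 negative values plus 2^64 non-negative values.
double combined_count() { return std::ldexp(1.0, 63) + std::ldexp(1.0, 64); }

// Ratio of the combined range to a single type's range.
double range_gain() { return combined_count() / single_type_count(); }
```

Under these assumptions, range_gain() evaluates to 1.5, i.e. the combined representation covers 50% more integers than either type alone.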

This allows it to parse a number and determine whether it should store it as an unsigned integer value between 0 and unsigned max, as a signed integer value between signed min and 0, or as a floating point value. It can then write that value back out again as the same number (within the limits of floating point number representations).
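The decision described above can be sketched with the standard library alone. This is a simplified illustration, not the library's actual lexer: NumberKind and classify_number are hypothetical names, and the token is assumed to already be a syntactically valid JSON number.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical categories mirroring the three storage kinds described above.
enum class NumberKind { Unsigned, Signed, Float };

// Classify a JSON number token (assumed to be syntactically valid).
NumberKind classify_number(const std::string& token) {
    assert(!token.empty());

    // A fraction or an exponent means the value is stored as floating point.
    if (token.find_first_of(".eE") != std::string::npos) {
        return NumberKind::Float;
    }

    errno = 0;
    char* end = nullptr;
    if (token[0] == '-') {
        // A minus sign selects the signed type, if the value fits.
        std::strtoll(token.c_str(), &end, 10);
        if (errno == 0 && *end == '\0') {
            return NumberKind::Signed;
        }
    } else {
        // No sign selects the unsigned type, if the value fits.
        std::strtoull(token.c_str(), &end, 10);
        if (errno == 0 && *end == '\0') {
            return NumberKind::Unsigned;
        }
    }

    // Integers too large for either integer type fall back to floating point.
    return NumberKind::Float;
}
```

With this sketch, "1" is classified as Unsigned and "-1" as Signed, which matches the difference between json::parse("1") and json(1) reported at the top of this issue.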
