pboettch / json-schema-validator

JSON schema validator for JSON for Modern C++

Very large or small unsigned integers #303

Closed mjcrawford22 closed 5 months ago

mjcrawford22 commented 5 months ago

Specifying a type as "integer" currently limits the value to the range of a 32 bit integer. Larger or smaller values are converted to zero before the type is checked. An integer (though I could not find this clarified in the standard) is "[+-]?[0-9]+". I am not a regex guru, but I think that is right. It should not be defined by the minimum or maximum limits of any given programming language or data type. In our case we are trying to use a 64 bit unsigned integer, and the validator fails a 13 digit unsigned integer.

Here are clips from our input and validator files:

{
  "SEED": 1234567890123
}

"properties": {
  "SEED": {
    "title": "SEED", "type": "integer"
  }
}

pboettch commented 5 months ago

Your example works for me. What platform are you working on?

lkersting commented 5 months ago

I believe it fails when you try to set a minimum of 0:

"properties": {
  "SEED": {
    "title": "SEED", "type": "integer", "minimum": 0
  }
}

pboettch commented 5 months ago

No, still works for me. See here for the example I'm using: https://github.com/pboettch/json-schema-validator/tree/issue-303/test/issue-303

lkersting commented 5 months ago

Sorry, the issue isn't with 32 bit integers. The schema works with 64 bit signed integers, but fails for 64 bit UNSIGNED integers. So a value above 9223372036854775807 will be treated as a negative number:

{
  "SEED": 9223372036854775808
}

"properties": {
  "SEED": {
    "title": "SEED", "type": "integer", "minimum": 0
  }
}

The example above gives me a "below minimum of 0" error.

pboettch commented 5 months ago

Yeah, well, I guess, we have to live with that. The underlying JSON-library determines what int-types are used and not the schema-library.

When you std::cout your json-instance (without using the validator), what does it print?

By the time the json-instance reaches the validator, the harm is already done, so there is nothing we can do here.

lkersting commented 5 months ago

A std::cout from the https://github.com/nlohmann/json handles an unsigned 64 bit int properly.

For:

{ "SEED": 9223372036854775808
}

the following pseudo code:

auto seed = json_tree.at("SEED").get<std::uint64_t>();
std::cout << "seed = " << seed << std::endl;

returns 9223372036854775808 as expected. The Nlohmann json library cannot handle integers above the uint64_t range. Anything above 18446744073709551615 is turned into a double by the parser.

pboettch commented 5 months ago

And what does get<json::number_integer_t> produce? This is what this library uses to get the value of an "integer" or "number" instance.

Again I think there are limits here: when to allow doubles for integer-type in validation? And how to handle signed vs unsigned?

mjcrawford22 commented 5 months ago

This is a brief approximation of a test I created:

  json jTree, jTree2; 
  std::ifstream f("rng.json");
  jTree = json::parse(f);
  std::cout << jTree.dump() << std::endl;

  json::iterator jItFound = jTree.find("seed2");
  jTree2 = *jItFound;
  std::cout << jTree2.dump() << std::endl;
  std::cout << "get<uint64_t>(): " << jTree2.get<uint64_t>() << std::endl;
  std::cout << "get<number_integer_t>(): " << jTree2.get<json::number_integer_t>() << std::endl;

rng.json:

{
  "seed2": 18446744073709551615
}

output:

get<uint64_t>(): 18446744073709551615
get<number_integer_t>(): -1
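
The -1 above is what you would expect if the unsigned value is read back through a signed 64-bit type. The following is a minimal sketch using only the standard library. It assumes that number_integer_t is std::int64_t (the nlohmann/json default) and that the conversion behaves like a plain static_cast; as_signed is a hypothetical helper, not part of either library.

```cpp
#include <cstdint>

// Hypothetical helper: mimics reading an unsigned 64-bit value through a
// signed 64-bit type, the way a plain static_cast would.
std::int64_t as_signed(std::uint64_t value) {
    // The result is defined modulo 2^64 since C++20; two's-complement
    // platforms already behaved this way in practice before that.
    return static_cast<std::int64_t>(value);
}
```

Under those assumptions, as_signed(18446744073709551615ULL) yields -1, and 9223372036854775808 maps to the most negative 64-bit value, which would explain the "below minimum of 0" error reported earlier.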

The Nlohmann JSON library does have a "number_unsigned_t". Would the best course of action be a feature request asking for the validator's "integer" handling to support number_unsigned_t?

pboettch commented 5 months ago

So your idea is to validate against both signed and unsigned integers? And if either one is OK, the instance is valid?

That doesn't work, because if someone wants a minimum of -10, and the instance has -15, with an unsigned int this would be valid.
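
One way around this objection is to compare with awareness of signedness instead of validating twice: keep each number in the type it was stored as and handle the mixed cases explicitly. The following is a minimal sketch under that assumption, standard library only. The name at_least is hypothetical and is not part of json-schema-validator or nlohmann/json.

```cpp
#include <cstdint>

// Hypothetical helpers: return true when value >= minimum, comparing the
// mathematical values regardless of which operand is signed or unsigned.
bool at_least(std::int64_t value, std::int64_t minimum) {
    return value >= minimum;
}

bool at_least(std::uint64_t value, std::uint64_t minimum) {
    return value >= minimum;
}

bool at_least(std::int64_t value, std::uint64_t minimum) {
    // A negative value can never reach an unsigned minimum.
    return value >= 0 && static_cast<std::uint64_t>(value) >= minimum;
}

bool at_least(std::uint64_t value, std::int64_t minimum) {
    // Any unsigned value is at least as large as a negative minimum.
    return minimum < 0 || value >= static_cast<std::uint64_t>(minimum);
}
```

With these overloads, an instance of -15 against a minimum of -10 is still rejected, while 9223372036854775808 against a minimum of 0 is accepted. Call them with explicitly typed arguments (for example std::int64_t{-15}) to avoid ambiguous overload resolution on integer literals.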

mjcrawford22 commented 5 months ago

If in the validator I use:

...
   "properties": {
      "SEED": {
        "type": "integer",
        "minimum": 0, 
        "maximum": 18446744073709551615
     },
...

And the input is:

{
 "SEED": 0
}

This should pass and would imply comparing unsigned to unsigned. The maximum currently is converted to -1 since (I assume) the value is stored as a signed int. Our error handler says

Error at line: 2:
 "SEED": 0,
ERROR >>>>>> "SEED: 0" -- value exceeds maximum of -1

We convert "instance" to "value" for human readability.

pboettch commented 5 months ago

Which problem are you actually trying to solve? Is it really only the unsigned 64-bit int you need for validation? Why are 63 bits not enough?

What I want to say is that even if we find a solution for 64-bit uints, the next thing you'll be asking for is even bigger numbers.

Maybe your data should be stored in another format, a hex-string or an int-string, which could be validated with a regex. You can then convert it to the correct type in your application.
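
The string-based approach could look roughly like the sketch below: the seed is stored as a decimal string, checked against a digit pattern, and then converted with an explicit overflow check. The pattern and the name parse_seed are assumptions made for illustration. The same pattern could be given to the schema as {"type": "string", "pattern": "^(0|[1-9][0-9]{0,19})$"}. The code uses only the C++17 standard library.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <regex>
#include <string>
#include <system_error>

// Hypothetical helper: converts a decimal string to a 64-bit unsigned seed.
// Returns std::nullopt if the text is not a plain non-negative decimal
// integer or if it does not fit into std::uint64_t.
std::optional<std::uint64_t> parse_seed(const std::string& text) {
    // Up to 20 digits, no sign, no leading zeros (assumed pattern).
    static const std::regex digits("0|[1-9][0-9]{0,19}");
    if (!std::regex_match(text, digits)) {
        return std::nullopt;
    }

    std::uint64_t value = 0;
    const char* first = text.data();
    const char* last = text.data() + text.size();
    // from_chars reports result_out_of_range for values above 2^64 - 1.
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;
    }
    return value;
}
```

The regex alone cannot reject 20-digit values just above 18446744073709551615, which is why the conversion step still checks for overflow.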

mjcrawford22 commented 5 months ago

We use a Mersenne Twister 64 bit random number generator algorithm for our application. Going with a 32 bit period for the random number is problematic and insufficient. We frequently need to specify a starting seed which must be a 64-bit unsigned int. I will work with @lkersting on using a regex pattern for validation. Thanks so much for your help.

Yes, I imagine that soon, people will want at least a 128 bit integer, which is a limitation of the library we both use. FYI, your schema validation library has been incredibly valuable to us.

pboettch commented 5 months ago

> We use a Mersenne Twister 64 bit random number generator algorithm for our application. Going with a 32 bit period for the random number is problematic and insufficient. We frequently need to specify a starting seed which must be a 64-bit unsigned int. I will work with @lkersting on using a regex pattern for validation. Thanks so much for your help.

Yes, I think it is sane to wait for nlohmann::json to support int128_t as a possible int-base-type for integers.

> FYI, your schema validation library has been incredibly valuable to us.

Thanks, nice to hear.