jhthorsen / json-validator

:cop: Validate data against a JSON schema
https://metacpan.org/release/JSON-Validator
56 stars 58 forks source link

Number Coercion fails for ".32" and "032" #143

Closed code4pay closed 5 years ago

code4pay commented 5 years ago

Steps to reproduce the behavior

validating Numbers ".32" or "032" fails with /proposal/multiplier: Expected number - got string. with Schema entry

     "multiplier" : {
       "type" : "number"
    },

and perl code

    $validator->coerce(
       booleans => 1,
       numbers  => 1,
       strings  => 1
   );
   @errors = $validator->schema($schema)->validate($args);   

Expected behavior

String comprising of numbers with a leading 0 should be coerced to numbers Strings starting with "." and consecutive numbers should be coerced to numbers.

I believe it is from the regex check to see if the string is indeed a type of number.

code4pay commented 5 years ago

maybe using this would be a better idea ? https://metacpan.org/pod/Scalar::Util#looks_like_number

jhthorsen commented 5 years ago

I don't agree. Can you explain why ".31" and "031" are valid numbers?

code4pay commented 5 years ago

Hi @jhthorsen thanks for the module. I understand that they are odd looking numbers, but the fact that I am expecting a numerical value , I expect it to be coerced (because I set the option) and Perl happily treats them as numbers makes it valid that they be considered numbers?

jhthorsen commented 5 years ago

The problem is that this is for coercing strings into numbers/booleans and the other way around. I don't really want to support things that are not numbers across platforms, like in actual JSON.

Especially 031 is normally 25 and not 31. I could be inclined to support .31, but I don't see a real world use case where that is normal input. I'm closing this, but feel free to comment with real world use cases and/or get other people on board to get this implemented.

Pilat66 commented 5 years ago

031 as 31 is written when it is important to keep the number of characters. For example, this is a recording of the geographical latitude / longitude in the NMEA protocol and in some other cases.

These are examples from complex protocols, not sure that they are sufficient to justify leaving 031 as 31 decimal. But ordinary people can easily write 031, even without knowing the rules for writing octal numbers in C.

NMEA GPGGA sentence: $ GPGGA, 092750.000.0321.6802, N, 00630.33, W, 1,8,1.03,61.7, M, 55.2, M ,, * 76

00630 is 6 degrees 30 minutes. The example is not entirely correct, since 00630 is not a number but a string. But in the record in the format used in LRIT, the same will look like 006.30.33.W and here the dot separates the numbers, not the strings.

For example, here is a link to a description of the format of geographic coordinates in ArcGIS: http://desktop.arcgis.com/en/arcmap/10.3/tools/data-management-toolbox/supported-notation-formats.htm

All of the following examples examples are equivalent:

27 54.00N 087 59.00W27 54.00n 087 59.00w27 54N 087 0W + 27 54.00 087 59.00WN27 54.00 W087 59.0027 54.00N / 87 59W27 ° 54.00’N 087 ° 59.00’W

There are already examples of recording as numbers, for example, 087.

jhthorsen commented 5 years ago

Thanks for all the examples, but I think they are way too specific to be supported as numbers. If you want your API to support those, then I would add a specific format or regex rule for a string and then convert it manually in the controller/action.

code4pay commented 5 years ago

@jhthorsen, Because unfortunately we will need to have this type of coercion for our project and we would prefer not to have to maintain a separate fork, what do you think about either having an option to coercion like numbers => 2 that allows these type of inputs to be coerced, or an option to new that allows us to pass a custom regex to that function? I would be happy to create a PR for either.

jhthorsen commented 5 years ago

What I don't understand is why you don't define it as a string with a regex/format and then convert the values after validation.

Adding numbers=>2 won't fly, since then I open up a wormhole of possibilities for numbers=>3, where the string "four" get coerced into 4, or any other clever thing people might come up with.

Another thing I don't know is where do you store these "numbers"? Do you have a database that accepts ".32" and "032" as numbers and store them as "0.32" and "32"?

If JSON.parse('{"num": .3}') was accepted by my browser, then I would consider adding your request, but... It's not.

Instead of just saying what I don't want to do, I can share what I would do. Instead of saying type "number", I would use the definition below and then post-process those occurrences after validated:

{"type": "string", "pattern": "^\.?\d+$"}

@Pilat66: You started your comment by using the word characters. This again tells me that coercing that into integers are wrong and that my proposed solution is more correct, since a "string" indeed consists of characters.

tyldum commented 5 years ago

@code4pay : I understand where you are coming from, but in this case I agree with the maintainer. Not only is the "060"-case very ambiguous, but this also leads down a rabbit hole of many other formats.

I often work with stuff where I have to transport other protocols or formats in JSON, and generally you transfer the whole structure as a string or parse it down to match JSON natively so 00630 becomes { "degrees": 6, "minutes": 30 }. The design seems tightly coupled to the format and I would argue that you should avoid it if possible.

Are you after "coercion" or validation?