json-patch / json-patch-tests

Tests for implementations of json-patch
68 stars 21 forks source link

Input validation #5

Closed mnot closed 9 years ago

mnot commented 11 years ago

From the apps-discuss mailing list:

---8<---

It seems no one does input validation. Argh! I glanced at over half of the 8 implementations listed at http://trac.tools.ietf.org/wg/appsawg/trac/wiki/JsonPatch and didn't find a single one that checked that a JSON pointer segment was a valid array index before converting it to an integer.

Given that not one implementation validated the input seems like a strong reason to call more attention to it in the spec. For instance, stating "/+7", "/007", "/ 7", and "/7.0" are example of pointers that MUST NOT be accepted as array indices. Or perhaps provide a regex that a segment MUST match to be an array index: 0|[1-9][0-9]*.

All the implementations I glanced at just call a native function to convert a string to an integer.

The JavaScript code call parseInt. The Python code calls int(), and isdigit(). The Ruby code calls to_i or Integer key. The PHP code calls is_numeric. The Java code calls Integer.parseInt().

All of these accept JSON segments that are not valid array indices: including leading whitespace, "+" signs, 0x hex values, decimal points, exponents, "_" digit separators, unnecessary leading 0's, trailing letters, and more.

--->8---

stefankoegl commented 11 years ago

http://tools.ietf.org/html/draft-ietf-appsawg-json-pointer-06 states

   o  If the currently referenced value is a JSON array, the reference
      token MUST contain either:

      *  characters that represent an unsigned base-10 integer value
         (possibly with leading zeros), making the new referenced value
         the array element with the zero-based index identified by the
         token, or

So I don't see why 007 shouldn't be allowed.

mikemccabe commented 11 years ago

Stefan's point stands, but we should still guard against implementation-specific number parsing.

I'm not quite sure how to test this, though - maybe something like

{ "comment": "test against implementation-specific numeric parsing",
  "doc": {"1e0": "foo"},
  "patch": [{"op": "test", "path": "/1e0", "value": "foo"}],
  "expected": {"1e0": "foo"} },

... and...

{ "comment": "test with bad number should fail",
  "doc": ["foo", "bar"],
  "patch": [{"op": "test", "path": "/1e0", "value": "bar"}],
  "error": "test op shouldn't get array element 1" },

but I think more are needed.

mnot commented 11 years ago

seems like a good start...

mikemccabe commented 11 years ago

Reading the spec, all it says is 'characters that represent an unsigned base-10 value.' The spec at json.org does allow e.g. foo[1e0] - but I don't think json-pointer can, because it can't distinguish between 1e0 (the number) and "1e0" (the string). I think we should clarify json-pointer for this case. Another option is to use the same spec as json, and accept that some kinds of documents can't be fully accessed.

stefankoegl commented 11 years ago

Why should there be ANY number parsing for object members? My understanding is that array indices are always numeric, and keys of object members are always strings. So there would be no need to distinguish between 1e0 (number) and "1e0" (string) in either case.

This would mean that we're not forced to limit the spec to \d+ numbers, but I'd also understand the rationale behind it if we did.

mnot commented 11 years ago

The problem is that different languages have different conventions for expressing these forms; e.g., in Python:

int("1e0") Traceback (most recent call last): File "", line 1, in ValueError: invalid literal for int() with base 10: '1e0'

mpcm commented 11 years ago

In terms of json-pointer, I am not sure it is reasonable to support non-integer indexes during array path resolution. Including overly complex representation with trailing decimals.

This seems to open up a large set of mathematical expressions that might infer should be supported, and could have issue supporting in various implementation languages, and thus require support of a non-native or possibly conflicting (?) syntax.

Looking at it another way, to me, expecting 1e0 to resolve in an array, is not much different than expecting 2-1 to also resolve... both are interpretive expressions for the core language, and both must be resolved to a integer index. The test here is not for equality, but should be much stricter.

In particular, I see value in having json-pointer string equality, to verify the identity of a path without having to parse out subexpressions based on the actual data structure. "/a/b/c/1" should not also be resolvable by "/a/b/c/1.00" to the same point.

The thread below seems to touch on similar points, for values like foo[1.000000000] as an index: http://grokbase.com/t/gg/json-schema/129skm3shd/json-value-equality-a-rather-intrusive-change-or-not

As does the related: https://github.com/json-schema/json-schema/issues/27

In particular, since JSON does not support sparse arrays, supporting these syntaxes offers little value (IMHO).

bruth commented 11 years ago

The spec should be straightforward and not influenced by any specific language. Inferring an array index when it is not a literal integer is, in my opinion, not in the scope of this spec. Keep it simple.

mikemccabe commented 11 years ago

Should we require 0|[1-9][0-9]*, then? (And exclude e.g. 1e0?) I think so...

It looks like this was a real puzzle for json-schema, but the situation is different for json-pointer, as it's defined as it's own string format; applications building json-pointers must create strings, and can easily be required to collapse other 'integer' representations to the 'counting number' representation most people expect.

I like this solution (the regexp above), as it's much easier for implementations to deal with, than to require (or confirm) that each has a numeric check that is exactly congruent with JSON's definition of a number.

stefankoegl commented 11 years ago

+1

bruth commented 11 years ago

@mikemccabe :+1:

mpcm commented 11 years ago

+1 to 0|[1-9][0-9]*

mnot commented 11 years ago

The language I'm putting in the latest draft is 1*DIGIT (defined in RFC5234). I.e., one or more digits, 0-9.

Note that this includes leading 0's. I've done that because "including leading zeros" was an explicit decision earlier, and I'm loathe to make that kind of change this late in the game.

That said, if this is a real problem for you, please do say so.

stefankoegl commented 11 years ago

As @mpcm already pointed out, with 0|[1-9][0-9]* it would be possible to compare json-pointers for equality by just comparing their corresponding strings. If we allow leading zeros we'd have to accept false negatives, even with parsing the individual parts of a pointer.

Example: consider the json-pointers /a/01 and /a/1. We'd have to treat them as unequal, even though they would be equal when dereferenced against something like {"a": [1, 2, 3]}, but not against {"a": {"01": 5, "1": 6}}

Was there a specific reason for allowing leading zeros in the first place? If there's no compelling reason I'd be in favor of using 0|[1-9][0-9]*

bruth commented 11 years ago

I agree with @stefankoegl, the benefit of being able to compare JSON pointers greatly outweighs the need (and likelihood) for array indexes with leading zeros.

mpcm commented 11 years ago

Even this late in the game, there is no time like the present to address this. ;)

I'd prefer to see leading zero's dropped. The identity aspect of a json-pointer string is of great value to me. Some of my code reason about structures via json-pointers without actually having the data present.

Is the support for leading zero's of great value to others, since at resolution time they must be resolved to a non-leading zero index in most implementations anyway?

mnot commented 11 years ago

OK, I've changed the spec to preclude leading zeros. Thanks!

tenderlove commented 11 years ago

Just for clarification, if you have a JSON pointer like this: /foo/01, and these two documents:

{"foo": [1,2,3]}
{"foo": {"01": "bar"}}

When you dereference the pointer against the first document, you'll get an error, but it will be successful against the second document? If I understand correctly then I'll add some tests for it. :-)

stefankoegl commented 11 years ago

Exactly, because indexing an array does not allow leading zeros.

mnot commented 9 years ago

Think this can be closed...