json-schema-org / json-schema-spec

The JSON Schema specification
http://json-schema.org/

Extend treatment on Unicode and/or its security considerations #215

Open awwright opened 7 years ago

awwright commented 7 years ago

Unicode is a complex technology that probably nobody will ever fully understand, but we should add a few notes on common implementation concerns, especially security considerations.

Also consider the behavior of applications that use e.g. UTF-16:

> '🐲' .length // U+1F432
< 2

brettz9 commented 7 years ago

Drawing out some implications of your example: for maxLength/minLength, the validation spec states that these refer to the "number of its characters as defined by RFC 7159" (the JSON spec).

RFC 7159, however, while referring to Unicode "characters" as being escaped as UTF-16, also states that "implementations might return different values for the length of a string value", so it would help to be clearer about what is intended by deferring to the JSON spec.

For example, to enforce that a string is no longer than the one in your example, should maxLength be 1 or 2? I don't think the current spec is actually very clear on this.
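To make the ambiguity concrete, here is a minimal sketch of the two counts in question. The helper names `codeUnitLength` and `codePointLength` are hypothetical and are not taken from any JSON Schema implementation or from the spec.

```javascript
// Hypothetical helpers for illustration only.
const dragon = '\u{1F432}'; // U+1F432, outside the Basic Multilingual Plane

// Counts UTF-16 code units, which is what String.prototype.length reports.
function codeUnitLength(s) {
  return s.length;
}

// Counts Unicode code points: spreading a string iterates it by code point,
// so a surrogate pair contributes a single element.
function codePointLength(s) {
  return [...s].length;
}

console.log(codeUnitLength(dragon));  // 2
console.log(codePointLength(dragon)); // 1
```

Under the first count the example string needs maxLength 2 to pass; under the second, maxLength 1 is enough.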

akuckartz commented 6 years ago

I agree that it makes sense to think about the security aspects of Unicode, but these aspects are not specific to JSON Schema. A separate document, developed by a broader community (including JSON-LD supporters, for example), might make sense.

brettz9 commented 6 years ago

In the case of maxLength and minLength, if one mistakenly relies on them, these are JSON Schema-specific issues. But again, I don't think the behavior is clearly spec'd.

epoberezkin commented 6 years ago

"should maxLength be 1 or 2"

@brettz9 there are tests that require it to be 1.
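A check consistent with that answer would count code points instead of UTF-16 code units. The following is only a sketch under that assumption; `satisfiesMaxLength` is a hypothetical name, not the API of any particular validator.

```javascript
// Hypothetical sketch; the function name and signature are illustrative only.
function satisfiesMaxLength(instance, maxLength) {
  // maxLength constrains strings only; other types are not restricted by it.
  if (typeof instance !== 'string') return true;
  let count = 0;
  // for...of iterates a string by code point, so a surrogate pair counts once.
  for (const _ of instance) count += 1;
  return count <= maxLength;
}

console.log(satisfiesMaxLength('\u{1F432}', 1)); // true
console.log('\u{1F432}'.length <= 1);            // false: the UTF-16 count is 2
```

Note that counting code points is still not the same as counting user-perceived characters: a base letter followed by a combining mark, such as 'e\u0301', is two code points.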

brettz9 commented 6 years ago

Sure, @epoberezkin, but the text ought to be clarified regardless.