Closed spacether closed 11 months ago
This came up when working on my java implementation here: https://github.com/openapi-json-schema-tools/openapi-json-schema-generator and reading this article about grapheme length vs code points: http://illegalargumentexception.blogspot.com/2009/05/java-rough-guide-to-character-encoding.html#javaencoding_strnlen
Looks good to me too, just change it in the other drafts too
change it in the other drafts too
Oh! Please also check for maxLength
tests as well.
Maybe we should pick some characters that actually make sense -- as far as I can tell "\uD83D\uDCA9" isn't valid utf8.
I did some digging in the history and the test was added here: #52 - and indeed this is UTF-16, not UTF-8. we should fix that. The proper sequence for the poop emoji, that should appear in the file, is \u1F4A9
-- but I think we should pick something better -- a sequence of unicode codepoints that combine together into one grapheme.
No, JSON requires that it be broken out into the surrogate pair.
To escape an extended character that is not in the Basic Multilingual Plane, the character is represented as a 12-character sequence, encoding the UTF-16 surrogate pair. So, for example, a string containing only the G clef character (U+1D11E) may be represented as "\uD834\uDD1E".
@spacether Just wanted to make sure you saw this request to add this change to other drafts as well.
Updates min/maxLenth tests to mention graphemes rather than unicode points Updates the language for the test that verifies that one grapheme is too short when checked against minLength of 2