zserge / jsmn

Jsmn is a world fastest JSON parser/tokenizer. This is the official repo replacing the old one at Bitbucket
MIT License

Token size is broken (always == 0 or 1) #51

Closed zserge closed 9 years ago

zserge commented 9 years ago

This is a great library that I had been using successfully, but since recent commits I have a problem with the calculation of token size. The size is always 0 or 1 for top-level or nested objects/arrays, and it is 1 for Name strings (why would a string have a size > 0?)

Given the simple JSON object:

{"foo":1234}

jsmn_parse returns 3, but token[0].size = 1

A simple nested object example:

{"foo":{"bar":1}}

jsmn_parse returns 5, but token[0].size = 1, and token[3].size = 1

I suspect this became broken at commit 84cb579. Token size was correct previously. Is it related to the "implemented key/value hierarchy"? (which is not documented anywhere that I can find)

Lines 239-241:

#!c

            case ':':
                parser->toksuper = parser->toknext - 1;
                break;

Removing line 240 fixes the issue for non-strict parsing. However, with JSMN_STRICT, this causes other issues which I did not fully characterize.
I do not understand what is being done at line 240 (it probably is related to "implemented key/value hierarchy", but there are no comments or documentation about that).


zserge commented 9 years ago

Hello Serge,

Have you had a chance to investigate this issue? Should I re-open this issue in the new github repo?

Thank you


Original comment by: matthewb_gc

zserge commented 9 years ago

It appears this is probably not a bug, but rather a misunderstanding on my part! Oops. I'll explain here in case anyone else has the same misunderstanding...

The jsmn docs describe the jsmn_parse return value as "number of tokens actually used by the parser", and the jsmntok_t .size member as "Number of child (nested) tokens". So, using this example...

{"foo":{"bar":1,"no":"duh"}}

jsmn_parse returns 7, the absolute total tokens used, as I expected.

However, for the size of each token, jsmn actually returns a contextual count that depends on the token type: the number of direct children, not the total number of nested tokens. That is not the same as the jsmn_parse return value!

So...

token[0].size = 1, since token[0] is Object with 1 JSON member
token[1].size = 1, since token[1] is Name with 1 JSON value (Name always has 1 value)
token[2].size = 2, since token[2] is Object with 2 JSON members
token[3].size = 1, since token[3] is Name with 1 JSON value
token[4].size = 0, since token[4] is Value Primitive (the number 1; no child tokens, always 0)
...and so on...
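
For anyone who wants the total number of nested tokens under a given token, here is a minimal sketch of how it could be derived from the per-token size values listed above. It does not call jsmn: `tok`, `tok_type`, `example`, and `subtree_tokens` are hypothetical names, `tok` is a cut-down stand-in for jsmntok_t, and the token array is filled in by hand (entries 5 and 6 are inferred from the JSON, since the list above stops at token[4]).

```c
/* Hypothetical stand-in for jsmntok_t, reduced to the fields used here. */
typedef enum { TOK_OBJECT, TOK_ARRAY, TOK_STRING, TOK_PRIMITIVE } tok_type;

typedef struct {
    tok_type type;
    int size; /* number of direct children, as described above */
} tok;

/* Tokens for {"foo":{"bar":1,"no":"duh"}}, written by hand in document
 * order. Entries 5 and 6 are an assumption inferred from the JSON. */
static const tok example[] = {
    { TOK_OBJECT,    1 }, /* [0] outer object: 1 member       */
    { TOK_STRING,    1 }, /* [1] name "foo": 1 value          */
    { TOK_OBJECT,    2 }, /* [2] inner object: 2 members      */
    { TOK_STRING,    1 }, /* [3] name "bar": 1 value          */
    { TOK_PRIMITIVE, 0 }, /* [4] value 1: no children         */
    { TOK_STRING,    1 }, /* [5] name "no": 1 value (assumed) */
    { TOK_STRING,    0 }, /* [6] value "duh" (assumed)        */
};

/* Returns the number of tokens in the subtree rooted at index i,
 * counting token i itself. Relies on tokens being stored in document
 * order, so the first child of token i sits at index i + 1. */
static int subtree_tokens(const tok *t, int i)
{
    int total = 1;     /* the token itself */
    int child = i + 1; /* index of the next direct child */
    for (int k = 0; k < t[i].size; k++) {
        int n = subtree_tokens(t, child);
        total += n;
        child += n;    /* skip over that child's whole subtree */
    }
    return total;
}
```

With this array, `subtree_tokens(example, 0)` gives 7, the same total that jsmn_parse returns for this input, while `example[0].size` stays 1.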

Serge, maybe a simple note in the docs would clarify this for anyone who might have the same misunderstanding as me. :) Thanks!


Original comment by: matthewb_gc

zserge commented 9 years ago

(See my last comment)


Original comment by: matthewb_gc