zserge / jsmn

Jsmn is the world's fastest JSON parser/tokenizer. This is the official repo, replacing the old one at Bitbucket.
MIT License

Not enough information to easily parse the input #233

Open WarpRules opened 8 months ago

WarpRules commented 8 months ago

The library is great in its simplicity and extreme efficiency: it makes no dynamic memory allocations of its own, etc. However, this simplicity apparently comes at the cost of making it considerably harder for user code to parse complex JSON input.

In particular, an element that has child elements (i.e. an object or an array) has a size member that tells you how many child elements it contains, but not how many tokens long the entire element is. This means you can't simply jump past the end of the current element to the token that represents the next element.

For example, if an "object" element contains 3 child elements, the size member of its token will be 3, which is important for knowing how many children the object has. However, no member tells you how many tokens long the object is, so you can't jump directly to the token representing the next element at the same nesting level.

Suppose, for example, that the input JSON consists of an array of objects. Every time the code encounters a token that is a direct child of the array, it calls a function to parse it (passing the pointer to all the tokens and the index of the current token), and afterwards it wants to jump to the next element in the array. It can't, because this information isn't stored anywhere. There's no way of knowing how many tokens long the array element is without actually walking through the tokens and counting them, using fairly complex logic (especially since each element may be an object containing further, deeply nested elements).
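
For reference, the counting described above can be written as a short recursive function. This is a minimal sketch, not part of jsmn: the `jsmntok_t` below is a hypothetical stand-in that reproduces only the `type`, `start`, `end` and `size` fields so it compiles without `jsmn.h`, the token array is hand-written to match jsmn's layout as I understand it (an object's `size` counts its keys, and each key has `size` 1 because its value is its child), and `tok_span` is an illustrative name.

```c
#include <assert.h>

/* Hypothetical stand-in for jsmn's jsmntok_t so this sketch compiles
   without jsmn.h; only the fields type, start, end and size are kept. */
typedef enum { JSMN_UNDEFINED, JSMN_OBJECT, JSMN_ARRAY, JSMN_STRING, JSMN_PRIMITIVE } jsmntype_t;
typedef struct { jsmntype_t type; int start; int end; int size; } jsmntok_t;

/* Returns how many tokens the element at index i occupies, itself included.
   Assumes tokens are stored in document order, an object's size counts its
   keys, and each key token has size 1 (its value is the key's only child). */
static int tok_span(const jsmntok_t *toks, int count, int i)
{
    assert(i >= 0 && i < count);
    int n = 1; /* the token itself */
    for (int c = 0; c < toks[i].size; c++)
        n += tok_span(toks, count, i + n); /* next child follows everything counted so far */
    return n;
}

/* Hand-written tokens for [{"a":1,"b":[2,3]},4] (string offsets exclude quotes). */
static const jsmntok_t sample[] = {
    { JSMN_ARRAY,      0, 21, 2 }, /* outer array        */
    { JSMN_OBJECT,     1, 18, 2 }, /* {"a":1,"b":[2,3]}  */
    { JSMN_STRING,     3,  4, 1 }, /* key "a"            */
    { JSMN_PRIMITIVE,  6,  7, 0 }, /* 1                  */
    { JSMN_STRING,     9, 10, 1 }, /* key "b"            */
    { JSMN_ARRAY,     12, 17, 2 }, /* [2,3]              */
    { JSMN_PRIMITIVE, 13, 14, 0 }, /* 2                  */
    { JSMN_PRIMITIVE, 15, 16, 0 }, /* 3                  */
    { JSMN_PRIMITIVE, 19, 20, 0 }, /* 4                  */
};
static const int sample_count = (int)(sizeof sample / sizeof sample[0]);
```

With this helper, the array loop from the scenario above becomes `for (int k = 0, j = 1; k < sample[0].size; k++, j += tok_span(sample, sample_count, j)) { /* parse element j */ }`. The price is that every step re-walks the element it skips, which is exactly the extra work the issue is complaining about.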

(The end member of jsmntok_t points to the end of this element in the original string, but that doesn't help at all in finding the token for the next element.)

This could easily be fixed by adding a member to jsmntok_t that tells how many tokens long the element is. Adding this value to the index of the token directly gives the token representing the element that follows. (For object and array elements, this makes it easy to step from one child element to the next by simply adding this value to the current element's index.)
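
To illustrate what the proposed member would buy, the sketch below emulates it with a caller-provided array `span[]` parallel to the tokens, since jsmn itself has no such field. Everything here is hypothetical: `span`, `fill_spans` and `nth_child` are illustrative names, and `jsmntok_t` is a stand-in with only the `type`, `start`, `end` and `size` fields, filled by hand for a small sample input under the same layout assumptions as jsmn (object `size` counts keys, each key has `size` 1).

```c
#include <assert.h>

/* Hypothetical stand-in for jsmn's jsmntok_t (only the fields used here). */
typedef enum { JSMN_UNDEFINED, JSMN_OBJECT, JSMN_ARRAY, JSMN_STRING, JSMN_PRIMITIVE } jsmntype_t;
typedef struct { jsmntype_t type; int start; int end; int size; } jsmntok_t;

/* Stores in span[i] the number of tokens element i occupies (itself
   included) and returns it. span stands in for the proposed extra
   jsmntok_t member; one call on the root fills it in a single pass. */
static int fill_spans(const jsmntok_t *toks, int count, int i, int *span)
{
    assert(i >= 0 && i < count);
    int n = 1;
    for (int c = 0; c < toks[i].size; c++)
        n += fill_spans(toks, count, i + n, span);
    span[i] = n;
    return n;
}

/* Index of the k-th direct child (0-based, k < toks[p].size) of container
   token p. For an object the children are its keys; a key's value is at
   key + 1. Each step over a whole child is a single addition. */
static int nth_child(const int *span, int p, int k)
{
    int j = p + 1; /* first child immediately follows its parent */
    for (int c = 0; c < k; c++)
        j += span[j];
    return j;
}

/* Hand-written tokens for [{"a":1,"b":[2,3]},4] (string offsets exclude quotes). */
static const jsmntok_t sample[] = {
    { JSMN_ARRAY,      0, 21, 2 }, { JSMN_OBJECT,     1, 18, 2 },
    { JSMN_STRING,     3,  4, 1 }, { JSMN_PRIMITIVE,  6,  7, 0 },
    { JSMN_STRING,     9, 10, 1 }, { JSMN_ARRAY,     12, 17, 2 },
    { JSMN_PRIMITIVE, 13, 14, 0 }, { JSMN_PRIMITIVE, 15, 16, 0 },
    { JSMN_PRIMITIVE, 19, 20, 0 },
};
static const int sample_count = (int)(sizeof sample / sizeof sample[0]);
```

If the span were recorded by the parser itself, as the issue proposes, the `fill_spans` pass would disappear and only the constant-cost stepping in `nth_child` would remain.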

roberthusak commented 6 months ago

As mentioned in https://github.com/zserge/jsmn/issues/164#issuecomment-1906908419, you can just use start and end to skip all the nested elements of the given object or array using a simple loop.
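
A minimal sketch of that idea follows; it is not necessarily the code from the linked comment. It assumes tokens are stored in document order, so every descendant of a container starts before the container's `end` offset, and the first later token that does not is the next sibling (or `count` if there is none). Call it on a value or array element, not on an object key: a key's own `end` is the end of the key string, so its value would not be skipped. `jsmntok_t` here is again a hypothetical stand-in with only the fields used, and `skip_element` is an illustrative name.

```c
#include <assert.h>

/* Hypothetical stand-in for jsmn's jsmntok_t (only the fields used here). */
typedef enum { JSMN_UNDEFINED, JSMN_OBJECT, JSMN_ARRAY, JSMN_STRING, JSMN_PRIMITIVE } jsmntype_t;
typedef struct { jsmntype_t type; int start; int end; int size; } jsmntok_t;

/* Returns the index of the first token after element i and all of its
   nested tokens, using only the start/end offsets into the JSON text. */
static int skip_element(const jsmntok_t *toks, int count, int i)
{
    assert(i >= 0 && i < count);
    int next = i + 1;
    /* descendants begin inside [start, end) of element i */
    while (next < count && toks[next].start < toks[i].end)
        next++;
    return next;
}

/* Hand-written tokens for [{"a":1,"b":[2,3]},4] (string offsets exclude quotes). */
static const jsmntok_t sample[] = {
    { JSMN_ARRAY,      0, 21, 2 }, { JSMN_OBJECT,     1, 18, 2 },
    { JSMN_STRING,     3,  4, 1 }, { JSMN_PRIMITIVE,  6,  7, 0 },
    { JSMN_STRING,     9, 10, 1 }, { JSMN_ARRAY,     12, 17, 2 },
    { JSMN_PRIMITIVE, 13, 14, 0 }, { JSMN_PRIMITIVE, 15, 16, 0 },
    { JSMN_PRIMITIVE, 19, 20, 0 },
};
static const int sample_count = (int)(sizeof sample / sizeof sample[0]);
```

For the object at index 1 this returns 8, the token for the array's second element, without any recursion. Like explicit counting, it still walks over the skipped tokens, but it needs no bookkeeping of `size` values.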