How to efficiently evaluate the number of json tokens

zserge / jsmn

Jsmn is a world fastest JSON parser/tokenizer. This is the official repo replacing the old one at Bitbucket

MIT License

3.64k stars 778 forks

How to efficiently evaluate the number of json tokens #227

Open mm-longcheng opened 1 year ago

mm-longcheng commented 1 year ago

I know that passing NULL instead of a token array makes jsmn_parse return the number of tokens needed. But that is equivalent to parsing twice.

pt300 commented 1 year ago

There isn't really a good way to get this number without parsing. I would suggest parsing with a token array of some fixed initial length and expanding it if jsmn runs out of tokens (jsmn_parse returns JSMN_ERROR_NOMEM in that case).

mm-longcheng commented 1 year ago

I noticed that the tokens used during parsing must be a contiguous array. So once the array is expanded, does the token data have to be copied?

Perhaps there could be an interface with something like GetToken and AddToken, where the tokens are managed by externally provided storage. That might make the interface more flexible.

I have an idea that may be wrong: with JSMN_PARENT_LINKS defined, sizeof(jsmntok_t) = 20, which is not suitable for 1k alignment. This seems a little unfriendly. Would it be possible to pack type and size into four bytes so that the structure aligns? The type only has 5 values, which fit in 3 bits, so the size would have 2^21 = 2097152 values. Is that enough?
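To make the layout question concrete, here is a minimal sketch of the two shapes being compared. It is not jsmn code: `tok_wide` only mirrors the five fields a token has with JSMN_PARENT_LINKS, written with fixed-width integers so the sizes are predictable, and `tok_packed`, `tok_pack`, `tok_type` and `tok_size` are hypothetical names. The sketch assumes the type codes are renumbered to fit in 3 bits and gives the remaining 29 bits of the 32-bit word to size. How many bits size really needs is a separate design choice.

```c
#include <assert.h>
#include <stdint.h>

/* Five 4-byte fields, as in a token with parent links: 20 bytes. */
typedef struct {
  int32_t type;
  int32_t start;
  int32_t end;
  int32_t size;
  int32_t parent;
} tok_wide;

/* Hypothetical packed layout: type and size share one 32-bit word,
 * giving 16 bytes per token. */
typedef struct {
  uint32_t type_size; /* low 3 bits: type, high 29 bits: size */
  int32_t start;
  int32_t end;
  int32_t parent;
} tok_packed;

#define TOK_TYPE_BITS 3u
#define TOK_TYPE_MASK ((1u << TOK_TYPE_BITS) - 1u)
#define TOK_SIZE_MAX ((1u << (32u - TOK_TYPE_BITS)) - 1u)

/* Combine a type code (0..7) and a child count (0..TOK_SIZE_MAX). */
static uint32_t tok_pack(uint32_t type, uint32_t size) {
  return (size << TOK_TYPE_BITS) | (type & TOK_TYPE_MASK);
}

static uint32_t tok_type(const tok_packed *t) {
  return t->type_size & TOK_TYPE_MASK;
}

static uint32_t tok_size(const tok_packed *t) {
  return t->type_size >> TOK_TYPE_BITS;
}
```

With these sizes, a 1024-byte block holds exactly 64 packed tokens (1024 / 16), while 20-byte tokens do not divide it evenly (1024 / 20 = 51.2). The cost is a shift and a mask on every access to type or size.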

tomasrodathg commented 7 months ago

I know this is old, but you could in theory add a “walker” that does the lexical analysis and counts tokens without labeling them or allocating memory. It would be faster and more efficient, since all it needs to hold is an unsigned integer for the count (or a signed one, so it can signal an error if the JSON is invalid).
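A rough sketch of such a walker, under stated assumptions, might look like the following. `count_json_tokens` is a hypothetical helper and not part of jsmn. It counts one token per object, array, string (keys included) and primitive, which is the same granularity jsmn tokens have. It only performs light validation: unterminated strings and unbalanced brackets yield -1, but a `[` closed by a `}` is not detected, since that would need a stack.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Characters that end a primitive (number, true, false, null). */
static int is_delim(char c) {
  return c == ' ' || c == '\t' || c == '\r' || c == '\n' ||
         c == ',' || c == ':' || c == ']' || c == '}';
}

/* Hypothetical helper, not part of jsmn: returns the token count,
 * or -1 if the input is visibly malformed. */
static long count_json_tokens(const char *js, size_t len) {
  long count = 0;
  long depth = 0;
  size_t i = 0;

  while (i < len) {
    switch (js[i]) {
    case '{':
    case '[':
      count++; /* the container itself is one token */
      depth++;
      i++;
      break;
    case '}':
    case ']':
      if (--depth < 0) return -1; /* closing bracket without opener */
      i++;
      break;
    case '"':
      i++; /* skip opening quote */
      while (i < len && js[i] != '"') {
        if (js[i] == '\\') i++; /* skip the escaped character */
        i++;
      }
      if (i >= len) return -1; /* unterminated string */
      i++; /* skip closing quote */
      count++;
      break;
    case ' ':
    case '\t':
    case '\r':
    case '\n':
    case ':':
    case ',':
      i++;
      break;
    default:
      count++; /* start of a primitive */
      i++;
      while (i < len && !is_delim(js[i])) i++;
      break;
    }
  }
  return depth == 0 ? count : -1;
}
```

For the input `{"a": 1, "b": [true, null, "x"]}` this returns 8: one object, two keys, one array and four values. The walker still has to touch every byte, so it saves the token bookkeeping rather than the scan itself.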