resyncgg / json-stream

A library to parse Newline Delimited JSON values from a byte stream
Apache License 2.0

optimization: mode to check the tail of a chunk to see if it's worth attempting to read an object #4

Open tychoish opened 6 months ago

tychoish commented 6 months ago

Just to capture a thought (and sorry for all of my noise): if we know (or can be told) that the stream is going to consist of bounded value types (e.g. strings, arrays, objects), we can inspect the end of every chunk and skip the parse attempt when the end of the chunk doesn't contain whitespace or a closing terminator. For example, if the first character of the chunk is [ and the last is ", we should wait for more input rather than attempting to parse the data only to learn that it's incomplete.

Whitespace at the beginning can be ignored or dropped, but whitespace at the end is ambiguous. Any place where there's a match is potentially a false positive, so the deserializer still has the final say, but we could greatly reduce the number of times we need to call it, which should improve overall performance.
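
To make the idea concrete, here is a rough sketch of what such a check might look like. The name `worth_parsing` is hypothetical and not part of json-stream's API. It also assumes one particular way of handling the ambiguous trailing whitespace: skip it and look at the last non-whitespace byte.

```rust
/// Hypothetical pre-check, not part of json-stream: returns `false` only when
/// the buffered bytes cannot yet hold a complete bounded JSON value.
/// A `true` result may be a false positive; the deserializer still decides.
fn worth_parsing(buf: &[u8]) -> bool {
    let is_ws = |b: &u8| matches!(b, b' ' | b'\t' | b'\n' | b'\r');

    // Leading whitespace carries no information, so skip it.
    let start = match buf.iter().position(|b| !is_ws(b)) {
        Some(i) => i,
        None => return false, // empty or whitespace only: nothing to parse
    };
    // Assumption: trailing whitespace is skipped as well, so the check
    // looks at the last non-whitespace byte in the buffer.
    let end = buf.iter().rposition(|b| !is_ws(b)).unwrap_or(start);

    // Map the opening byte of a bounded type to its closing byte.
    let closer = match buf[start] {
        b'[' => b']',
        b'{' => b'}',
        b'"' => b'"',
        // Numbers and literals have no terminator; always try to parse.
        _ => return true,
    };

    // A lone opener (e.g. a single `"`) is never complete.
    end > start && buf[end] == closer
}
```

With this sketch, `worth_parsing(b"[\"a\", \"b\"")` returns `false`, so the caller would wait for more input, while `worth_parsing(b"[\"a\", \"b\"]\n")` returns `true`. A nested closer or an escaped quote at the end of a chunk still yields `true` for incomplete input, which is the false-positive case described above.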