ohler55 / oj

Optimized JSON
http://www.ohler.com/oj
MIT License
3.12k stars 250 forks source link

`Oj::Parser#load` fails on buffered input #926

Closed Bo98 closed 3 weeks ago

Bo98 commented 4 weeks ago

This code works:

data = Oj.dump([1] * 8000)
Oj::Parser.usual.load(StringIO.new(data))

This code however fails:

data = Oj.dump([1] * 9000)
Oj::Parser.usual.load(StringIO.new(data))
parse error, not closed at 1:-1 (EncodingError)

It seems the check added in https://github.com/ohler55/oj/commit/0fd31c474702d7daa56cf3be7b568ef33359962d runs on each parse block rather than once when everything is finished.

ohler55 commented 4 weeks ago

I'll look into the history of the depth limit. I believe it has something to do with being compatible with the JSON gem but I'll have to look back at old notes.

Bo98 commented 4 weeks ago

To clarify, the issue happens when the input is over a certain byte size (16385) regardless of the number of elements or the level of nesting. Another problematic example:

data = Oj.dump([1000000] * 2049)
Oj::Parser.usual.load(StringIO.new(data))

It's when it has to do multiple parse runs rather than one: https://github.com/ohler55/oj/blob/0fd31c474702d7daa56cf3be7b568ef33359962d/ext/oj/parser.c#L1358-L1365

The issue also does not happen with Oj.load, which operates fine with significantly larger inputs:

data = Oj.dump([1] * 100000000)
Oj.load(StringIO.new(data))

Ruby's JSON.load also works ok.

ohler55 commented 4 weeks ago

That is helpful. Thank you.

ohler55 commented 4 weeks ago

Just pushed a branch names "fix-parser-eof" that has the fix. Please let me know if it works for you.

Bo98 commented 4 weeks ago

Works great, thanks!

Git diff looks a little weird since it seems like a mixture of tabs and spaces are used for indentation: https://github.com/ohler55/oj/commit/036be24785abfb22d71feb8ffed3ad41f90083d0. But that's just a minor thing.

Bo98 commented 4 weeks ago

load_rescue already catches EOFErrors so you could also move the check there instead of a eof? check. Both works in the end, though probably isn't necessary to have both a rescue and a eof? check.

ohler55 commented 4 weeks ago

I considered that but to catch the eof error in the C code would require a wrapper function and some way to pass back the result. Now if ruby, most definitely.

ohler55 commented 3 weeks ago

Released v3.16.4 with the fix.

Bo98 commented 3 weeks ago

Thank you!