jjenkov / parsers-in-java

A JSON parser implemented in Java, to show how to implement high performance parsers in Java.
108 stars 31 forks source link

ArrayIndexOutOfBoundsException #1

Closed slandelle closed 10 years ago

slandelle commented 10 years ago

Hi,

I wanted to give your JSON parser a spin after reading your post on InfoQ.

Getting an ArrayIndexOutOfBoundsException with the following JSON string:

{ "store": {
    "book": [ 
      { "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      { "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      },
      { "category": "fiction",
        "author": "Herman Melville",
        "title": "Moby Dick",
        "isbn": "0-553-21311-3",
        "price": 8.99
      },
      { "category": "fiction",
        "author": "J. R. R. Tolkien",
        "title": "The Lord of the Rings",
        "isbn": "0-395-19395-8",
        "price": 22.99
      }
    ],
    "bicycle": {
      "color": "red",
      "price": 19.95
    }
  }
}

Full stacktrace:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 715
    at com.jenkov.parsers.json.JsonTokenizer.parseStringToken(JsonTokenizer.java:67)
    at com.jenkov.parsers.json.JsonTokenizer.parseToken(JsonTokenizer.java:56)
    at com.jenkov.parsers.json.JsonParser.parseObject(JsonParser.java:43)
    at com.jenkov.parsers.json.JsonParser.parseArray(JsonParser.java:77)
    at com.jenkov.parsers.json.JsonParser.parseObject(JsonParser.java:47)
    at com.jenkov.parsers.json.JsonParser.parse(JsonParser.java:22)

Regards,

Stéphane

slandelle commented 10 years ago

It looks like you're only supporting String values (ie surrounded by quotes). Numbers and booleans are not.

jjenkov commented 10 years ago

Try quoting all values and see what happens.

JSON on the wire is different from JSON in the browser. In the browser you do not have to qoute field names, but you do on the wire.

I actually do not know if you need to quote all field values too, but that is what I have required so far. Changing the parser to support unquoted numbers would not be difficult though, and it would not impact performance significantly.

On 18 November 2013 15:55, Stephane Landelle notifications@github.comwrote:

It looks like you're only supporting String values (ie surrounded by quotes). Numbers and booleans are not.

— Reply to this email directly or view it on GitHubhttps://github.com/jjenkov/parsers-in-java/issues/1#issuecomment-28703802 .

Jakob Jenkov jakob@jenkov.com Efficiency Matters

Jenkov Aps http://jenkov.com http://tutorials.jenkov.com

http://feeds2.feedburner.com/jenkov-com

jjenkov commented 10 years ago

I am pretty sure the parser has more corner cases which it cannot handle. Like I say in the article, it is merely a proof of concept implementation. However, the index overlay parser design principle is still useful.

On 18 November 2013 16:18, Jakob Jenkov jakob@jenkov.com wrote:

Try quoting all values and see what happens.

JSON on the wire is different from JSON in the browser. In the browser you do not have to qoute field names, but you do on the wire.

I actually do not know if you need to quote all field values too, but that is what I have required so far. Changing the parser to support unquoted numbers would not be difficult though, and it would not impact performance significantly.

  • Jakob Jenkov

On 18 November 2013 15:55, Stephane Landelle notifications@github.comwrote:

It looks like you're only supporting String values (ie surrounded by quotes). Numbers and booleans are not.

— Reply to this email directly or view it on GitHubhttps://github.com/jjenkov/parsers-in-java/issues/1#issuecomment-28703802 .

Jakob Jenkov jakob@jenkov.com Efficiency Matters

Jenkov Aps http://jenkov.com http://tutorials.jenkov.com

http://feeds2.feedburner.com/jenkov-com

Jakob Jenkov jakob@jenkov.com Efficiency Matters

Jenkov Aps http://jenkov.com http://tutorials.jenkov.com

http://feeds2.feedburner.com/jenkov-com

slandelle commented 10 years ago

Try quoting all values and see what happens.

Yep, the parser works fine with only Strings.

In the browser you do not have to qoute field names, but you do on the wire.

JSON is an RFC. The spec doesn't make any special case regarding browser/wire. Numbers and booleans are not quoted.

Thanks anyway.

jjenkov commented 10 years ago

Hi Stephane,

It is possible that the spec doesn't make any special case regarding browser/wire, but if you try to parse a JSON string from a server using jQuery's AJAX API, it only works when the field names are quoted. I don't know why that is, but from that I reasoned that there is a difference.

On 18 November 2013 16:25, Stephane Landelle notifications@github.comwrote:

Try quoting all values and see what happens.

Yep, the parser works fine with only Strings.

In the browser you do not have to qoute field names, but you do on the wire.

JSON is an RFC http://tools.ietf.org/html/rfc4627. The spec doesn't make any special case regarding browser/wire. Numbers and booleans are not quoted.

Thanks anyway.

— Reply to this email directly or view it on GitHubhttps://github.com/jjenkov/parsers-in-java/issues/1#issuecomment-28706536 .

Jakob Jenkov jakob@jenkov.com Efficiency Matters

Jenkov Aps http://jenkov.com http://tutorials.jenkov.com

http://feeds2.feedburner.com/jenkov-com

slandelle commented 10 years ago

Maybe jQuery only support a part of the spec, can't say. But all the other parsers do: google-gson, jackson, json-smart... If you say that supporting unquoted values would hurt your implementation's performance, then your benchmark is flawed.

jjenkov commented 10 years ago

I said that it would NOT impact performance significantly :-) It would be a new set of case-statements catching [0 - 9] chars, and calling a parseNumber() method which would be specialized at parsing numbers, just like the parseString() method parses a string. You would be able to expect about the same performance from parsing a number as from parsing a quoted string.

On 18 November 2013 16:40, Stephane Landelle notifications@github.comwrote:

Maybe jQuery only support a part of the spec, can't say. But all the other parsers do: google-gson, jackson, json-smart... If you say that supporting unquoted values would hurt your implementation's performance, then your benchmark is flawed.

— Reply to this email directly or view it on GitHubhttps://github.com/jjenkov/parsers-in-java/issues/1#issuecomment-28708016 .

Jakob Jenkov jakob@jenkov.com Efficiency Matters

Jenkov Aps http://jenkov.com http://tutorials.jenkov.com

http://feeds2.feedburner.com/jenkov-com

jjenkov commented 10 years ago

By the way, this is the first time I am using the GitHub issue system. It's pretty useful!

slandelle commented 10 years ago

Ah, sorry, I didn't read you properly.

BTW, I have a JSON parsers benchmark suite here. If you're willing to support numbers and booleans, I'd happily add yours.

jjenkov commented 10 years ago

Sure, why not? Might as well do that. Give me a few days to find a free time slot to implement and test it. If I forget, remind me.

jjenkov commented 10 years ago

I have updated the code now so it should support numbers now. I would still not claim that the code is production quality!

slandelle commented 10 years ago

I think the issue can be closed for now. FYI, I've run my benchmark, and it seems like your approach is more efficient when dealing with big JSON payloads, but not with smaller ones.

jjenkov commented 10 years ago

If you look at my benchmarks in the InfoQ.com article, you will see the same phenomenon. For small JSON files my parser performs only a little faster than GSON, but for larger ones, it is up to 2.5 times faster. I don't know exactly why that is. Perhaps GSON has more "accounting" stuff to do internally when the JSON file grows.

slandelle commented 10 years ago

GSON, like other parsers, builds an AST, so that involves more allocation and GC pressure.

jjenkov commented 10 years ago

I have optimized the parser for smaller JSON files now. I have removed the unnecessary reallocation of the IndexBuffer for the JsonTokenizer. That has increased the benchmark of the small JSON files significantly.