wanglingsong / JsonSurfer

A streaming JsonPath processor in Java
MIT License

GsonParser converting longs to doubles #44

Open cmuchinsky opened 6 years ago

cmuchinsky commented 6 years ago

The GsonParser is converting longs to doubles within the numberHolder implementation. Instead of calling return jsonProvider.primitive(jsonReader.nextString()); perhaps something like this would work better:

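// Read the raw number text once, then pick the narrowest matching type.
final String value = jsonReader.nextString();
try {
    // Integer literals that fit in a long stay longs.
    return jsonProvider.primitive(Long.parseLong(value));
}
catch (final NumberFormatException e) {
    // Fractions, exponents, and out-of-range integers fall back to double.
    return jsonProvider.primitive(Double.parseDouble(value));
}

wanglingsong commented 6 years ago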

Did you test it? Are you sure no exception is thrown when calling nextString() after a "NUMBER" token?

cmuchinsky commented 6 years ago

Yes, using the attached snippet, it worked as expected. Per the JsonReader.nextString javadoc: "If the next token is a number, this method will return its string form."
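
For readers who want to try the fallback without Gson on the classpath, here is a minimal, self-contained sketch of the same long-first, double-second idea. NumberFallback and parseNumber are hypothetical names chosen for illustration; they are not part of JsonSurfer or Gson, and the sketch assumes the input is the string form of a JSON number.

```java
// Sketch only: NumberFallback and parseNumber are hypothetical names,
// not JsonSurfer or Gson API.
final class NumberFallback {

    /** Returns a Long when the text parses as a long, otherwise a Double. */
    static Number parseNumber(String value) {
        try {
            return Long.parseLong(value);
        } catch (NumberFormatException e) {
            // Not an integer literal, or outside the long range: use double.
            return Double.parseDouble(value);
        }
    }

    public static void main(String[] args) {
        String[] samples = {"42", "-7", "42.5", "1e3", "9223372036854775808"};
        for (String text : samples) {
            Number n = parseNumber(text);
            // Prints the runtime type next to the parsed value.
            System.out.println(text + " -> " + n.getClass().getSimpleName() + " " + n);
        }
    }
}
```

Note that an integer one past Long.MAX_VALUE, such as the last sample, takes the double branch, so very large integers still lose precision under this approach.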

wanglingsong commented 6 years ago

I think it would introduce too much overhead, since each number could potentially be parsed two more times. If you really need the long type, I think you can implement a custom JsonProvider.
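
One possible way to limit the extra parsing, sketched here as an assumption rather than anything measured or adopted in JsonSurfer, is to inspect the number text for a fraction or exponent marker before choosing a parser. NumberClassifier and its methods are hypothetical names. With this check, typical values are parsed once, and the exception path is only taken for integers that do not fit in a long.

```java
// Sketch only: NumberClassifier is a hypothetical helper, not library API.
final class NumberClassifier {

    /** True when the JSON number text has no fraction or exponent part. */
    static boolean looksIntegral(String value) {
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '.' || c == 'e' || c == 'E') {
                return false;
            }
        }
        return true;
    }

    /** Parses integral text as Long when it fits, everything else as Double. */
    static Number parseNumber(String value) {
        if (looksIntegral(value)) {
            try {
                return Long.parseLong(value);
            } catch (NumberFormatException e) {
                // Integral text that exceeds the long range: fall through.
            }
        }
        return Double.parseDouble(value);
    }
}
```

Whether this is cheaper in practice than the try/catch fallback would need to be benchmarked against real input.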

cmuchinsky commented 6 years ago

Doing this at the provider level could work; however, I believe the core issue is in the parser, as that's where it's forcing the long into a double via the call to jsonReader.nextDouble(). By the time the value gets to the provider, it's already been turned into a double. The JsonReader class does internally keep track of whether it's a long or a double, but unfortunately it doesn't make that information available to public consumers. I will check if it's possible to extend JsonReader to gain access to the peeked member, which, if set to 15, indicates a long rather than a double.
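
To illustrate why a fix at the provider level comes too late, the following self-contained sketch shows what is lost once the number text has gone through a double. DoubleLossDemo and its methods are hypothetical names used only for this illustration.

```java
// Sketch only: DoubleLossDemo is a hypothetical class for illustration.
final class DoubleLossDemo {

    /** True when converting integer text to double and back keeps the value. */
    static boolean survivesDoubleRoundTrip(String integerText) {
        long exact = Long.parseLong(integerText);
        double asDouble = Double.parseDouble(integerText);
        return (long) asDouble == exact;
    }

    /** True when two number texts cannot be told apart after conversion to double. */
    static boolean sameAsDouble(String a, String b) {
        return Double.parseDouble(a) == Double.parseDouble(b);
    }

    public static void main(String[] args) {
        // 2^53 + 1 is not exactly representable as a double.
        System.out.println(survivesDoubleRoundTrip("9007199254740993")); // false
        // The original spelling ("1" vs "1.0") is gone after conversion.
        System.out.println(sameAsDouble("1", "1.0")); // true
    }
}
```

A provider that only ever receives a double can therefore neither recover large long values exactly nor tell whether the source text was written as an integer.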

cmuchinsky commented 6 years ago

Unfortunately, it looks like JsonReader::peeked is package-scoped rather than protected, so a subclass outside Gson's package can't read it.

wanglingsong commented 6 years ago

Actually, I'm curious about your use case. What kind of benefit do you gain from such a conversion?

cmuchinsky commented 6 years ago

The use case is that the JSON we parse and filter needs to retain its original formatting, so that when we do schema inference a long doesn't turn into a double.

wanglingsong commented 6 years ago

Given this limitation of Gson, maybe you can try another JsonSurfer implementation, e.g. JacksonSurfer.

cmuchinsky commented 6 years ago

Will give it a look. Ideally I want an implementation that I can use in a streaming read and provider scenario: as the data is read and filtered with a JSON path, the output is fed to a provider that simply streams it out the other side. That way, if I hit a massive JSON document with a JSON path like $.*, it wouldn't blow up trying to assemble the entire document in memory.
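
The streaming scenario described above can be sketched, independently of any JSON library, as a sink that writes each matched value as soon as it arrives instead of collecting matches first. StreamingSink, onValue, and drain are hypothetical names, and the Iterator stands in for whatever lazily produces the JSON path matches. This is an assumption about the shape of such a design, not JsonSurfer's API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Iterator;

// Sketch only: StreamingSink is a hypothetical class, not JsonSurfer API.
final class StreamingSink {
    private final Writer out;
    private long count;

    StreamingSink(Writer out) {
        this.out = out;
    }

    /** Called once per matched value; writes it at once and keeps no reference. */
    void onValue(String jsonValue) {
        try {
            out.write(jsonValue);
            out.write('\n');
            count++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    long count() {
        return count;
    }

    /** Feeds a lazily produced sequence of matches through the sink. */
    static long drain(Iterator<String> matches, Writer out) {
        StreamingSink sink = new StreamingSink(out);
        while (matches.hasNext()) {
            sink.onValue(matches.next());
        }
        return sink.count();
    }
}
```

Because the sink holds only a counter and the writer, the sink itself does not grow with the number of matches; any buffering inside the writer or the match source is outside the scope of this sketch.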