searchbox-io / Jest

Elasticsearch Java Rest Client.
Apache License 2.0
2.11k stars 728 forks source link

why not use jackson json to speed up performance? #45

Open lxbzmy opened 11 years ago

lxbzmy commented 11 years ago

why not use jackson json to speed up performance?

ferhatsb commented 11 years ago

I was also thinking about to switch jackson over gson however I think performance gain from here can not be our first priority in terms of our limited resources. Let's target this after next release.

lxbzmy commented 11 years ago

Ok. current now , I used flexjson, gson, jackson. json-lib,fast json and etc.

I feel flexjson has a good interface for human, jackson has better performance.

2013/6/18 Ferhat Sobay notifications@github.com

I was also thinking about to switch jackson over gson however I think performance gain from here can not be our first priority in terms of our limited resources. Let's target this after next release.

— Reply to this email directly or view it on GitHubhttps://github.com/searchbox-io/Jest/issues/45#issuecomment-19572059 .

filippor commented 11 years ago

why not externalize serialization and deserialization and permit multiple implementation

fabn commented 11 years ago

+1 for @filippor comment. It would be the very best thing. I already have a lot of Jackson annotated POJOs so for my use case change the json library to Jackson would be fine, but externalize the dependency (something like slf4j or commons logging) will allow any json library pulled in at runtime.

ferhatsb commented 11 years ago

I agree with you guys, that just requires extra work. We maybe start to work on this next week. I will share our status.

imod commented 10 years ago

Is this still on the roadmap?

kramer commented 10 years ago

Eventhough no one has started the work yes it is still on the roadmap as a legit improvement. We're open for any and all kinds of contributions and ideas on the topic.

imod commented 10 years ago

@kramer any hint/idea on where to start with this?

kramer commented 10 years ago

@imod I'm in favor of @filippor's idea rather than just switching to Jackson. I think we'd need to extract Gson calling code and somehow make it JSON (de)serializer _unaware_; this probably means we'll need an adapter interface for json operations which then will call the correct implementation... Since some gson classes are very tightly coupled in current codebase, separating the responsibilities and then externalizing json ops would require substantial amount of work.

lxbzmy commented 10 years ago

yes,extra some interface first.

2014-04-08 4:04 GMT+08:00 Cihat Keser notifications@github.com:

@imod https://github.com/imod I'm in favor of @filipporhttps://github.com/filippor's idea rather than just switching to Jackson. I think we'd need to extract Gson calling code and somehow make it JSON (de)serializer unaware; this probably means we'll need an adapter interface for json operations which then will call the correct implementation... Since some gson classes are very tightly coupled in current codebase, separating the responsibilities and then externalizing json ops would require substantial amount of work.

Reply to this email directly or view it on GitHubhttps://github.com/searchbox-io/Jest/issues/45#issuecomment-39776649 .

apatrida commented 10 years ago

It seems like you are solving a problem outside the scope of JEST. Pick the best single library for JEST, shade it so that it won't conflict with our obvious use of similar libraries in different versions, and make JEST as best as possible. Creating an slf4j type library for JSON seems like another project that once it exists you could use. Until then, keep the focus ;-) Jackson + Jaxb annotations are most common, use it for that reason plus its great performance.

imod commented 10 years ago

I think @jaysonminard is right, that's way more complicated not saying it has no value, it actually would be a great thing - but then this should not be in the context of Jest but something independent and maybe used by Jest (one day in the far far future)

imod commented 10 years ago

btw. JSR-353 Java API for JSON Processing will most probably solve this too - but it might take a while until we get something... http://download.oracle.com/otndocs/jcp/json-1_0-pfd-spec/index.html

kramer commented 10 years ago

isn't JSR-353 finalised (http://docs.oracle.com/javaee/7/api/javax/json/package-summary.html)?

imod commented 10 years ago

that's true, but the current API does not provide any databinding functionality - so no object to JSON binding, its all pretty low level

imod commented 10 years ago

interesting: Future of Jackson - Java 8 - JSR 353 http://jackson-users.ning.com/forum/topics/future-of-jackson-java-8-jsr-353

daniilyar commented 10 years ago

+1, using Jackson will be much better than GSON. Real-world example: ES could store both arrays and primitives in a same field: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-array-type.html But GSON fails to parse, if the same field will contain both array and primitive for different documents Additionally, Jest logs error in case of such failures instead of rethrowing it, which is the bad practice: see Jest error log below for example. Currently this is the only thing because of which we cannot use Jest in production conveniently.

Example error for this case: "[04/22 05:38:03:997] [test] [JestResult] [] [ERROR]: Unhandled exception occurred while converting source to the object XXXXX com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected BEGIN_ARRAY but was STRING at line Z column Y"

Jackson would solve our problem with deserialization of such documents as it supports a property which always accepts single values as arrays when deserializing. Example:

 new ObjectMapper().configure(DeserializationFeature.ACCEPT_SINGLE_VALUE_AS_ARRAY, true); 

Another point is that with Jackson users will be able to use annotation-based custom deserialization strategies (GSON does not support them). Example:

@JsonProperty("ReceiveTime")
@JsonDeserialize(using = CustomIso8601JsonDateDeserializer.class)
@JsonSerialize(using = CustomIso8601JsonDateSerializer.class)
private Date receiveTime;

Where custom serializer could try several date formats one by one, or etc... And also Jackson is about 1.5 times faster when deserializing huge json output =)

And yes, it would be better to just externalize serialization / deserialization and allow multiple implementations =)

bobbyhubbard commented 9 years ago

Is someone working on this? I might be able to take it if not

kramer commented 9 years ago

@bobbyhubbard No, nobody is working on this currently, you can pick it up if you wish. Since it's kind of a big change it'd be nice to hear your rough attack plan beforehand or to see your progress on small increments to give early feedback :).

bobbyhubbard commented 9 years ago

Sure. I'll take a little deeper look

ctrimble commented 9 years ago

I took a look at what would be required to replace Gson with ObjectMapper last night and it looks like the changes will be significant. In particular, JsonObject can be found almost throughout the code base, including a lot of manual mapping type code.

I would be willing to help get this moving, but the way things stand, it would be very hard to produce a super clean migration path. In the best case, perhaps a version to step through where the JsonObject was generic could be done, with the old commands moved to a new module. This would probably also necessitate use of Java8, to abstract things like testing for errors in the parsed body in JestResult.

@kramer Would the project be willing to take a change like this? I could workup a PR for the search API, but I don't want to put in a ton of effort and end up forking this library.

ctrimble commented 9 years ago

@bobbyhubbard did your investigation into using Jackson with Jest produce any results?

kramer commented 9 years ago

@ctrimble At this point I'm conflicted about this switch considering pros & cons. I don't consider "speed" as a deciding factor because (i) after all each json operation is paired up with a network operation which is much more "expensive" than doing (de)serialization and (ii) a simple google search forms the opinion that recently gson caught up with jackson in speed for most use cases. Then we are left with some parse case pros for jackson like: better (?) annotation support, partial doc/class support and maybe better handling of cases like the one @daniilyar mentioned; and the biggest con is the huge breaking change for the public api and the actual effort needed to do the switch. If those are all we have and we are discussing complete switch to jackson then that is where I'm conflicted: is it really worth it?. If there was a way to keep gson and have jackson in addition then I'd be all for it.

I am very much open to discussion on this.

p.s.:

necessitate use of Java8

I don't think android module would like that.

ctrimble commented 9 years ago

@kramer for me, and I would assume others, this has nothing to do with performance. If performance was that big of a deal, it would be worth purpose building something.

What I am after is clean call sites. The way things stand, I end up with lots of JsonObject based code in my project. It is getting to the point that I am ready to fork Jest, just to get that worked out. I would much rather start contributing here and help everyone.

I am sure that some kind of step through version could be produced, allowing people to half upgrade. I know I would need that. I have way too much code at this point to make that lift in one shot.

As for Java8, that would not be a necessity, but some strategies will be needed to make a reasonably smooth transition. Lambdas just make that kind of code a non issue.

Nirvana would be a client api that looked more like this:

SearchResult result = client.execute(action);
if( !result.isSucceeded() ) {
  // fail
}
List<MyType<SubType>> hits = result.hits().as(GenericType<MyType<SubType>>(){});
kramer commented 9 years ago

step through version

Could you explain that in more detail please? My train of thought on this: it's not like we have constant critical updates; so people can just continue using the old version until they are ready to make the change.

ctrimble commented 9 years ago

I have lots of code on my hands, so I would want an upgrade to go something like this:

  1. update maven dependency version.
  2. bulk update all package Jest related package names.
  3. upgrade/add some code against the updated API.

At the same time, I would probably be willing to do a bulk port in this situation. It really depends on where the refactoring lands. More than half of my code dealing with Jest is wrangling JsonObject. The code is ugly and error prone. It would be worth the effort.

Would you be willing to look at a throw away PR for the search API? I could knock something like that together without too much effort, just to give us something concrete to talk about.

kramer commented 9 years ago

Would you be willing to look at a throw away PR for the search API?

@ctrimble Yes, of course.

nicoruti commented 9 years ago

+1 for injecting an own serializer through an interface.

I switched to Jest from the official ES client, where the serialization is left to the user. Using the ES client, I created my (annotated) Objects to fit an ObjectMapper / Jackson. Now moving to Jest (with built-in Gson) would mean to change the Objects, which is not possible.

For compatibility with all projects, you could provide a method like JestResponse#getSource(): JsonElement or even JestResponse#getSource(): String.

I'm doing the following workaround at the moment:

JsonObject source = response.getSourceAsObject(JsonObject.class);
source.remove(JestResult.ES_METADATA_ID); // optional
MyClass result = objectMapper.readValue(source.toString(), MyClass.class);

Although it's not very efficient, it works.

ctrimble commented 9 years ago

I got swamped while working on an example PR for this and I don't think I will have time to get back to it soon.

nicoruti commented 8 years ago

I looked into the code; it makes heavy use of Gson's JsonObject / JsonElement / JsonArray class. Replacing Gson would be a major refactoring. Nevertheless, I created a PR (see #249 ) which adds a method to get the result as a plain string, such that developers can parse it themselves with any preferred mapper. In my opinion a missing feature.

raunak commented 8 years ago

@ctrimble can you push the work you did? I can attempt to complete the remaining parts.

Most frameworks out there use jackson for parsing json (in my case play framework), and it really annoys me that you end up importing different libraries into a project designed to do the same thing. For example, I now have POJO's with jackson and gson annotations. It might just be me, but I don't find that ok.

ctrimble commented 8 years ago

@raunak it isn't in a compiling state, but here you go. https://github.com/ctrimble/Jest/tree/experiment_object-mapper

antonsarov commented 8 years ago

+1 @raunak

It is not just the performance factor. I have Jackson annotations on POJOs (e.g. @JsonManagedReference) which help me a lot for the internal REST API which I am providing. Such functionality is only difficult to maintain using Gson (own serializer, etc.) And importing multiple JSON libraries is really not the idea of any project.

jmlw commented 7 years ago

+1 for providing a configurable interface for deserialization.

A real world use case for my company is adding results of a large computation producing a very large POJO with many nested objects, serialized with Jackson. When attempting to deserialize the json result of _source, enum deserialization fails because of different serialization / deserialization strategies. In my case, Gson failed silently with a duplicate key exception from deserializing an enum to null, and attempting to add these null keys to a map, returning a null value for my POJO. Being able to supply a Jackson object mapper rather than using the default Gson implementation will allow much greater flexibility.

For the time being, I've worked around this issue by extracting the JsonObject _source directly from the hits array with the json methods provided by Gson, then mapping each source to the POJO it represents.

ssozonoff commented 6 years ago

Looks like this has stalled for now. Seems GSON does not work well with Lombok either when deserialising something with an @Data annotation. Jackson just works but cant be plugged into Jest.