parisholley / wordpress-fantastic-elasticsearch

Improve wordpress search performance/accuracy and enable faceted search by leveraging an ElasticSearch server.
MIT License

Using WordPress-Elasticsearch Lib #72

Open timnashcouk opened 10 years ago

timnashcouk commented 10 years ago

Currently looking at Elasticsearch WordPress plugins to replace a shockingly built internal plugin.

Wondering if you are using the schema suggested by @gibrown and Automattic? https://github.com/Automattic/wpes-lib

If not, is there any intention to in the future?

parisholley commented 10 years ago

It is something I've looked at but not thought through. There are a lot of tests with the current approach (which I admit wasn't well thought out but it worked for my purposes).

schorsch commented 10 years ago

I took a look at the Automattic approach and found it too bloated for a single site. They need to go deeper in nesting and handle the unknown (content types, languages, taxonomies, ...) because of their multi-blog architecture. The result may be quite hard to understand for an intermediate WP hacker who is still learning ES and just wants a decent search. I think the approach here fits pretty well out of the box. Besides, parisholley has added lots of hooks to customize the input and output.

timnashcouk commented 10 years ago

While I partially agree that the Automattic schema is a little OTT, it's not covering anything that unusual, and it is stuff many will expect to be covered out of the box (custom post types, taxonomies, etc.), meaning a lot less work in the long run. So it's a trade-off between a somewhat steep learning curve and a lot of future code implementation.

It also has the additional benefit of allowing people to make use of some of the WP.com VIP plugins that use Elasticsearch with little to no modification, and of allowing greater crossover. For me a common schema is really appealing, not only for greater interoperability but also for data portability.

Hopefully some food for thought.

parisholley commented 10 years ago

I am all about not reinventing the wheel :) It just feels like that type of collaboration needs to exist in a project outside of something scoped for WordPress (i.e. Elasticsearch index logic and best practices regardless of platform, as a PHP library). The Automattic approach (I believe) makes some assumptions about plugins/addons being available on the ES server, as well as about indexing content in a certain way (i.e. including everything by default). I'd be more interested in finding concepts inside of what they have and applying them, vs. trying to define a "standard" or pull in their library. Many of the decisions made in this plugin were based on the features being delivered (mostly with an emphasis on faceting), assuming the API in this plugin is used rather than querying the ES server directly. I think the goal of being able to move off WP.com and use this plugin is admirable, but it is not something that I personally need (or have time to invest in). I would have no problem pulling in changes to support it, though!

gibrown commented 10 years ago

Sorry I'm so late to this conversation. Thought I'd elaborate on some of the design goals I stated in the wpes-lib readme.

My reason for thinking a common schema would be good is that I hope there will be a time where lots of people in the WP community are writing ES queries. I think many fewer people will have the ability to put together a full indexing schema and run their own ES cluster. We're looking at exposing the ES query DSL in our APIs. For instance the Jetpack Related Posts REST endpoint accepts ES filters: http://developer.wordpress.com/docs/api/1/post/sites/%24site/posts/%24post/related/

I really want to expose the power of the query DSL to as many developers as possible. It enables so many types of queries that cannot be accomplished otherwise. Having a common schema I think would further encourage developers to use it.

Our indices also started out very simple, and have evolved into what wpes-lib does (plus additional fields for WP.com-specific stuff, e.g. likes). There are reasons for the complexity, and I'm happy to describe them if there are specific concerns. You are right that many do have to do with faceting, but I think faceted search is a really compelling feature. Also, ES can be a lot more than just search. There are a lot of natural language processing algorithms you could build on it.
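
For readers new to the query DSL, a faceted query might look roughly like the sketch below. It is only an illustration: the field names (`content`, `post_type`, `tag.slug`, `category.slug`) and the helper `build_faceted_query` are assumptions, not taken from the wpes-lib schema or from this plugin. It uses the `aggs` and `bool`/`filter` syntax of more recent Elasticsearch releases; older releases used a separate `facets` syntax.

```python
import json


def build_faceted_query(search_text, facet_fields, post_type=None, size=10):
    """Build a request body: a full-text match plus one terms aggregation per facet."""
    # "content" is a hypothetical field name for the post body.
    query = {"match": {"content": search_text}}

    # Optionally restrict results with a non-scoring filter clause.
    if post_type is not None:
        query = {
            "bool": {
                "must": [query],
                "filter": [{"term": {"post_type": post_type}}],
            }
        }

    # Each facet becomes a terms aggregation that counts documents per value.
    aggs = {
        field: {"terms": {"field": field, "size": size}}
        for field in facet_fields
    }
    return {"query": query, "aggs": aggs}


body = build_faceted_query(
    "elasticsearch", ["tag.slug", "category.slug"], post_type="post"
)
print(json.dumps(body, indent=2))
```

The resulting dictionary would be sent as the JSON body of a search request. With a shared schema, the same body could in principle be reused across sites, because the field names would not change.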

I don't consider wpes-lib to be THE way to build the schema either. I'd love input on things that should be changed to make it better. I'm working on rebuilding our index over the next two months which provides a good opportunity for tweaking things.

There are definitely some assumptions I made about ES plugins. I could probably add some config options that bypass those pretty easily. That said, ICU in particular helps a lot, even for English. It's scary what users type into a search box.
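
As a rough sketch of what such a config option could look like, the snippet below builds index analysis settings that use the ICU components when the analysis-icu plugin is installed and fall back to built-in components otherwise. The analyzer name `wp_text` and the helper `build_analysis_settings` are hypothetical, and this is not how wpes-lib itself is configured.

```python
import json


def build_analysis_settings(use_icu=True):
    """Return index settings with one custom analyzer, with or without ICU."""
    if use_icu:
        # These components are provided by the analysis-icu plugin.
        tokenizer = "icu_tokenizer"
        filters = ["icu_normalizer", "icu_folding"]
    else:
        # Built-in fallback: weaker Unicode handling, but no plugin required.
        tokenizer = "standard"
        filters = ["lowercase", "asciifolding"]

    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # "wp_text" is a hypothetical analyzer name.
                    "wp_text": {
                        "type": "custom",
                        "tokenizer": tokenizer,
                        "filter": filters,
                    }
                }
            }
        }
    }


print(json.dumps(build_analysis_settings(use_icu=False), indent=2))
```

The trade-off is that an index built with the fallback analyzer tokenizes and folds text differently, so switching the option later means reindexing.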

parisholley commented 10 years ago

Thanks for the input, Greg! No doubt the team at Automattic went much further than I did, given all the various types of websites that come along, and has a reason for every bit of schema in that library. My initial reaction was just that I would like to learn more about some of that complexity, as I haven't done enough meaningful projects with ES to know the value of each part. I would be happy to collaborate on a version that maybe splits the principles up more succinctly, backed by some tests to demonstrate them. This plugin has much further to go, but I've been so busy lately with my startups.

gibrown commented 10 years ago

Just remembered that I wrote about most of my reasoning here: http://gibrown.wordpress.com/2013/04/17/mapping-wordpress-posts-to-elasticsearch/

More fields have been added since then. Particularly around extracting info from the content field (links, images, shortcodes, mentions, hashtags). Uses this: https://github.com/Automattic/jetpack/blob/master/class.media-extractor.php
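
To give a feel for this kind of extraction, here is a minimal sketch that pulls links, hashtags and mentions out of post content with regular expressions. It is not a port of the Jetpack media extractor; the function name `extract_content_fields`, the output keys and the patterns are all assumptions made for illustration, and real content would need more careful handling.

```python
import re

# Simplified, hypothetical patterns; not the rules Jetpack uses.
LINK_RE = re.compile(r"https?://[^\s\"'<>]+")
# Require a leading letter and skip "&#...;" so HTML entities are not matched.
HASHTAG_RE = re.compile(r"(?<![\w&])#([A-Za-z]\w*)")
# The lookbehind keeps email addresses from being treated as mentions.
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")


def unique(items):
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


def extract_content_fields(content):
    """Return extra index fields derived from raw post content."""
    return {
        "links": unique(LINK_RE.findall(content)),
        "hashtags": unique(HASHTAG_RE.findall(content)),
        "mentions": unique(MENTION_RE.findall(content)),
    }


sample = (
    'Read <a href="http://example.com/a">this</a> by @greg '
    "about #search and #es. Contact: me@example.com &#8217;"
)
print(extract_content_fields(sample))
```

Indexing these as separate fields lets queries filter or facet on them without re-parsing the content at search time.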

I just added a high-level example of how to use the library (thought I had already done this). It's located at: https://github.com/Automattic/wpes-lib/blob/master/src/indices/wp-core-index.php

Hopefully that makes it a lot clearer how we are using this.

Happy to do a Skype call at some point if talking in person would help clarify anything. Though clearly discussing on here may be more likely to encourage someone else to dive in. :)

parisholley commented 10 years ago

Sure, how about a mix of both :) My skype is parisholley

timnashcouk commented 10 years ago

Just dropping a note that I'm continuing to lurk on this thread. After the initial comments we did some work with Fantastic ElasticSearch and put it on the shelf, though I have been watching. Meanwhile we built a bastardised version, based on the code that was there before, to implement a partial version of @gibrown's schema. I really think that while there is low adoption of Elasticsearch within the WordPress community, there is an opportunity to standardise. Adoption is only going to increase, and it will be far harder to reach a consensus when every search, facet, sorting and related-posts plugin insists on its own schema and index.

Happy to roll up sleeves and help see if it's feasible.

parisholley commented 10 years ago

Thanks Tim! I'm thinking a few of us just need to find a way to collaborate on a branch of the wpes-lib (or maybe a new project?) and put together a plan to tackle it. Curious if you ran into any quirks migrating over to the lib as is?

gibrown commented 10 years ago

cc'ing a few other folks who I've discussed ES with and may be interested:

@mboynes @markoheijnen @alexkingorg

ziz commented 10 years ago

Looking in with interest.

gibrown commented 10 years ago

For anyone watching this thread, I am currently working on adding more fields to our post schema that will then get rolled into wpes-lib. Below is a gist of our post mappings. Working over the next week or so on making additions, so if you have any ideas let me know.

https://gist.github.com/gibrown/6c4cda9e82dfa5347b2e

FYI, this is the index mapping for WP.com VIPs. There are a number of fields here that are WP.com-specific (e.g. likes), but it is a superset of the WP.org schema.