parisholley / wordpress-fantastic-elasticsearch

Improve wordpress search performance/accuracy and enable faceted search by leveraging an ElasticSearch server.
MIT License

Author indexing #141

Closed: ciaranmaher closed this 7 years ago

ciaranmaher commented 7 years ago

This PR addresses a number of open issues around indexing the author name:
https://github.com/parisholley/wordpress-fantastic-elasticsearch/issues/110
https://github.com/parisholley/wordpress-fantastic-elasticsearch/issues/28
https://github.com/parisholley/wordpress-fantastic-elasticsearch/pull/17

I've hooked into the WordPress profile_update hook and combined it with the ElasticSearch _update_by_query API. I believe _update_by_query is the function that @greatwitenorth was talking about in PR https://github.com/parisholley/wordpress-fantastic-elasticsearch/pull/17#issuecomment-19398068 three years ago.
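To make the idea concrete, here is a minimal sketch of the kind of request body an _update_by_query call could carry when an author's display name changes. It is written in Python for brevity and is not the PR's actual code: the function name `build_author_update`, the script parameter `new_name`, and the use of the ES 2.x style `inline` script key are all assumptions for illustration.

```python
import json


def build_author_update(author_id, display_name):
    """Build a body for POST /<index>/_update_by_query (hypothetical helper).

    Selects every document whose post_author matches the given id and
    overwrites post_author_name through an inline script.
    """
    return {
        # post_author is not analyzed, so a term query matches it exactly.
        "query": {"term": {"post_author": author_id}},
        "script": {
            # Assumed ES 2.x Groovy syntax; newer versions use a different
            # script language and key names.
            "inline": "ctx._source.post_author_name = new_name",
            "params": {"new_name": display_name},
        },
    }


body = build_author_update(7, "Jane Doe")
print(json.dumps(body, indent=2))
```

The body would be sent once per profile update, so the cost does not grow with the number of requests the plugin makes, only with the number of posts ES has to rewrite.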

The ElasticSearch mapping on post_author is similar to how the taxonomies are mapped, i.e. post_author is not analyzed and post_author_name is analyzed, which makes exact matches, faceting, and full-text search on the name easier.
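As a rough illustration of that mapping, the sketch below builds the two field definitions and a facet request against the exact-match field. It assumes the ES 2.x `string` type with `"index": "not_analyzed"`; the helper names `author_mapping` and `author_facet` are hypothetical, and the plugin's real mapping may differ in detail.

```python
def author_mapping():
    """Field mappings for the author data (assumed ES 2.x syntax)."""
    return {
        # Stored verbatim: suited to exact matches and facets.
        "post_author": {"type": "string", "index": "not_analyzed"},
        # Run through the default analyzer: suited to full-text search.
        "post_author_name": {"type": "string"},
    }


def author_facet(size=10):
    """A terms aggregation that counts posts per post_author value."""
    return {"aggs": {"authors": {"terms": {"field": "post_author", "size": size}}}}


print(author_mapping())
print(author_facet())
```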

In order to use the _update_by_query API method you need to enable inline scripting in your elasticsearch.yml file.

script.engine.groovy.inline.update: on
script.engine.groovy.inline.search: on
script.engine.groovy.inline.aggs: on

I've added a couple more tests, and all tests are passing. I was unable to get the Selenium tests to run, although I tried in both Mac and Linux dev environments.

ciaranmaher commented 7 years ago

It looks like Travis is having an issue with the custom elasticsearch.yml that needed to be added.

All the tests work fine when run locally. I'll see if I need to add anything else to get Travis to pass the tests.

parisholley commented 7 years ago

@ciaranmaher thanks for putting this together. My main issue with this pull is the requirement for .yml config changes. Many people use hosted ES services and may not be able to change that config (if we can confirm the main providers allow it, then it may not be too bad). We would need to call ES (if possible) to see if the command is available (or just do some error handling), and then also provide a fallback (i.e. pushing each document, though this can be a problem for large data sets).
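A sketch of that error-handling-plus-fallback flow, in Python for brevity, might look like the following. Everything here is hypothetical: `bulk_update`, `list_post_ids`, and `reindex_post` stand in for whatever the plugin would actually call, and `RuntimeError` stands in for whatever error the ES client raises when inline scripting is disabled.

```python
def update_author(author_id, name, bulk_update, list_post_ids, reindex_post):
    """Try the single bulk request first, then fall back to per-post pushes.

    Returns the name of the strategy that was used.
    """
    try:
        bulk_update(author_id, name)
        return "update_by_query"
    except RuntimeError:
        # Fallback: push each of the author's documents individually.
        # This is the path that can get slow for large data sets.
        for post_id in list_post_ids(author_id):
            reindex_post(post_id)
        return "per_document"


# Usage with stand-in callables that simulate a host without inline scripting.
def scripting_disabled(author_id, name):
    raise RuntimeError("inline scripts are disabled")


reindexed = []
strategy = update_author(
    7, "Jane Doe",
    bulk_update=scripting_disabled,
    list_post_ids=lambda author_id: [101, 102, 103],
    reindex_post=reindexed.append,
)
print(strategy, reindexed)
```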

From the previous discussion, I would probably lean toward storing just the author id in ES to simplify it. Most people don't need author information by default, and the cost of running a single query to match author ids isn't a bad tradeoff.

If we keep this close to how it is implemented, we should probably consider making it an option in admin, disabled by default.
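For comparison, the id-only approach could be sketched as below (Python for brevity, not plugin code). The search hits carry only `post_author`, and the names are resolved afterwards with one lookup for the whole result page. The `fetch_names` callable is a hypothetical stand-in for a single database query that maps author ids to display names.

```python
def attach_author_names(hits, fetch_names):
    """Add post_author_name to each hit using one lookup for all author ids."""
    # Deduplicate so each author is requested once per result page.
    author_ids = sorted({hit["post_author"] for hit in hits})
    names = fetch_names(author_ids)  # assumed: one query, returns {id: name}
    return [dict(hit, post_author_name=names.get(hit["post_author"])) for hit in hits]


# Usage with an in-memory stand-in for the author lookup.
authors = {3: "Jane Doe", 5: "John Roe"}
hits = [
    {"id": 101, "post_author": 3},
    {"id": 102, "post_author": 5},
    {"id": 103, "post_author": 3},
]
result = attach_author_names(hits, lambda ids: {i: authors[i] for i in ids})
print(result)
```

With this shape nothing in the index has to change when a profile is updated, which is the simplification being weighed against the extra lookup.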