neo4j-contrib / neo4j-elasticsearch

Neo4j ElasticSearch Integration
Apache License 2.0
210 stars 79 forks source link

Allow for ID and Labels fields to be configurable #17

Closed jacob-ewald closed 8 years ago

jacob-ewald commented 8 years ago

To prevent searching against the ID and Labels fields, this PR adds two configuration options to control whether or not those fields are added to the Elasticsearch index.

By setting:

elasticsearch.include_id_field=false
elasticsearch.include_labels_field=false

in the conf/neo4j.properties file, the two fields can be conditionally added to the index (or not). By doing so, a user can search against the people index for the search term "people" and not return every result in the people index. Previously, that search term would match every document and the results would then be useless.

I updated the readme to include these new configuration options (as well as some updates to the Examples section).

Additionally, I added a couple tests for these settings to be sure the defaults work, as well as for turning off the ID and Labels fields.

jexp commented 8 years ago

Do you have a better idea on how to connect labels ? or is the index name good enough?

Also for ID, if you leave the ID off how would you be able to relate the ES document back to Neo4j? E.g. to use these ID's to start a Cypher query by id.

jexp commented 8 years ago

Also do you think that should be a global setting or per index?

jacob-ewald commented 8 years ago

In my opinion, the index type is good enough. Although, I may be unaware of a use case where it's better to have all of the labels as an index field. In those cases, leaving the setting for include_labels_field set to true (or absent) would allow it to continue to work the way it does now.

The node ID is still attached to the document in Elasticsearch via the _id property. That's being set here.

As for whether or not these settings should be global, I think that would depend on what the majority of use cases require. If most users would want these as global settings, then this PR would suffice. But if they'd prefer setting it per index, the ElasticSearchIndexSpecParser would need changed in order to handle that. In order to do that there would need to be a syntax for specifying each setting for each index.

jexp commented 8 years ago

So we can remove ID altogether?

jexp commented 8 years ago

Can you also add a test without labels?

And I wonder if we should change the defaults for the settings to default to false?

jacob-ewald commented 8 years ago

I wouldn't say we should remove ID altogether; it's still needed in place like these: create, update, delete. But it doesn't seem necessary to keep it also as a user-generated field. I think the _id that Elasticsearch uses will be sufficient.

I added two tests with this PR, one without ID and one without Labels.

I had the defaults set to true so that this wouldn't be a breaking change, and existing users could upgrade without a need to change their config to keep existing functionality. Defaulting to false would require them to set those two new config options to true in order to maintain the ID and Labels fields like they have now.

jexp commented 8 years ago

Could you rebase the PR so that I could merge it?

jacob-ewald commented 8 years ago

I ended up doing merge instead of a rebase since it gave me a cleaner diff on the PR. If you'd rather it be rebase I can do that instead.

jexp commented 8 years ago

I'd love a rebase, preferably into a single commit. Thanks a ton.

jacob-ewald commented 8 years ago

Sorry for the delay. Got the rebase and squashed into a single commit

jexp commented 8 years ago

merged manually, thanks a lot.