neo4j-contrib / neo4j-elasticsearch

Neo4j ElasticSearch Integration
Apache License 2.0

index_spec limit the power of what is sync #10

Open marcusleandro opened 9 years ago

marcusleandro commented 9 years ago

Hi, first of all, congratulations on the helpful code. It works well for what it was intended to do, but I would like to suggest an enhancement to elasticsearch.index_spec.

For example, if I want to sync Neo4j nodes labeled Label1 and Label2 into ES, storing properties prop1 and prop2 for Label1 nodes and prop3 for Label2 nodes, I need to set the index_spec as follows:

       elasticsearch.index_spec=my_index:Label1(prop1,prop2), my_other_index:Label2(prop3)

Now, imagine that I would like to store into ES all properties of nodes labeled as Label1. I would like to set the index_spec like this:

       elasticsearch.index_spec=my_index:Label1(*), my_other_index:Label2(prop3)

The reason is that nodes with the same label can have different properties: for instance, one Label1 node can have prop1 and prop2, while another has prop2 and prop4. I would like to store all the properties of each node in ES without having to list each one in the index_spec, because the set can change from node to node even under the same label. I believe this enhancement would be a great gain for everyone.

Another point: I would like to specify in the index_spec to sync all nodes regardless of their respective labels and also all properties into ES. So the new index_spec would be something like this:

        elasticsearch.index_spec=my_index:*(*)

The first * would mean that nodes of all labels are synced, and the second would mean that all properties of the node being synced are stored in ES.
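To make the proposed syntax concrete, here is a minimal Python sketch of how a spec with wildcards could be parsed and applied to a node. It is not the plugin's actual parser (the plugin is written in Java); the function names `parse_index_spec` and `select_properties`, the regular expression, and the use of `None` to represent `*` are all assumptions made for illustration.

```python
import re

# Hypothetical grammar for one entry: index_name:Label(prop1,prop2),
# where Label and the property list may each be the wildcard "*".
_SPEC_ENTRY = re.compile(r"\s*([\w-]+)\s*:\s*(\w+|\*)\s*\(([^)]*)\)\s*")


def parse_index_spec(spec):
    """Parse a spec string into a list of (index, label, props) tuples.

    label is None when the spec uses "*" (any label);
    props is None when the spec uses "*" (all properties).
    """
    entries = []
    pos = 0
    while pos < len(spec):
        match = _SPEC_ENTRY.match(spec, pos)
        if match is None:
            raise ValueError(f"cannot parse index_spec at offset {pos}")
        index, label, raw_props = match.groups()
        props = [p.strip() for p in raw_props.split(",") if p.strip()]
        entries.append((
            index,
            None if label == "*" else label,
            None if props == ["*"] else props,
        ))
        pos = match.end()
        if pos < len(spec):
            # Entries are separated by commas outside the parentheses.
            if spec[pos] != ",":
                raise ValueError(f"expected ',' at offset {pos}")
            pos += 1
    return entries


def select_properties(entries, labels, properties):
    """Return {index_name: properties_to_store} for a single node."""
    selected = {}
    for index, label, props in entries:
        if label is not None and label not in labels:
            continue  # this entry does not apply to the node
        if props is None:
            chosen = dict(properties)  # "*" keeps every property
        else:
            chosen = {k: properties[k] for k in props if k in properties}
        selected.setdefault(index, {}).update(chosen)
    return selected
```

With the spec `my_index:Label1(*), my_other_index:Label2(prop3)`, a Label1 node carrying prop2 and prop4 would be stored in my_index with both properties, without either being named in the spec.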

jexp commented 9 years ago

Sounds like a good idea. Would you feel up to sending a pull request with an updated implementation?

jazzido commented 9 years ago

@marcusleandro, if you get around to contributing a PR to this, I would also love to merge it to my fork of neo4j-elasticsearch.

BTW, that fork is what I'm using on a production system since last month so you might want to take a look at it.

rremigius commented 9 years ago

Hi, I was looking for a feature to index all nodes in the database without specification (the my_index:*(*) feature), so I implemented one in a fork: https://github.com/rremigius/neo4j-elasticsearch. The configuration is a little different from what was suggested: one general index can be specified in a separate config property:

elasticsearch.index_all=my_general_index

This will index all nodes in my_general_index in elasticsearch. The index_spec property is still required for the plugin to work.

Note: in this fork, I also changed the node indexing structure to:

{
    id: ...,
    labels: [...],
    properties: {...}
}

This is to prevent conflicts between properties called id or labels and the actual id and labels of the node.
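The conflict can be illustrated with a small Python sketch that builds a document both ways. The flat layout shown here, where node metadata and properties share one level, is assumed for contrast; the function names `flat_document` and `nested_document` are hypothetical and do not come from either fork.

```python
def flat_document(node_id, labels, properties):
    """Assumed flat layout: metadata and properties share one namespace."""
    doc = dict(properties)
    doc["id"] = node_id          # overwrites a node property named "id"
    doc["labels"] = list(labels)  # overwrites a node property named "labels"
    return doc


def nested_document(node_id, labels, properties):
    """Nested layout as described above: properties get their own object."""
    return {
        "id": node_id,
        "labels": list(labels),
        "properties": dict(properties),
    }


# A node whose own properties happen to be called "id" and "labels".
props = {"id": "user-supplied", "labels": "not-a-list", "name": "x"}

flat = flat_document(42, ["Label1"], props)
nested = nested_document(42, ["Label1"], props)
```

In the flat document the node's own `id` property is lost, while the nested document keeps it under `properties` alongside the real node id at the top level.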

jexp commented 9 years ago

I really like @marcusleandro's suggestions. And would love to see a PR.

@rremigius I thought about that, but then didn't separate out the properties. Perhaps how the properties are written to ES could be a config option? @jazzido what do you think?

jazzido commented 9 years ago

@jexp, I think what @rremigius implemented is definitely a useful feature. I would also like to see a PR :)