richardwilly98 / elasticsearch-river-mongodb

MongoDB River Plugin for ElasticSearch

Help on "relationships" management #482

Open pierpaolocira opened 9 years ago

pierpaolocira commented 9 years ago

Hi, I can't figure out how to manage "relationships" from MongoDB in River. I'm sorry: I've searched for any kind of documentation without success.

My situation is very common: I need to index two MongoDB collections in ES. Documents in the first collection hold native MongoDB DBRefs to documents in the other, in a "1-N" modeling fashion. With the "default" mapping my DBRefs are treated as standard inner objects (as in MongoDB), so I have to "join" the data in application business logic.

Instead, I would like to evaluate River/ES performance with both denormalization and nested objects in ES.

I read that I have to use scripts for denormalization (to get data from the second collection), but I wasn't able to find any information about it. Do I have to fetch the related data directly from MongoDB? Or should it already be in ES? And in which way? Or, better, does River provide a way to perform this task automatically?

Likewise, I wasn't able to understand how to handle the problem with nested objects.

And what is River's behaviour if a MongoDB document changes and it is stored in ES as an inner/denormalized object?

So my question is: can you make some examples of these operations available to users?

Sorry, I wasn't able to find any kind of documentation about this.

Thanks

richardwilly98 commented 9 years ago

Did you look at the unit tests?

I did a quick search in the repo:

https://github.com/richardwilly98/elasticsearch-river-mongodb/search?q=DBRef&type=Code&utf8=%E2%9C%93

pierpaolocira commented 9 years ago

Hi, thanks for your reply.

Looking at the repository, I only see that DBRefs in documents are converted into maps (containing references to other MongoDB documents as strings) before being saved in ES.

But I'm interested to know whether any documentation exists about giving semantic behaviour to DBRefs:

  1. automatic DBRef denormalization (mapping the fields of the referenced document as fields of the current document), and how river-mongodb deals with an update, made on the MongoDB side, to an object stored in another ES document
    • if not, manual DBRef denormalization (by scripts?)
  2. automatic DBRef inner object (explicitly mapping the referenced document as a field of the current document), and how river-mongodb deals with an update, made on the MongoDB side, to an object stored as an inner object in ES
    • if not, manual DBRef denormalization (by scripts?)

To clarify with an example, imagine having the following in MongoDB:

First doc in collection A: {name:"a1", b:DBRef("B", 111)}
Second doc in collection A: {name:"a2", b:DBRef("B", 111)}
One doc in collection B: {_id:111, city:"z"}

In the first case I mean indexing in ES:
{name:"a1", city:"z"}
{name:"a2", city:"z"}

In the second case I mean indexing in ES:
{name:"a1", b: {city:"z"}}
{name:"a2", b: {city:"z"}}
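To make the two target shapes concrete, here is a minimal Python sketch of the transformation applied to the example documents. It is only an illustration, not river-mongodb's API or behaviour: the collections are hypothetical in-memory stand-ins, a DBRef is modelled as a plain dict with "$ref" and "$id" keys, and the helper names `resolve`, `flatten` and `embed` are invented for this example.

```python
# Hypothetical in-memory stand-ins for the two MongoDB collections.
# A DBRef is modelled as a plain dict with "$ref" and "$id" keys (assumption).
collection_a = [
    {"name": "a1", "b": {"$ref": "B", "$id": 111}},
    {"name": "a2", "b": {"$ref": "B", "$id": 111}},
]
collection_b = {111: {"_id": 111, "city": "z"}}


def resolve(ref, collections):
    """Look up the document a DBRef-like dict points to, minus its _id."""
    target = collections[ref["$ref"]][ref["$id"]]
    return {k: v for k, v in target.items() if k != "_id"}


def flatten(doc, field, collections):
    """First case: copy the referenced fields onto the parent document."""
    out = {k: v for k, v in doc.items() if k != field}
    out.update(resolve(doc[field], collections))
    return out


def embed(doc, field, collections):
    """Second case: replace the reference with the referenced document."""
    out = dict(doc)
    out[field] = resolve(doc[field], collections)
    return out


collections = {"B": collection_b}
flat_docs = [flatten(d, "b", collections) for d in collection_a]
nested_docs = [embed(d, "b", collections) for d in collection_a]
print(flat_docs)
print(nested_docs)
```

Here `flat_docs` corresponds to the first case and `nested_docs` to the second. Where such a step would run (in a River script, in the application, or elsewhere) is exactly what is being asked.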

Is there documentation about achieving these results just by river-mongodb configuration, or by other methods? What happens if the document {_id:111, city:"z"} changes in MongoDB? Is the modification propagated?
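For clarity, the propagation I have in mind could be sketched as follows in Python. This is an assumption about what would be needed, not a description of what river-mongodb does: the `index` dict stands in for the ES index, and `referenced_by` and `propagate_update` are hypothetical names for a reverse lookup that would have to be maintained somewhere.

```python
from collections import defaultdict

# Hypothetical denormalized index: ES-like document id -> indexed body.
index = {
    "a1": {"name": "a1", "b": {"city": "z"}},
    "a2": {"name": "a2", "b": {"city": "z"}},
}

# Reverse lookup kept next to the index: (collection, _id) -> parent doc ids.
referenced_by = defaultdict(set)
referenced_by[("B", 111)].update({"a1", "a2"})


def propagate_update(collection, doc, field="b"):
    """Rewrite the embedded copy in every parent that references `doc`."""
    body = {k: v for k, v in doc.items() if k != "_id"}
    touched = sorted(referenced_by[(collection, doc["_id"])])
    for parent_id in touched:
        index[parent_id][field] = dict(body)
    return touched


# The document in collection B changes from city "z" to city "y".
updated = propagate_update("B", {"_id": 111, "city": "y"})
print(updated)
```

Without some reverse lookup of this kind, a change to the single document in B would leave both indexed copies in A stale.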

Thank you again