yougov / mongo-connector

MongoDB data stream pipeline tools by YouGov (adopted from MongoDB)
Apache License 2.0
1.88k stars 478 forks

Mongo-connector with ES and MongoDB Cluster #708

Open supermafete opened 7 years ago

supermafete commented 7 years ago

Hi there. I have a question regarding mongo-connector deployment on a cluster environment.

Currently, I have 3 nodes configured with a MongoDB replica set and an ES cluster. I run mongo-connector to create the ES indexes from the MongoDB databases, and everything works well. In the docManagers config for mongo-connector I have several targetURLs, one for each of the three nodes.

Anyway, I'm wondering what happens if I run mongo-connector with this configuration on all three nodes. Let's say I'm running mongo-connector only on node1. If node1 shuts down, ES won't index new data written to node2 and node3. So my goal is to keep syncing data even when one node fails.

The obvious answer is to run mongo-connector on all three nodes, but I'm worried about how three mongo-connectors would behave when syncing all nodes at once. It would probably cause some useless traffic or, worse, collisions or duplicated data.

Has anyone worked with a configuration like this? Any advice on how to do it?

Thank you in advance.

M.

sjtuzl commented 7 years ago

I have the exact same question: how does the connector work in an HA environment to allow failover?

supermafete commented 7 years ago

I have isolated mongo-connector on a separate host. While running my tests I realized that it's not a good idea to run mongo-connector on all three nodes because of data rebounds. That host is replicated, so if it goes down, the backup mongo-connector host starts.
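
For anyone wanting to reproduce that active/standby arrangement, here is a minimal sketch of the takeover decision on the backup host. It assumes the active host publishes a heartbeat timestamp somewhere the backup can read. The function name, the timeout value and the heartbeat mechanism are all hypothetical and not part of mongo-connector.

```python
import time

# Hypothetical: seconds without a heartbeat before the standby takes over.
HEARTBEAT_TIMEOUT = 30.0


def standby_should_start(last_heartbeat, now=None, timeout=HEARTBEAT_TIMEOUT):
    """Return True when the active connector host looks dead.

    last_heartbeat is the UNIX timestamp of the last heartbeat seen from
    the active host, or None if no heartbeat has ever been seen.
    """
    if now is None:
        now = time.time()
    if last_heartbeat is None:
        return True
    return (now - last_heartbeat) > timeout
```

The point of the check is that only one mongo-connector runs at any time, which avoids the problems seen when running it on all three nodes. How the backup actually launches mongo-connector is left out on purpose.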

kgrvamsi commented 6 years ago

How about running the three nodes behind a load balancer and giving that URL to mongo-connector? (Assuming the data on the 3-node MongoDB cluster is always replicated and is the same across all three nodes.)

supermafete commented 6 years ago

@kgrvamsi that's almost the same as what I have now. I run mongo-connector from another host in the same network segment, pointing to one node. The load balancer solution seems like a good idea, but it does not answer the question of how mongo-connector behaves in an HA MongoDB environment.

I think MongoDB is designed so that you don't need a load balancer. Anyway, I wouldn't know how to put a load balancer that supports the mongodb:// protocol in front of a MongoDB replica set.
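
To illustrate why a load balancer is usually unnecessary: MongoDB drivers accept a mongodb:// URI that lists several replica set members, and the driver then discovers the current primary on its own. A minimal sketch of building such a URI follows. The host names, the port and the set name `rs0` are placeholders, and whether mongo-connector accepts this form for its main address is an assumption that should be checked against its documentation.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set):
    """Build a mongodb:// URI that lists every replica set member."""
    if not hosts:
        raise ValueError("at least one host is required")
    # The replicaSet option tells the driver which set the seed hosts belong to.
    query = urlencode({"replicaSet": replica_set})
    return "mongodb://{}/?{}".format(",".join(hosts), query)


# Placeholder host names and set name, for illustration only.
uri = replica_set_uri(["node1:27017", "node2:27017", "node3:27017"], "rs0")
print(uri)
```

With a URI like this the client does not depend on any single node being up, which covers the failover case on the MongoDB side without an extra network component.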