rwynn / route81

A bi-directional sync daemon for MongoDB and Kafka
MIT License

What's the logic for syncing multiple MongoDB databases and their collections to Kafka topics? #4

Status: Open. sunweiconfidence opened this issue 5 years ago.

sunweiconfidence commented 5 years ago

@rwynn What is the current logic for syncing multiple databases and their collections from MongoDB to Kafka topics? Can route81 write collections from multiple databases to Kafka concurrently? If so, can collections from multiple databases currently be synced concurrently into a single topic? Thanks.

rwynn commented 5 years ago

@sunweiconfidence route81 will concurrently read multiple collections across multiple databases in MongoDB and concurrently write to Kafka. However, there is currently no way to put messages from different MongoDB collections into the same Kafka topic. Each MongoDB collection, e.g. db1.col1, goes into a topic with the same name (in this case db1.col1). You can optionally set the topic-name-prefix config option, which adds a common prefix to all topic names. But the topic name would still consist of that prefix followed by the database and collection name, e.g. myprefix.db1.col1, myprefix.db1.col2, myprefix.db2.col1.

So, to get all events in one consumer you would need to subscribe to multiple topics, one for each collection in MongoDB.

sunweiconfidence commented 5 years ago

@rwynn Thanks. Would it be feasible in the future to put messages from multiple MongoDB collections into the same Kafka topic, or are there restrictions that make this hard to implement? Also, is there a complete config file template for route81? Finally, to concurrently read multiple collections across multiple databases and write them to Kafka, should I start multiple route81 processes with separate config files, or a single route81 process with one config file containing all the collections and topics? Thanks.

rwynn commented 5 years ago

Yes, I think it would be feasible to add a mapping from MongoDB collection name to Kafka topic name, so that multiple collections could point to the same topic.
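The proposed mapping can be sketched as a lookup table from namespace to topic, with a fallback to the default $db.$collection naming for anything not listed. The map and the function topicFor are hypothetical illustrations of the idea only; they do not reflect route81's actual configuration format, and the topic-name prefix is omitted for brevity.

```go
package main

import "fmt"

// topicFor looks up an explicit namespace-to-topic mapping (hypothetical,
// for illustration). Several namespaces may share one topic; any namespace
// that is not mapped falls back to the default "db.coll" topic name.
func topicFor(mapping map[string]string, db, coll string) string {
	ns := db + "." + coll
	if t, ok := mapping[ns]; ok {
		return t
	}
	return ns
}

func main() {
	// Two collections from different databases routed to one topic.
	mapping := map[string]string{
		"db1.col1": "all-events",
		"db2.col1": "all-events",
	}
	fmt.Println(topicFor(mapping, "db1", "col1")) // all-events
	fmt.Println(topicFor(mapping, "db2", "col1")) // all-events
	fmt.Println(topicFor(mapping, "db1", "col2")) // db1.col2 (unmapped)
}
```

Keeping the default as a fallback means existing deployments would behave exactly as before unless a mapping is configured.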

Generally, I don't think you need more than one route81 process, although you can of course run more. By default, route81 opens a change stream against the entire MongoDB deployment, so any change to any user collection will be sent to Kafka. If you only want specific databases or collections, you can use the change-stream-namespaces option (an array of strings) in the config file.

# Instead of listening to the entire deployment, listen to all changes in db1 and to the col2 collection of db2.
# Changes in each collection are sent to the topic named $db.$collection.
change-stream-namespaces = ["db1", "db2.col2"]
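The semantics of change-stream-namespaces described above (an entry like "db1" covers a whole database, "db2.col2" covers a single collection, and no entries means the whole deployment) can be expressed as a small predicate. This is a hypothetical sketch of those semantics, not necessarily how route81 implements the filtering. It splits at the first dot, since MongoDB database names cannot contain dots while collection names can. strings.Cut requires Go 1.18 or later.

```go
package main

import (
	"fmt"
	"strings"
)

// watched reports whether a change in db.coll falls within a
// change-stream-namespaces style list. An entry without a dot is treated
// as a whole database; an entry with a dot as one collection. An empty
// list means the entire deployment is watched (the described default).
func watched(namespaces []string, db, coll string) bool {
	if len(namespaces) == 0 {
		return true
	}
	for _, ns := range namespaces {
		// Split at the first dot: database names cannot contain dots.
		if d, c, ok := strings.Cut(ns, "."); ok {
			if d == db && c == coll {
				return true
			}
		} else if ns == db {
			return true
		}
	}
	return false
}

func main() {
	nss := []string{"db1", "db2.col2"}
	fmt.Println(watched(nss, "db1", "anything")) // true
	fmt.Println(watched(nss, "db2", "col2"))     // true
	fmt.Println(watched(nss, "db2", "col1"))     // false
	fmt.Println(watched(nil, "db3", "col9"))     // true
}
```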

sunweiconfidence commented 5 years ago

@rwynn Thanks. How do I specify, in the TOML config file, which Kafka topic each collection should be synced to? Thank you.

rwynn commented 5 years ago

@sunweiconfidence I just released a new version, 1.2.0, that you can use to send multiple collections to a single topic. For configuration details, see https://github.com/rwynn/route81#sending-multiple-mongodb-collections-to-the-same-kafka-topic.

sunweiconfidence commented 5 years ago

@rwynn Thanks. Does route81 have any specific version requirements for Kafka, ZooKeeper, and MongoDB? Thank you.