uber / uReplicator

Improvement of Apache Kafka Mirrormaker
Apache License 2.0

Example 1 vs 2 vs normal startup scripts vs topic mapping clarification #154

Open daluu opened 6 years ago

daluu commented 6 years ago

I don't recall this being mentioned in the quickstart/README, but looking through the shell script for the example 1 worker, it appears the topicmapping.properties file is supplied, whereas it is not for the example 2 worker. I interpret this to mean that to remap topic names during mirroring, you must explicitly whitelist each topic, since there's no generic regex-style way to say "for source topic X, rename the destination topic by changing some prefix/suffix". Is that right?

Or am I mistaken in that example 2 can also use topicmapping, and whatever is not in topic map simply gets mirrored without topic name change?

I also want to clarify how example 1 operates. Since we provide the topicmapping, does that mean topics in the topic map are already whitelisted at startup? Or does the user still have to make the REST call to whitelist a topic that's already defined in topicmapping? If the latter, it would seem more efficient and intuitive to auto-whitelist the topics in the topic map on startup than to make the user do redundant steps. Requiring the REST call only makes sense if they didn't supply a topicmapping file.

A related suggestion for the example 1 approach: since we use REST to whitelist a topic, is there a similar REST API to set a topic mapping for a whitelisted topic? Or one that whitelists a topic and (re)maps it in a single REST request? If these don't exist, I don't see why we shouldn't add them to the future roadmap.

Also, looking over the example shell scripts vs the normal start controller & worker scripts (the non-example ones), they look pretty similar with the exception of:

Have some questions about that above:

xhl1988 commented 6 years ago
  1. Currently example 2 doesn't work with topicmapping.properties. Controller-side code changes are needed for topicmapping.properties to work in auto-whitelist mode. Once that works, you can enable auto-whitelisting, which doesn't require the REST API. The reason it's not there yet is that we don't have auto-whitelisting + topic-mapping use cases now. But we can do it in the future.
  2. Example is not for demo purpose and we expect it to be as simple as possible. But I agree the documentation needs to be improved. For your question, yes, topicmapping currently only works with explicit whitelisting.
daluu commented 6 years ago

Thank you for the clarification.

For you question, yes, currently topicmapping only works in explicit whitelisting now.

A follow-up question: does this mean that even though a topic is listed in topicmapping.properties, one still has to explicitly whitelist it via a REST API call for it to be replicated under the new topic name? In other words, there is no auto-whitelisting from reading topicmapping at startup, and topicmapping has no effect at startup if one doesn't whitelist any topic to start with.

xhl1988 commented 6 years ago

Yes, you have to manually whitelist the topic.
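
For readers following along, the manual whitelisting step could be scripted roughly as below. This is only a sketch under assumptions: the controller address (`http://localhost:9000`), the `/topics` endpoint, and the `topic` / `numPartitions` payload fields are taken from the project's quickstart as best recalled, not confirmed in this thread, and `build_whitelist_request` is a hypothetical helper name. Check them against your controller before relying on it.

```python
import json
import urllib.request

# Assumption: controller address used in the quickstart; adjust for your setup.
CONTROLLER_URL = "http://localhost:9000"


def build_whitelist_request(topic, num_partitions, controller_url=CONTROLLER_URL):
    """Build (but do not send) a POST request that whitelists one topic.

    The /topics path and the payload field names are assumptions.
    """
    payload = json.dumps({"topic": topic, "numPartitions": str(num_partitions)})
    return urllib.request.Request(
        url=controller_url.rstrip("/") + "/topics",
        data=payload.encode("utf-8"),
        method="POST",
    )


if __name__ == "__main__":
    req = build_whitelist_request("dummyTopic", 1)
    # Sending needs a running controller, so the call is left commented out:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status, resp.read().decode())
    print(req.get_method(), req.full_url, req.data.decode())
```

Building the request separately from sending it makes it easy to inspect what would be posted for each topic listed in topicmapping.properties before touching a live controller.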

dungnt081191 commented 5 years ago

Hi @xhl1988, a topic had been replicated from the source Kafka to the destination Kafka, but for some reason I deleted the replicated topic in the destination Kafka. After that, uReplicator DID NOT replicate the messages from the beginning (even though I set smallest mode). It only replicates the latest (new) messages from source to destination from that point on.

How can I replicate the topic again from the beginning (all messages from source to destination)? And where are a topic's messages stored in uReplicator?

xhl1988 commented 5 years ago

You can set this in the consumer config, i.e. consumer.properties, with auto.offset.reset=smallest. It is the same as for a regular Kafka consumer.
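
As an illustration, the relevant line in consumer.properties would look like the fragment below. Only the `auto.offset.reset` key and its value come from this thread; any other settings in your file stay as they are. Note that in a regular Kafka consumer this setting only takes effect when the consumer group has no valid committed offset, so whether it alone triggers a full re-replication after deleting the destination topic is an assumption worth verifying.

```properties
# consumer.properties (fragment)
# Start from the oldest available message when no committed offset exists.
# "smallest" is the value used by the older Kafka consumer; newer consumers call it "earliest".
auto.offset.reset=smallest
```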

xhl1988 commented 5 years ago

uReplicator doesn't store messages; all messages are held in memory and discarded after being sent to the dst cluster.