jpmantuano / kafka-committer

A committer that allows a Norconex crawl to publish crawled data to a Kafka topic.

Apache Kafka Committer

Apache Kafka implementation of Norconex Committer.

Configuration

When used with a Norconex Collector, you can use the following XML as the <committer> section of your Norconex Collector configuration to publish to Apache Kafka:

<committer class="net.danizen.norconex.committer.kafka.KafkaCommitter">
  <brokerList>...</brokerList>
  <topicName>...</topicName>

  <sourceReferenceField keep="[false|true]">...</sourceReferenceField>
  <sourceContentField keep="[false|true]">...</sourceContentField>
  <targetContentField>...</targetContentField>
  <queueDir>...</queueDir>
  <queueSize>...</queueSize>
  <commitBatchSize>...</commitBatchSize>
  <maxRetries>...</maxRetries>
  <maxRetryWait>...</maxRetryWait>
</committer>

Tag Descriptions:

Tag         Description
brokerList  Comma-delimited list of host URLs used to connect to a Kafka broker or cluster
topicName   Kafka topic to which the committer publishes messages
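
As an illustration, a filled-in configuration might look like the sketch below. The broker host names and the topic name are hypothetical placeholders, not values from this project. The sketch assumes that brokers are listed in the host:port form Kafka clients commonly use (9092 is Kafka's default port), and that the remaining tags can be left out to fall back on default values, which this README does not confirm.

```xml
<committer class="net.danizen.norconex.committer.kafka.KafkaCommitter">
  <!-- Hypothetical brokers; assumes the host:port form, comma-delimited -->
  <brokerList>kafka1.example.com:9092,kafka2.example.com:9092</brokerList>
  <!-- Hypothetical topic name that receives the crawled documents -->
  <topicName>crawled-documents</topicName>
  <!-- Other tags omitted on the assumption that they have defaults -->
</committer>
```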

Installation

The Apache Kafka Committer is a library that you must include in another product's classpath (along with its required dependencies). For use with a Norconex Collector, the collector must already be installed on your system and is referred to as in the following instructions. You can perform either an automated installation (recommended) or a manual one.