scrapinghub / frontera

A scalable frontier for web crawlers
BSD 3-Clause "New" or "Revised" License
1.29k stars 215 forks

Does Apache Kafka need Apache Zookeeper to run Frontera? #370

Closed: ghost closed this issue 5 years ago

ghost commented 5 years ago

While trying to install Apache Kafka, I realized that it depends on Apache ZooKeeper. However, this is not mentioned in the documentation. Does Apache Kafka need Apache ZooKeeper to run Frontera?

If possible, could you share how to install Apache Kafka on Ubuntu 18.04? I just migrated from Windows to Ubuntu specifically to use Frontera, so I'm very new to Linux.

In addition, I would like to ask which standard modules or files need to be created to run Frontera with Apache Kafka and Apache HBase. I already know from the Cluster Setup Guide in the documentation which files and modules need to be configured, but I want to know how they should be named and arranged alongside Scrapy.

As far as I know, Scrapy can automatically generate the standard files and modules it needs with the scrapy startproject tutorial command.

Thanks in advance!

sibiryakov commented 5 years ago

Hi, ZooKeeper (ZK) is a mandatory requirement for Kafka; Frontera does not use it directly. The easiest way to set up Kafka is to use a Docker image or the binary bundle from Confluent. To use Frontera with Scrapy, Frontera must be installed; Scrapy then uses the middleware and scheduler implementations from Frontera.
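For the Scrapy side, a minimal sketch of what the wiring might look like in the settings.py generated by scrapy startproject tutorial. The middleware and scheduler paths follow the Frontera documentation for Scrapy integration; the project name tutorial and the module path in FRONTERA_SETTINGS are assumptions for illustration, so adjust them to your own layout.

```python
# tutorial/settings.py (Scrapy project settings; "tutorial" is an assumed project name)

# Frontera's spider middleware, which passes crawl results back to the frontier.
SPIDER_MIDDLEWARES = {
    "frontera.contrib.scrapy.middlewares.schedulers.SchedulerSpiderMiddleware": 1000,
}

# Frontera's downloader middleware, which reports download errors to the frontier.
DOWNLOADER_MIDDLEWARES = {
    "frontera.contrib.scrapy.middlewares.schedulers.SchedulerDownloaderMiddleware": 1000,
}

# Replace Scrapy's default scheduler with the Frontera one.
SCHEDULER = "frontera.contrib.scrapy.schedulers.frontier.FronteraScheduler"

# Dotted path to the module holding the Frontera settings (backend, message bus,
# and so on). This path is hypothetical: it assumes tutorial/frontera/settings.py.
FRONTERA_SETTINGS = "tutorial.frontera.settings"
```

With this arrangement the Kafka and HBase options from the Cluster Setup Guide would live in the separate Frontera settings module rather than in the Scrapy settings file.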