istresearch / scrapy-cluster

This Scrapy project uses Redis and Kafka to create a distributed, on-demand scraping cluster.
http://scrapy-cluster.readthedocs.io/
MIT License

What is Kafka's role in scrapy-cluster? #172

Closed (crisfan closed this issue 6 years ago)

crisfan commented 6 years ago

Hi, madisonb! It's an amazing project. Thank you for your contribution.

Here are some questions that have been bothering me for a long time.

  1. Why put Kafka between the REST service and kafka-monitor? What problem is it meant to solve?
  2. If the data output by the REST service is small, can we deliver it directly to kafka-monitor?

After reading up on Kafka, I think it's just a buffer in this project, right? I hope to receive your reply. Thanks!

madisonb commented 6 years ago

  1. The REST interface lets the Kafka Monitor handle validation of the cluster's API calls. Building the same code into the RESTful interface would mean repeating lots of code across both projects. See the docs here for more information.
  2. I would take a look at the architecture diagram on this page to see how everything interacts. The RESTful component is meant to be consumed by something more UI-friendly. The Kafka Monitor still has its place too.

Kafka is indeed used as the message bus throughout this project.
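
As a rough illustration of the split described above, here is a minimal sketch in which the REST side forwards requests without validating them and the monitor side owns all validation. It is not scrapy-cluster's actual code: an in-memory `queue.Queue` stands in for the Kafka topic, and the names `rest_submit`, `monitor_poll`, `validate`, and `REQUIRED_FIELDS` (including the field list) are hypothetical and chosen only for this example.

```python
import json
import queue

# Stand-in for a Kafka topic (assumption: a real deployment would use a
# Kafka producer on the REST side and a Kafka consumer on the monitor side).
incoming_topic = queue.Queue()

# Hypothetical required fields, used only for illustration.
REQUIRED_FIELDS = ("url", "appid", "crawlid")


def rest_submit(payload: dict) -> None:
    """REST side: serialize and forward the request; no validation here."""
    incoming_topic.put(json.dumps(payload))


def validate(request: dict) -> list:
    """Monitor side: return the names of any missing required fields."""
    return [field for field in REQUIRED_FIELDS if field not in request]


def monitor_poll() -> list:
    """Monitor side: drain the topic and keep only the valid requests."""
    accepted = []
    while not incoming_topic.empty():
        request = json.loads(incoming_topic.get())
        if not validate(request):
            accepted.append(request)
    return accepted


rest_submit({"url": "http://example.com", "appid": "demo", "crawlid": "abc123"})
rest_submit({"url": "http://example.com"})  # missing appid and crawlid
accepted = monitor_poll()
print(accepted)
```

Because the validation rules live in one place, any producer that can write to the topic (the REST service or another client) gets the same checks without duplicating them.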

Given that these are general questions and not issues with the project, can I close this and we can move further conversation to Gitter? Thanks.

crisfan commented 6 years ago

Looking forward to talking to you on Gitter!