istresearch / scrapy-cluster

This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.
http://scrapy-cluster.readthedocs.io/
MIT License

Add Kafka Request Size Param to Crawler Producer #202

Closed · madisonb closed 5 years ago

madisonb commented 5 years ago

Adds the ability to customize the producer config for the crawler, given that the sizes of web pages vary greatly and you may run across some that are too large to be accepted into Kafka.

This is controlled by two settings, sketched after the list below:

  1. The broker setting message.max.bytes for Kafka
  2. The producer setting max_request_size for the KafkaProducer
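Below is a minimal sketch of how the two knobs relate, assuming the kafka-python `KafkaProducer` that Scrapy Cluster uses. The setting name `MAX_REQUEST_SIZE_BYTES`, the 10 MB value, and the `localhost:9092` broker address are illustrative assumptions, not the exact names introduced by this PR; check the project's settings for the real option.

```python
# Sketch only: raising the client-side request limit for large pages.
# kafka-python's max_request_size defaults to 1 MB, so oversized pages
# are rejected by the producer before they ever reach the broker.
from kafka import KafkaProducer

MAX_REQUEST_SIZE_BYTES = 10 * 1024 * 1024  # hypothetical value (10 MB)

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",      # assumed broker address
    max_request_size=MAX_REQUEST_SIZE_BYTES,  # producer-side limit
)

# The broker-side limit must be raised to match, e.g. in server.properties:
#   message.max.bytes=10485760
# Otherwise the broker will still reject messages larger than its default.
```

Raising only one of the two limits is not enough: the producer limit governs what the client will attempt to send, and `message.max.bytes` governs what the broker will accept.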
coveralls commented 5 years ago


Coverage increased (+0.5%) to 71.314% when pulling 8840e08f1b122832fc23d269ff1bbb6aa2c63bc6 on request-size into 67b8cf0c27938c25cf9fce0ddd76517ef1ba05d8 on dev.