scrapinghub / frontera

A scalable frontier for web crawlers
BSD 3-Clause "New" or "Revised" License

add assertion error description for easier debugging #389

Closed a-shkarupin closed 4 years ago

a-shkarupin commented 4 years ago

If the SPIDER_FEED_PARTITIONS setting is set to a value that doesn't match the number of partitions of the spider feed topic in Kafka, the following error is reported on the crawler side:

```
2019-11-29 11:03:22 [kafka.producer.kafka] INFO: Closing the Kafka producer with 0 secs timeout.
2019-11-29 11:03:22 [kafka.producer.kafka] INFO: Proceeding to force close the producer since pending requests could not be completed within timeout 0.
2019-11-29 11:03:22 [kafka.producer.sender] DEBUG: Beginning shutdown of Kafka producer I/O thread, sending remaining records.
2019-11-29 11:03:22 [kafka.conn] INFO: <BrokerConnection node_id=bootstrap-0 host=localhost:9092 [IPv4 ('127.0.0.1', 9092)]>: Closing connection.
2019-11-29 11:03:22 [kafka.producer.sender] DEBUG: Shutdown of Kafka producer I/O thread has completed.
2019-11-29 11:03:22 [kafka.producer.kafka] DEBUG: The Kafka producer has closed.
2019-11-29 11:03:22 [kafka.producer.kafka] INFO: Closing the Kafka producer with 0 secs timeout.
2019-11-29 11:03:22 [kafka.producer.kafka] INFO: Proceeding to force close the producer since pending requests could not be completed within timeout 0.
2019-11-29 11:03:22 [kafka.producer.sender] DEBUG: Beginning shutdown of Kafka producer I/O thread, sending remaining records.
2019-11-29 11:03:22 [kafka.conn] INFO: <BrokerConnection node_id=bootstrap-5 host=localhost:9092 [IPv4 ('127.0.0.1', 9092)]>: Closing connection.
2019-11-29 11:03:22 [kafka.producer.sender] DEBUG: Shutdown of Kafka producer I/O thread has completed.
2019-11-29 11:03:22 [kafka.producer.kafka] DEBUG: The Kafka producer has closed.
Unhandled error in Deferred:
2019-11-29 11:03:22 [twisted] CRITICAL: Unhandled error in Deferred:
```

```
Traceback (most recent call last):
  File "/home/a/venvs/frontera/lib/python3.6/site-packages/scrapy/crawler.py", line 184, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "/home/a/venvs/frontera/lib/python3.6/site-packages/scrapy/crawler.py", line 188, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "/home/a/venvs/frontera/lib/python3.6/site-packages/twisted/internet/defer.py", line 1613, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "/home/a/venvs/frontera/lib/python3.6/site-packages/twisted/internet/defer.py", line 1529, in _cancellableInlineCallbacks
    _inlineCallbacks(None, g, status)
--- <exception caught here> ---
  File "/home/a/venvs/frontera/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/home/a/venvs/frontera/lib/python3.6/site-packages/scrapy/crawler.py", line 88, in crawl
    yield self.engine.open_spider(self.spider, start_requests)
builtins.AssertionError:
```

```
2019-11-29 11:03:22 [twisted] CRITICAL:
Traceback (most recent call last):
  File "/home/a/venvs/frontera/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/home/a/venvs/frontera/lib/python3.6/site-packages/scrapy/crawler.py", line 88, in crawl
    yield self.engine.open_spider(self.spider, start_requests)
AssertionError
```

It was hard to understand what the cause was. With the suggested patch, the assertion description is included in the error, making it easier to understand:

```
2019-11-29 11:04:21 [twisted] CRITICAL:
Traceback (most recent call last):
  File "/home/a/venvs/frontera/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/home/a/venvs/frontera/lib/python3.6/site-packages/scrapy/crawler.py", line 88, in crawl
    yield self.engine.open_spider(self.spider, start_requests)
AssertionError: Number of kafka partitions doesn't match config for spider feed
```

sibiryakov commented 4 years ago

thank you!