istresearch / scrapy-cluster

This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.
http://scrapy-cluster.readthedocs.io/
MIT License

Incremented fail stats #254

Closed mingxuan1 closed 3 years ago

mingxuan1 commented 3 years ago

OS: Ubuntu 18.04, release: scrapy-cluster-dev, Python: 3.6

I want to feed an item to Kafka and add two extra fields, "client_id" and "user_id", like this:

```json
{
    "url": "https://www.google.com",
    "appid": "testapp",
    "crawlid": "abc13",
    "maxdepth": 1,
    "client_id": "nice",
    "user_id": "gfdf4654646464"
}
```

```
2021-02-26 17:02:01,104 [kafka-monitor] INFO: Successfully fed item to Kafka
```

It works

But there is a problem in the kafka-monitor; here are the logs:

```
2021-02-26 17:02:01,104 [kafka-monitor] DEBUG: Incremented total stats
2021-02-26 17:02:01,109 [kafka-monitor] WARNING: Did not find schema to validate request
2021-02-26 17:02:01,110 [kafka-monitor] DEBUG: Incremented fail stats
2021-02-26 17:03:00,033 [kafka-monitor] DEBUG: Compiling total/fail dump stats
2021-02-26 17:03:00,034 [kafka-monitor] DEBUG: Compiling plugin dump stats
2021-02-26 17:03:00,036 [kafka-monitor] INFO: Kafka Monitor Stats Dump:
```
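For context, the WARNING suggests the request matched none of the kafka-monitor plugins' JSON schemas, so it was counted as a failure instead of being forwarded to a spider. A minimal sketch of that behavior, assuming the crawl schema rejects unknown keys (the key names below are illustrative, not scrapy-cluster's exact schema):

```python
# Illustrative sketch: a request validates only if all required keys are
# present AND every key is one the schema knows about. Extra keys such as
# "client_id"/"user_id" make validation fail, so no schema matches and the
# monitor increments its fail stats. Key sets here are assumptions.
ALLOWED_KEYS = {"url", "appid", "crawlid", "maxdepth", "priority"}
REQUIRED_KEYS = {"url", "appid", "crawlid"}

def matches_crawl_schema(request: dict) -> bool:
    """True only if required keys are present and no unknown keys exist."""
    keys = set(request)
    return REQUIRED_KEYS <= keys and keys <= ALLOWED_KEYS

base = {"url": "https://www.google.com", "appid": "testapp",
        "crawlid": "abc13", "maxdepth": 1}

print(matches_crawl_schema(base))          # plain request validates
print(matches_crawl_schema({**base,
    "client_id": "nice",
    "user_id": "gfdf4654646464"}))         # extra fields -> no schema match
```

This would explain why the same request works once the extra fields are removed, or once the schema is extended to know about them.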

And the spider never receives the URL, so it doesn't do any work.
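If the goal is to carry these extra fields through, one common approach (assuming a default scrapy-cluster layout; file paths and schema contents may differ in your version) is to declare them in the scraper plugin's JSON schema, e.g. by merging entries like these into the `properties` object of `kafka-monitor/plugins/scraper_schema.json`:

```json
{
    "client_id": {
        "type": "string"
    },
    "user_id": {
        "type": "string"
    }
}
```

Alternatively, if your version's scraper schema includes an `attrs` object for arbitrary pass-through data, nesting the custom fields under `attrs` may work without any schema change; check the schema file shipped with your release.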