istresearch / scrapy-cluster

This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.
http://scrapy-cluster.readthedocs.io/
MIT License

Lost config from Zookeeper makes spider down #203

Closed jamesliu668 closed 5 years ago

jamesliu668 commented 5 years ago

Hello,

I have deployed scrapy-cluster on a cloud platform with 3 Kafka/Zookeeper hosts, 1 Redis host, and 1 spider machine. All hosts can communicate with each other over the local network, and the spider machine can access the internet. I keep running into the error "Lost config from Zookeeper", even though I have double-checked that all of my Kafka/Zookeeper hosts are online. After I kill the spider and restart it, it works properly, but after several days it hits the same error and stops again.

Here is the full error message:
`2018-10-06 23:09:35,132 [sc-crawler] INFO: Lost config from Zookeeper
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/usr/local/python2.7/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/local/python2.7/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/python2.7/lib/python2.7/site-packages/kazoo/protocol/connection.py", line 471, in zk_loop
    if retry(self._connect_loop, retry) is STOP_CONNECTING:
  File "/usr/local/python2.7/lib/python2.7/site-packages/kazoo/retry.py", line 123, in __call__
    return func(*args, **kwargs)
  File "/usr/local/python2.7/lib/python2.7/site-packages/kazoo/protocol/connection.py", line 488, in _connect_loop
    status = self._connect_attempt(host, port, retry)
  File "/usr/local/python2.7/lib/python2.7/site-packages/kazoo/protocol/connection.py", line 529, in _connect_attempt
    [], [], timeout)[0]
  File "/usr/local/python2.7/lib/python2.7/site-packages/kazoo/handlers/threading.py", line 147, in select
    return select.select(*args, **kwargs)
TypeError: argument must be an int, or have a fileno() method.`

I am running Python 2.7.
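For context on the last frame: the `TypeError` is what Python's standard `select.select` raises when an item in one of its lists is neither an integer file descriptor nor an object with a `fileno()` method. The sketch below reproduces that kind of error with the standard library only. Using `None` as the stand-in for the bad item is an assumption made for illustration, not a confirmed diagnosis of what Kazoo passed, and `try_select` is a hypothetical helper name.

```python
import select
import socket


def try_select(readers):
    """Poll `readers` with a zero timeout; return the TypeError text, or None if the call succeeds."""
    try:
        select.select(readers, [], [], 0)
    except TypeError as exc:
        return str(exc)
    return None


# Assumption: None stands in for a socket reference that is no longer valid.
bad_message = try_select([None])
print(bad_message)

# A real socket has fileno(), so the same call goes through without a TypeError.
left, right = socket.socketpair()
good_message = try_select([left])
left.close()
right.close()
print(good_message)
```

If the same message shows up here, it at least suggests that the object handed to `select` in the Kazoo connection thread was not a usable socket at that moment. Why that would happen after several days of running is not something this sketch can answer.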

Thanks

madisonb commented 5 years ago

Given that the traceback looks like it is coming from within Kazoo, I am not sure I can help you debug unless you can give me steps to reproduce it within the project. If you can guide me in generating this error with the Docker containers or on the Vagrant machine, that would be helpful.

Otherwise, given how dormant this issue has been, I am going to close it.

jamesliu668 commented 5 years ago

It's a little bit weird: when I start the spider with "scrapy runspider xx.py", it works properly at first, but the error message somehow appears after several days of running.

madisonb commented 5 years ago

Closing due to inactivity and no steps to reproduce.