istresearch / scrapy-cluster

This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.
http://scrapy-cluster.readthedocs.io/
MIT License

Rest - signal only works in main thread #204

Closed: Shique closed this issue 5 years ago

Shique commented 5 years ago

The REST API's /feed endpoint throws the following error when called with a request to schedule a crawl.

Traceback (most recent call last):
  File "rest_service.py", line 56, in wrapper
    result = f(*args, **kw)
  File "rest_service.py", line 613, in feed
    result = self._feed_to_kafka(json_item)
  File "rest_service.py", line 566, in _feed_to_kafka
    return _feed(json_item)
  File "/usr/local/lib/python3.6/site-packages/scutils/method_timer.py", line 43, in f2
    old_handler = signal.signal(signal.SIGALRM, timeout_handler)
  File "/usr/local/lib/python3.6/signal.py", line 47, in signal
    handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
ValueError: signal only works in main thread

I am using the latest dev branch with no alterations running inside Docker.

The problem seems to stem from the following line in rest/rest_service.py. Removing it eliminates the error, but I don't know why the line is there or what other effects removing it has.

@MethodTimer.timeout(self.settings['KAFKA_FEED_TIMEOUT'], False)

Is this a problem with the crawler, or maybe something to do with Docker?
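For context, the ValueError in the traceback comes from a CPython restriction: signal.signal() can only be called from the main thread, so a SIGALRM-based timer fails whenever the decorated function runs on a worker thread, which is presumably how the REST request handler is being executed here. A minimal sketch that reproduces the same error outside scrapy-cluster (the function and variable names are illustrative and not taken from the project):

```python
import signal
import threading


def install_handler(results):
    """Try to register a signal handler and record what happens."""
    try:
        # SIGINT keeps the sketch portable; SIGALRM is restricted the same way on Unix.
        signal.signal(signal.SIGINT, signal.default_int_handler)
        results.append("ok")
    except ValueError as exc:
        # Raised because this function is running outside the main thread.
        results.append(str(exc))


results = []
worker = threading.Thread(target=install_handler, args=(results,))
worker.start()
worker.join()
print(results[0])
```

The printed message starts with "signal only works in main thread"; the exact wording varies slightly between Python versions.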

madisonb commented 5 years ago

Fairly certain this was fixed with #201. If you look at https://github.com/istresearch/scrapy-cluster/blob/dev/rest/rest_service.py there is no mention of the MethodTimer decorator anymore.
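If a timeout around a blocking call is still wanted inside a threaded server, one thread-safe alternative to SIGALRM is to wait on a future with a deadline. The sketch below illustrates that general technique only; it is not a description of what #201 changed, and `call_with_timeout` and `slow_feed` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared pool; the size is an arbitrary choice for this sketch.
_executor = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(func, timeout, default, *args, **kwargs):
    """Run func in a pool thread; return default if it has not finished in time."""
    future = _executor.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        # The worker is not interrupted; it keeps running in the background.
        return default


def slow_feed(item):
    """Hypothetical stand-in for a blocking send to Kafka."""
    time.sleep(0.5)
    return True


print(call_with_timeout(slow_feed, 0.1, False, {"url": "http://example.com"}))
print(call_with_timeout(slow_feed, 2.0, False, {"url": "http://example.com"}))
```

Unlike a signal-based timer, this works from any thread, at the cost that a timed-out call is abandoned rather than cancelled.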

Can you double check that the Docker container is up to date by doing a pull? Otherwise I will check whether the containers failed to push to Docker Hub on the build when that was merged in.

Shique commented 5 years ago

I suspected that might be it. I was running a cached image and had not pulled the latest one.