istresearch / scrapy-cluster

This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.
http://scrapy-cluster.readthedocs.io/
MIT License

Removed method timer wrapper in non-main thread #201

Closed: madisonb closed this 5 years ago

madisonb commented 5 years ago

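This should allow us to catch errors more efficiently when we cannot connect to Kafka for whatever reason. The method timer wrapper installs a `SIGALRM` handler, which Python only permits in the main thread, so the rest service was throwing the following log: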

rest_1 | {"message": "Uncaught Exception Thrown", "logger": "rest-service", "status": "FAILURE", "timestamp": "2018-09-24T15:16:49.882399Z", "data": null, "level": "ERROR", "error": {"message": "An error occurred while processing your request.", "cause": "", "exception": "signal only works in main thread", "ex": "Traceback (most recent call last):\n File \"rest_service.py\", line 59, in wrapper\n result = f(*args, **kw)\n File \"rest_service.py\", line 620, in feed\n result = self._feed_to_kafka(json_item)\n File \"rest_service.py\", line 573, in _feed_to_kafka\n return _feed(json_item)\n File \"/usr/local/lib/python2.7/site-packages/scutils/method_timer.py\", line 43, in f2\n old_handler = signal.signal(signal.SIGALRM, timeout_handler)\nValueError: signal only works in main thread\n"}}
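For context, the failure in the traceback can be reproduced and guarded against with a small sketch like the one below. It is not the actual `scutils.method_timer` code or the change made in this PR; the `timeout` decorator and the `feed_item` function are hypothetical names used only for illustration. It assumes Python 3.4+ (for `threading.main_thread()`) and a Unix platform where `signal.SIGALRM` exists.

```python
import signal
import threading
from functools import wraps


def timeout(seconds, default=None):
    """Hypothetical decorator: return `default` if the call exceeds `seconds`."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # signal.signal() raises ValueError outside the main thread,
            # so skip the alarm there and call the function directly.
            if threading.current_thread() is not threading.main_thread():
                return func(*args, **kwargs)

            class _Timeout(Exception):
                pass

            def _handler(signum, frame):
                raise _Timeout()

            # Install the alarm handler, remembering the previous one.
            old_handler = signal.signal(signal.SIGALRM, _handler)
            signal.alarm(seconds)
            try:
                return func(*args, **kwargs)
            except _Timeout:
                return default
            finally:
                # Cancel any pending alarm and restore the old handler.
                signal.alarm(0)
                signal.signal(signal.SIGALRM, old_handler)
        return wrapper
    return decorator


@timeout(2, default=False)
def feed_item(item):
    """Stand-in for a call that would push `item` to Kafka."""
    return True
```

Calling `feed_item` from a worker thread (as a threaded web server would) now runs without the alarm instead of raising `ValueError`. The trade-off is that calls made off the main thread are no longer time-limited.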

To test:

  1. Build the rest service container
    $ docker-compose up -d --build rest
    # this should bring up kafka, zookeeper, and redis
  2. Send a message to kafka
    
    $ curl localhost:5343/feed -H "Content-type:application/json" -d '{"appid":"testapp", "uuid":"blahblah1", "stats":"kafka-monitor"}'

Response:

{"data":{"poll_id":"blahblah1"},"error":null,"status":"SUCCESS"}
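The same check can be scripted instead of run by hand with curl. The sketch below uses only the Python 3 standard library; the URL, payload, and expected `SUCCESS` status come from the steps above, while the helper names (`build_feed_request`, `is_success`, `send_feed`) are made up for this example and are not part of the project.

```python
import json
import urllib.request

FEED_URL = "http://localhost:5343/feed"  # host and port taken from the curl example


def build_feed_request(payload, url=FEED_URL):
    """Build (but do not send) a POST request carrying `payload` as JSON."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-type": "application/json"},
        method="POST",
    )


def is_success(response_body):
    """Return True when the JSON response reports SUCCESS with no error."""
    parsed = json.loads(response_body)
    return parsed.get("status") == "SUCCESS" and parsed.get("error") is None


def send_feed(payload, url=FEED_URL, timeout=10):
    """Send the request; needs the rest container from step 1 to be running."""
    request = build_feed_request(payload, url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the containers up, `is_success(send_feed({"appid": "testapp", "uuid": "blahblah1", "stats": "kafka-monitor"}))` should return `True` if the fix works as described.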

coveralls commented 5 years ago


Coverage increased (+0.006%) to 70.858% when pulling 585b52bf31e032f57090810015fad6f53f9cf998 on thread-signals into be82c7fb6cf6994eebc12b7a446f2b2a6453021a on dev.