scrapinghub / scrapyrt

HTTP API for Scrapy spiders
BSD 3-Clause "New" or "Revised" License

How can I pass the spider arguments in the scrapyrt API request? #115

Closed. songm28 closed this issue 3 years ago.

songm28 commented 3 years ago

Problem: I can run the spider with scrapyrt without arguments, but it seems that I cannot specify the spider arguments in the scrapyrt API request.

Expected result: the spider can be triggered with its arguments from the scrapyrt API.

For example, I need to run my spider with some arguments:

scrapy crawl content_crawler -a start_urls="http://www.ionispharma.com/" -a allow="ionispharma.com" -a content_root_css="site-content"
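
For context, Scrapy passes each -a key=value pair from the command line to the spider as keyword arguments of its constructor. A minimal sketch of how such a spider might consume them (hypothetical code; the real content_crawler spider is not shown in this issue):

import scrapy

class ContentCrawlerSpider(scrapy.Spider):
    # Hypothetical sketch for illustration only.
    name = "content_crawler"

    def __init__(self, start_urls=None, allow=None, content_root_css=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Each -a key=value from the command line arrives here as a keyword argument.
        self.start_urls = [start_urls] if start_urls else []
        self.allow = allow
        self.content_root_css = content_root_css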

But when I try to pass them in the scrapyrt API request like this, it does not work and returns a traceback error.

Request:

{
    "request":{
        "url":"http://www.ionispharma.com/",
        "cb_kwargs":{
            "allow":"ionispharma.com",
            "start_urls":"http://www.ionispharma.com/"
        }
    },
    "spider_name":"content_crawler"
}

Response with error:

{
    "status": "ok",
    "items": [],
    "items_dropped": [],
    "stats": {
        "downloader/request_bytes": 438,
        "downloader/request_count": 2,
        "downloader/request_method_count/GET": 2,
        "downloader/response_bytes": 14570,
        "downloader/response_count": 2,
        "downloader/response_status_count/200": 1,
        "downloader/response_status_count/301": 1,
        "elapsed_time_seconds": 1.934496,
        "finish_reason": "finished",
        "finish_time": "2020-11-13 10:29:29",
        "log_count/DEBUG": 2,
        "log_count/ERROR": 1,
        "log_count/INFO": 9,
        "response_received_count": 1,
        "scheduler/dequeued": 2,
        "scheduler/dequeued/memory": 2,
        "scheduler/enqueued": 2,
        "scheduler/enqueued/memory": 2,
        "spider_exceptions/TypeError": 1,
        "start_time": "2020-11-13 10:29:27"
    },
    "spider_name": "content_crawler",
    "errors": [
        "Traceback (most recent call last):\n  File \"c:\\program files\\python37\\lib\\site-packages\\twisted\\internet\\base.py\", line 1292, in mainLoop\n    self.runUntilCurrent()\n  File \"c:\\program files\\python37\\lib\\site-packages\\twisted\\internet\\base.py\", line 913, in runUntilCurrent\n    call.func(*call.args, **call.kw)\n  File \"c:\\program files\\python37\\lib\\site-packages\\twisted\\internet\\defer.py\", line 460, in callback\n    self._startRunCallbacks(result)\n  File \"c:\\program files\\python37\\lib\\site-packages\\twisted\\internet\\defer.py\", line 568, in _startRunCallbacks\n    self._runCallbacks()\n--- <exception caught here> ---\n  File \"c:\\program files\\python37\\lib\\site-packages\\twisted\\internet\\defer.py\", line 654, in _runCallbacks\n    current.result = callback(current.result, *args, **kw)\nbuiltins.TypeError: parse() got an unexpected keyword argument 'allow'\n"
    ]
}
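
Note that the cb_kwargs sent in the request body are forwarded to the request callback (parse by default), not to the spider constructor, which is why the traceback ends with "parse() got an unexpected keyword argument 'allow'". A minimal sketch of a callback that would accept such keyword arguments (hypothetical, for illustration only):

import scrapy

class ContentCrawlerSpider(scrapy.Spider):
    # Hypothetical sketch; shows where cb_kwargs are delivered.
    name = "content_crawler"

    def parse(self, response, allow=None, start_urls=None, content_root_css=None):
        # Keys from "cb_kwargs" in the scrapyrt request arrive here as
        # keyword arguments of the callback, not as spider attributes.
        for href in response.css("a::attr(href)").getall():
            if allow is None or allow in href:
                yield {"url": response.urljoin(href)}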
pawelmhm commented 3 years ago

This is not supported yet; there is some work in progress on it that still needs to be finished. Closing for now. Please follow the related ticket here:

https://github.com/scrapinghub/scrapyrt/issues/29