scrapy / scrapyd

A service daemon to run Scrapy spiders
https://scrapyd.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Use Scrapyd with arguments #133

Closed: ricoxor closed this issue 8 years ago

ricoxor commented 8 years ago

I'm using Scrapyd to run Scrapy as a web service.

I would like to use the curl command with parameters like this:

curl http://myip:6800/schedule.json -d project=default -d spider=myspider -d domain=www.google.fr

But I don't know how to access the domain parameter inside the spider.

import scrapy
from scrapy.item import Item, Field
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class MyItem(Item):
    url = Field()

class HttpbinSpider(CrawlSpider):
    name = "expired"
    start_urls = [domain]  # 'domain' is undefined here; this is the value I want to receive from curl

Sometimes I need to pass a single domain as an argument, sometimes several.

Thanks!

Digenis commented 8 years ago

It's not possible due to a missing feature in Scrapy. Users typically work around this by serializing the arguments before passing them to curl and then deserializing them in the spider's __init__().

E.g. curl http://myip:6800/schedule.json -d project=default -d spider=myspider -d domains='["www1.example.com", "www2.example.com"]'

import json
from scrapy import Spider

class MySpider(Spider):
    def __init__(self, domains=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # 'domains' arrives as the raw JSON string from schedule.json
        self.domains = json.loads(domains) if domains else []
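
To connect this back to the original question, here is a minimal self-contained sketch (my own illustration, not from the thread; the spider name and the http:// scheme are placeholder assumptions) that turns the passed domains into start requests instead of a hardcoded start_urls:

import json

from scrapy import Request, Spider

class DomainsSpider(Spider):
    name = "myspider"  # must match the spider= value in the curl call

    def __init__(self, domains=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # deserialize the JSON list passed as a spider argument
        self.domains = json.loads(domains) if domains else []

    def start_requests(self):
        # issue one initial request per domain passed on the command line
        for domain in self.domains:
            yield Request("http://%s/" % domain, callback=self.parse)

    def parse(self, response):
        # placeholder: real extraction logic would go here
        self.logger.info("Visited %s", response.url)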

Another option is the pickle module, which lets you pass arbitrary Python objects to your spider, but it makes building the curl arguments more complicated.
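
For illustration only (none of this is from the thread; the URL-safe base64 wrapping is my assumption, added so the pickled bytes survive form-encoded POST data), the round trip could look like this:

import base64
import pickle

# client side, before calling curl: pickle the object, then base64-encode it
payload = base64.urlsafe_b64encode(
    pickle.dumps(["www1.example.com", "www2.example.com"])
).decode("ascii")
# then: curl http://myip:6800/schedule.json -d project=default \
#       -d spider=myspider -d domains=<payload>

# spider side, in __init__, reverse both steps:
# self.domains = pickle.loads(base64.urlsafe_b64decode(domains))

Keep in mind that unpickling data is only safe when you control both ends, which is why the JSON approach above is usually preferable.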

Closing because it duplicates #61 and the issue tracker is not for support. Ask for help in the community channels if my answer didn't cover your case.