jschnurr / scrapyscript

Run a Scrapy spider programmatically from a script or a Celery task - no project required.

set CrawlSpider settings from Job() #10

Open Matthijz98 opened 3 years ago

Matthijz98 commented 3 years ago

Hi,

I am trying to use a CrawlSpider with Celery. The settings for the CrawlSpider are stored in the database, but whatever I try, I cannot get them passed to the CrawlSpider class.

My spider:

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

from .models import Product  # assumed Django model for scraped products


class FindProductSpider(CrawlSpider):
    name = 'FindProductSpider'
    allowed_domains = ['']
    start_urls = ['']
    webshopid = ''
    rule = ''

    # Built once, when the class body is executed, using the empty
    # class-level 'rule' above.
    rules = [Rule(LinkExtractor(allow=rule), callback='parse_item', follow=True)]

    def parse_item(self, response):
        p = Product(url=response.url, from_webshop_id=self.webshopid)
        p.save()

and my Celery task:

from celery import shared_task
from scrapyscript import Job, Processor

from .models import Webshop  # assumed Django model describing each shop


@shared_task()
def getproducts():
    webshops = Webshop.objects.all()

    for webshop in webshops:
        # Keyword arguments on Job() are passed to the spider constructor.
        job = Job(FindProductSpider,
                  start_urls=[webshop.spider_start_url],
                  allowed_domains=[webshop.spider_allowed_domain],
                  rule=webshop.spider_allow_regex,
                  webshopid=webshop.id)
        processor = Processor(settings=settings)  # settings: a scrapy.settings.Settings instance (or None)
        data = processor.run([job])

But when I print the settings, they stay empty. Some help would be super nice.
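
For context, the likely sticking point is that `rules` is evaluated once, when the class body is executed, so the `LinkExtractor` is built with the empty class-level `rule` long before `Job()` ever runs. scrapyscript does pass `Job`'s keyword arguments to the spider constructor, and Scrapy's `Spider.__init__` copies them onto the instance, but `CrawlSpider` compiles whatever `self.rules` contains at `__init__` time. Below is a minimal sketch of one possible workaround, rebuilding `rules` from the keyword arguments before the base class compiles them (names taken from the snippets above; this is a hypothetical sketch, not a confirmed fix):

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class FindProductSpider(CrawlSpider):
    name = 'FindProductSpider'

    def __init__(self, *args, **kwargs):
        # Build the rules from the per-job 'rule' kwarg *before* calling
        # super().__init__(): CrawlSpider.__init__ compiles self.rules,
        # so this instance-level list is the one that takes effect.
        self.rules = [
            Rule(LinkExtractor(allow=kwargs.get('rule', '')),
                 callback='parse_item', follow=True),
        ]
        super().__init__(*args, **kwargs)

The remaining keyword arguments (`start_urls`, `allowed_domains`, `webshopid`) are copied onto the instance by `Spider.__init__`, so `parse_item` can keep reading `self.webshopid` unchanged.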