jschnurr / scrapyscript

Run a Scrapy spider programmatically from a script or a Celery task - no project required.
MIT License

Example for Celery use #9

Closed anderser closed 3 years ago

anderser commented 3 years ago

Is there an example somewhere of how to use scrapyscript within a Celery task? I tried the blog, but it seems Part II never came out :)

alvarolloret commented 3 years ago

Same here! How do you use it with Celery?

etelpmoc commented 3 years ago

Same here. Also, I get an error message (Windows 64):

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\USER\AppData\Local\Programs\Python\Python38-32\lib\site-packages\billiard\spawn.py", line 165, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python38-32\lib\site-packages\billiard\spawn.py", line 207, in _main
    self = pickle.load(from_parent)

I wonder why this is happening

Matthijz98 commented 3 years ago

I got it working:

from celery import shared_task

@shared_task()
def updatePrices():
    # Queue one price-check task per product, per webshop.
    for webshop in Webshop.objects.all():
        products = Product.objects.filter(from_webshop=webshop.id)
        for product in products:
            updateprice.delay(product.id, webshop.id)

and my spider is:

from scrapy import Request
from scrapy.spiders import Spider

class ProductSpider(Spider):
    name = 'price_check_spider'

    def start_requests(self):
        # url and the selector settings are passed in as Job kwargs
        yield Request(self.url)

    def parse(self, response):
        # Get price
        if self.price_selector_type == 'CSS':
            price = str(response.css(self.price_selector_query).get())
        elif self.price_selector_type == 'XPATH':
            price = str(response.xpath(self.price_selector_query).get())

        # Get title
        if self.title_selector_type == 'CSS':
            title = str(response.css(self.title_selector_query).get())
        elif self.title_selector_type == 'XPATH':
            title = str(response.xpath(self.title_selector_query).get())

        return {'price': price,
                'title': title}
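For anyone wondering where `self.url` and the selector attributes come from: as in the maintainer's example below, keyword arguments given to `Job(spider_cls, ...)` are forwarded to the spider, and Scrapy's `Spider.__init__` stores them as instance attributes. A minimal stdlib stand-in (not Scrapy itself) showing that pattern:

```python
class SpiderLike:
    """Stand-in for scrapy.Spider: keyword args become instance attributes."""
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)  # what Spider.__init__ effectively does

s = SpiderLike(url="https://example.com",
               price_selector_type="CSS",
               price_selector_query="span.price::text")
print(s.url, s.price_selector_type)  # https://example.com CSS
```

So the dispatching task would pass the per-webshop selectors as `Job` keyword arguments when it runs the spider.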
jschnurr commented 3 years ago

Here is an example. I will provide it in the documentation in an upcoming release.

import scrapy
from celery import Celery
from scrapy.spiders import Spider
from scrapyscript import Job, Processor

class MySpider(Spider):
    name = "myspider"

    def start_requests(self):
        yield scrapy.Request(self.url)

    def parse(self, response):
        page_title = response.xpath("//title/text()").extract_first()
        return {"data": page_title}

# Depends on localhost running rabbitmq-server and `poetry run celery -A tasks worker`
app = Celery("tasks", backend="rpc://", broker="pyamqp://guest@localhost//")

@app.task
def celery_job(url):
    job = Job(MySpider, url=url)
    return Processor().run(job)

if __name__ == "__main__":
    task = celery_job.s("https://www.python.org").delay()
    result = task.get()
    print(result)