Closed by johndpjr 10 months ago
Celery also requires a result backend to store the return value of a task. A Celery application on its own cannot store or provide the return value of any given function; more precisely, Celery records where in the backend it has stored each result.
To implement a basic Celery app in a module:
from celery import Celery
app = Celery('name', broker='broker url', backend='backend url')
Then, any function that should run asynchronously is decorated with @app.task. This allows the function to be sent to a worker by calling function.delay(args), where the arguments are forwarded to the original function.
A Celery worker can be started with celery -A filename worker --loglevel=info, where filename is a module containing a Celery app. In production this would need to be daemonized.
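One common way to daemonize the worker is a systemd unit; this is only a hedged sketch with hypothetical paths and service names, not a tested AgTern deployment:

```ini
# /etc/systemd/system/celery-agtern.service -- hypothetical unit file
[Unit]
Description=Celery worker for AgTern
After=network.target

[Service]
Type=simple
# Hypothetical install path and virtualenv location.
WorkingDirectory=/opt/agtern
ExecStart=/opt/agtern/venv/bin/celery -A filename worker --loglevel=info
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Celery's documentation also describes other daemonization options (e.g. generic init scripts), so the choice here is just one possibility.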
It seems that integrating Celery into the existing code will be relatively simple. A Celery app can be created for the necessary scraping functions, and those functions can be sent to the queue with the arguments for any given website.
Closed with PR 154.
Context
Right now, scraping jobs are run manually. We can automate this by using Celery, a distributed task queue framework that lets us manage and execute tasks asynchronously (a task being some function). For our use, Celery will automate webscraper and data processing jobs so that we can manage multiple webscrapers at once with ease. Eventually, jobs will run of their own accord at specified times. This greatly speeds up the web-scraping and data analysis portion of AgTern and removes the need to start jobs manually.
TODO
Add Celery to the project dependencies (the requirements.txt file) and install it by running the pip3 install -r requirements.txt command again.
Notes
This story does not require you to actually run any jobs (i.e. just install Celery, don't run any tasks right now). Using Celery to run jobs will be more complex and is out of the scope of this story.