Open claustres opened 6 years ago
A key decision is whether sequencing occurs at the job level or the task level.
At the job level we get something more serverless-oriented, where each job might be a completely new krawler instance that can be embedded in a lambda. In this case we should at least allow tasks to be multithreaded.
Started a PoC using kue simply as a new job type, without multithreading/cluster for now. The main issue we face with clustering is how to share stores between workers, because stores are created when the job runs. Moreover, the job itself should be run by a single worker only, to avoid duplication, while its tasks are dispatched across workers.
Since Redis support on Windows through https://github.com/MicrosoftArchive/redis has been discontinued, we use https://github.com/tporadowski/redis instead.
The issue with stores also exists in single-thread mode when the job passes the store to a task using the store property. Indeed, task data are serialized into Redis by Kue, which causes the store to be lost, e.g. the CLI test does not work with Kue.
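To illustrate the serialization problem, here is a minimal sketch that needs neither Kue nor Redis. It assumes the queue stores task data as JSON, so a plain JSON round trip stands in for it. MemoryStore, throughQueue, storeFactories and resolveStore are hypothetical names made up for this example, not part of krawler or Kue. The second half shows one possible workaround: pass a serializable store definition instead of the live store, and let each worker rebuild the store on its side.

```javascript
// Hypothetical stand-in for a krawler store (not the real krawler API).
class MemoryStore {
  constructor(options) {
    this.id = options.id;
    this.files = new Map();
  }
  write(name, content) { this.files.set(name, content); }
  read(name) { return this.files.get(name); }
}

// Assumption: the queue serializes task data as JSON, simulated here
// with a plain JSON round trip instead of a real Redis-backed queue.
function throughQueue(data) {
  return JSON.parse(JSON.stringify(data));
}

// 1. Passing the live store: only plain data survives, methods are lost.
const liveStore = new MemoryStore({ id: 'output' });
const brokenTask = throughQueue({ id: 'task-1', store: liveStore });
const storeSurvived = typeof brokenTask.store.write === 'function';

// 2. Possible workaround: pass a serializable definition and rebuild
//    the store inside the worker, caching it per worker process.
const storeFactories = { memory: (options) => new MemoryStore(options) };
const workerStores = new Map();

function resolveStore(definition) {
  if (!workerStores.has(definition.id)) {
    const create = storeFactories[definition.type];
    workerStores.set(definition.id, create(definition));
  }
  return workerStores.get(definition.id);
}

const task = throughQueue({
  id: 'task-2',
  store: { id: 'output', type: 'memory' }
});
const rebuiltStore = resolveStore(task.store);
rebuiltStore.write('task-2.json', '{"ok":true}');

console.log(storeSurvived, rebuiltStore.read('task-2.json'));
```

Note that rebuilding a store per worker only shares data when the store is backed by something external to the process (file system, S3, etc.). A purely in-memory store like the one above would still be private to each worker.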
Kue will add support for failover and concurrency.
worker-farm might also be used, as it is simpler and does not require a side tool like Redis.
agenda also looks great and would allow job scheduling.