Enhance assignment of tasks to workers - Githubissues

openzim / zimfarm

Farm operated by bots to grow and harvest new zim files

https://farm.openzim.org

GNU General Public License v3.0

84 stars, 25 forks

Enhance assignment of tasks to workers #909

Open benoit74 opened 10 months ago

benoit74 commented 10 months ago

As of today, workers can select which scrapers they want to run.

But at a central level, we have no control over which workers are allowed to run which scrapers, and this could be an issue for various reasons:

we want the task to be executed close to the server / on "approved" machines (e.g. mwoffliner tasks on mwoffliner machines)
the task cannot succeed if not executed on given machines (typically due to IP whitelisting)

So, in addition to the platforms limitation, we probably also need to:

control centrally which workers can run which scrapers (e.g. mwoffliner should run only on mwoffliner workers); maybe at the recipe level, or with specific tags
control centrally the assignment of some recipes to some worker(s) (e.g. for Zimit recipes of a Cloudflare-protected website, we want the recipe to run on a single worker whose IP has been whitelisted)
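
To make the two controls above concrete, here is a minimal sketch of what a central eligibility check could look like. It is not taken from the zimfarm codebase: the `Worker` and `Recipe` classes, the `allowed_offliners` and `pinned_workers` fields, and the `can_run` function are all hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    name: str
    # Scrapers the worker itself chose to run (the existing worker-side selection).
    offliners: frozenset[str]
    # Hypothetical field: scrapers a central admin has approved for this worker.
    allowed_offliners: frozenset[str]


@dataclass(frozen=True)
class Recipe:
    name: str
    offliner: str
    # Hypothetical field: when non-empty, only these workers may run the recipe.
    pinned_workers: frozenset[str] = frozenset()


def can_run(worker: Worker, recipe: Recipe) -> bool:
    """Return True only if both the worker-side and central rules allow the task."""
    # The worker must have opted into this scraper.
    if recipe.offliner not in worker.offliners:
        return False
    # The central side must have approved this scraper for this worker.
    if recipe.offliner not in worker.allowed_offliners:
        return False
    # A recipe pinned to specific workers runs nowhere else.
    if recipe.pinned_workers and worker.name not in recipe.pinned_workers:
        return False
    return True
```

In this sketch the central allow-list covers the scraper-level rule, while the per-recipe pin covers cases such as IP whitelisting, where only some recipes of a given scraper are restricted. Tags could replace either field without changing the overall shape of the check.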

kelson42 commented 8 months ago

It seems pretty important to implement this ticket for Wikimedia recipes (not all mwoffliner recipes), because only the mwoffliner workers have privileged access to the Wikimedia API (no quota).