openzim / zimfarm

Farm operated by bots to grow and harvest new zim files
https://farm.openzim.org
GNU General Public License v3.0

Add concept of scraper release channels #957

Open benoit74 opened 4 months ago

benoit74 commented 4 months ago

Currently, in a recipe we configure which scraper image and which scraper tag we want to use.

This has the side effect that any time a new scraper tag (version) is released, we need to update the image tag of every recipe to use the new tag / version.

This is currently a manual effort and quite brutal: we update all recipes using a given scraper image to use the latest tag.

It also means that we cannot do it without having technical access to the DB, the risk of manipulation error is significant, rollback is difficult and, even more importantly, we do not handle at all the situations where some recipes have to be run with a different image tag. This does not happen often, but it is very uncomfortable when it does (like currently, where a few recipes are using the zimit2 tag while most of them should continue to use the "officially released" version).

I propose introducing the concept of release channels (note: this idea comes from yesterday's demo of browsertrix).

A release channel will consist of:

Recipes would no longer be associated with a scraper image name and scraper image tag but with the id of a release channel.

This means that any time a new image tag is released, we will only have to update the settings of a few release channels.

A requested task will store the release channel id that was configured in the recipe at request time.

When transforming the requested task into a task, we will decide which image name and tag will be used, and this information will be stored in the task configuration. This means that whenever a release channel is updated, any task started after that point in time will benefit from the new setting, while tasks already created keep the image they were given.
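To make the timing concrete, here is a minimal sketch of that resolution step. All class, field and function names are hypothetical, and the image name and tags are placeholders; the only thing taken from the proposal is that the channel is resolved to an image name and tag when the task is created, and that the result is stored on the task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseChannel:
    # Hypothetical shape: a channel id pointing at one image name and tag.
    id: str
    image_name: str
    image_tag: str


@dataclass(frozen=True)
class RequestedTask:
    recipe_name: str
    release_channel_id: str  # copied from the recipe at request time


@dataclass(frozen=True)
class Task:
    recipe_name: str
    image_name: str  # frozen at task creation, not looked up again later
    image_tag: str


def start_task(requested: RequestedTask, channels: dict[str, ReleaseChannel]) -> Task:
    """Resolve the channel when the task is created and store the result."""
    channel = channels[requested.release_channel_id]
    return Task(requested.recipe_name, channel.image_name, channel.image_tag)


# Placeholder channel id, image name and tags, for illustration only.
channels = {"stable": ReleaseChannel("stable", "example/scraper", "1.0.0")}
requested = RequestedTask("example_recipe", "stable")

first = start_task(requested, channels)

# An admin points the channel at a new tag; the recipe itself is untouched.
channels["stable"] = ReleaseChannel("stable", "example/scraper", "1.1.0")

second = start_task(requested, channels)
print(first.image_tag, second.image_tag)
```

The task created before the channel update keeps its original tag, and the one created afterwards picks up the new tag, without any change to the recipe or the requested task.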

We need a specific database table to store these release channels.
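As a rough illustration only, such a table could look like the sketch below. It uses an in-memory SQLite database so that it runs standalone; the table names, column names and values are assumptions and are not the Zimfarm schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per release channel, recipes reference it by id.
conn.execute(
    """
    CREATE TABLE release_channel (
        id TEXT PRIMARY KEY,
        image_name TEXT NOT NULL,
        image_tag TEXT NOT NULL
    )
    """
)
conn.execute(
    """
    CREATE TABLE recipe (
        name TEXT PRIMARY KEY,
        release_channel_id TEXT NOT NULL REFERENCES release_channel(id)
    )
    """
)

# Placeholder data for illustration only.
conn.executemany(
    "INSERT INTO release_channel VALUES (?, ?, ?)",
    [
        ("stable", "example/scraper", "1.0.0"),
        ("next", "example/scraper", "2.0.0-beta"),
    ],
)
conn.executemany(
    "INSERT INTO recipe VALUES (?, ?)",
    [("recipe_a", "stable"), ("recipe_b", "stable"), ("recipe_c", "next")],
)

# Releasing a new tag is a single UPDATE on the channel row,
# instead of one update per recipe.
conn.execute(
    "UPDATE release_channel SET image_tag = ? WHERE id = ?", ("1.1.0", "stable")
)

rows = conn.execute(
    """
    SELECT r.name, c.image_tag
    FROM recipe r JOIN release_channel c ON c.id = r.release_channel_id
    ORDER BY r.name
    """
).fetchall()
print(rows)
```

Recipes on the `stable` channel follow the update, while the recipe on `next` is left alone, which is the case that is hard to handle today.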

We need a UI screen/section and new APIs for admins to:

UI needs to be updated to:

Open question: what do we want to do when we clone a recipe? We probably need to do something, since this has proved to be a source of confusion in the past, e.g. an editor clones a zimit recipe which is unfortunately using the zimit2 tag without realizing it is not using the default (zimit1) tag. Do we want to have a concept of default release channel per scraper, so that whenever we clone a recipe which is not on this default release channel, either we display a warning or we force it to use the default one?
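To help compare the two options, here is a small sketch of what cloning could do in each case. Everything here is hypothetical: the `Recipe` fields, the `DEFAULT_CHANNEL` mapping, the `force_default` flag and the channel ids are only there to illustrate the question, not to answer it.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Recipe:
    name: str
    scraper: str
    release_channel_id: str


# Hypothetical "default release channel per scraper" setting.
DEFAULT_CHANNEL = {"zimit": "zimit1"}


def clone_recipe(
    source: Recipe, new_name: str, force_default: bool = False
) -> tuple[Recipe, Optional[str]]:
    """Clone a recipe and return (clone, warning); warning is None when all is fine."""
    default = DEFAULT_CHANNEL[source.scraper]
    if source.release_channel_id == default:
        return replace(source, name=new_name), None
    if force_default:
        # Option 2: silently move the clone back to the default channel.
        return replace(source, name=new_name, release_channel_id=default), None
    # Option 1: keep the channel of the source recipe but tell the editor.
    warning = (
        f"'{new_name}' uses release channel '{source.release_channel_id}', "
        f"not the default '{default}' for scraper '{source.scraper}'"
    )
    return replace(source, name=new_name), warning


source = Recipe("some_site", "zimit", "zimit2")
warned_clone, warning = clone_recipe(source, "some_site_copy")
forced_clone, no_warning = clone_recipe(source, "some_site_copy", force_default=True)
```

With the warning option the clone stays on `zimit2` and the editor is told about it; with the forcing option the clone lands on `zimit1` and nothing is shown.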

rgaudin commented 4 months ago

I like the idea. As for the question, I still believe that we should expect a minimal effort from the user. If they don't recognize that the channel is Zimit 2 instead of Zimit 1, then we cannot expect anything they input to be correct.

That said, the cloning UI could be improved to prominently include the channel (or task_name before).