openzim / python-scraperlib

Collection of Python code to re-use across Python-based scrapers
GNU General Public License v3.0

Video-related primitives should be provided in scraperlib #194

Open kelson42 opened 1 month ago

kelson42 commented 1 month ago

We are publishing more and more ZIM files with videos using many different scrapers.

To do that, we mainly:

For the moment, many of these video-related functions are scattered across different places (in each scraper that relies on them).

At least to me, this is:

I'm not prescriptive about the exact solution, but I believe we should try to consolidate this in one place.

benoit74 commented 1 month ago

For me, this is a problem of dependency management: which software is using which version of which library.

Would a dashboard of dependency versions per scraper help? I'm not sure it is sufficient, because one still needs to know that problem x has been fixed in version x.y of dependency nnn.

What makes it even harder is that we need a solution that can handle both Python (because video re-encoding is done in scraperlib, so we want to track the scraperlib version per scraper) and JS (because display is done with video.js, which is ... JS).

If it were up to me, I would propose a very radical solution, because the problems you describe are typically the strong argument for a monorepo of all scrapers: all scrapers are at the same level of development, fixes have to be requested and tracked only once, and releases are synchronized. Unfortunately, it comes with its own share of drawbacks.

kelson42 commented 2 weeks ago

@benoit74 monorepo is a no-go for me. For the rest, I'm very open. If no technical solution can be found, we should at least have a procedural approach.

benoit74 commented 2 weeks ago

Since monorepo is a no-go, only a procedural approach / tooling can solve the problem you're describing: since you would like an overview of the situation, we will always need some kind of dashboard that allows us to:

This is what it takes to be able to quickly say something like "issue xxx is fixed by updating dependency xyz to version x.y.z, this version has been deployed in scraper aaa version x.y.z and scraper bbb y.z.x, not yet in other scrapers, and we have xxx ZIMs using version aaa or newer, but zzz ZIMs still using older versions".
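To make that kind of statement concrete, here is a minimal sketch of the query such a dashboard would have to answer. Everything in it is an assumption for illustration: the `ScraperRelease` and `ZimFile` records, the `fix_status` helper, the placeholder names and the version numbers are all hypothetical, and it presumes plain dotted versions and that a scraper never downgrades a dependency after picking up a fix. It is not an existing scraperlib API.

```python
from dataclasses import dataclass


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a plain dotted version such as '3.1.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


@dataclass
class ScraperRelease:  # hypothetical record: one released scraper version
    scraper: str
    version: str
    dependencies: dict[str, str]  # dependency name -> pinned version


@dataclass
class ZimFile:  # hypothetical record: one published ZIM
    name: str
    scraper: str
    scraper_version: str


def fix_status(dependency, fixed_in, releases, zims):
    """Report which scrapers and ZIMs carry a fix shipped in `dependency` `fixed_in`."""
    threshold = parse_version(fixed_in)

    # Earliest release of each scraper that pins the dependency at or above the fix.
    first_fixed: dict[str, str] = {}
    for release in releases:
        pinned = release.dependencies.get(dependency)
        if pinned is None or parse_version(pinned) < threshold:
            continue
        known = first_fixed.get(release.scraper)
        if known is None or parse_version(release.version) < parse_version(known):
            first_fixed[release.scraper] = release.version

    # Scrapers with no release carrying the fix yet.
    pending = sorted({r.scraper for r in releases} - set(first_fixed))

    # Split ZIMs by whether they were built with a fixed scraper release.
    zims_fixed, zims_stale = [], []
    for zim in zims:
        first = first_fixed.get(zim.scraper)
        if first is not None and parse_version(zim.scraper_version) >= parse_version(first):
            zims_fixed.append(zim.name)
        else:
            zims_stale.append(zim.name)

    return {
        "deployed_in": first_fixed,
        "not_yet_in": pending,
        "zims_fixed": zims_fixed,
        "zims_stale": zims_stale,
    }


# Made-up inventory, only to show the shape of the input data.
releases = [
    ScraperRelease("scraper-aaa", "2.0.0", {"dependency-xyz": "3.0.0"}),
    ScraperRelease("scraper-aaa", "2.1.0", {"dependency-xyz": "3.1.0"}),
    ScraperRelease("scraper-bbb", "1.4.0", {"dependency-xyz": "3.0.0"}),
]
zims = [
    ZimFile("one.zim", "scraper-aaa", "2.1.0"),
    ZimFile("two.zim", "scraper-aaa", "2.0.0"),
    ZimFile("three.zim", "scraper-bbb", "1.4.0"),
]
status = fix_status("dependency-xyz", "3.1.0", releases, zims)
```

The query itself is small. The hard part, and what the tooling would have to provide, is collecting a reliable inventory of releases, pinned dependencies (Python and JS alike) and the scraper version recorded for each published ZIM.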

To me, this is not a small thing to build / deploy, because we need tooling for it. We need to find funding to develop / configure this tooling and these procedures.

Without that funding / tooling, we are back to square one, doing all of this manually when the need arises.