openzim / zimfarm

Farm operated by bots to grow and harvest new zim files
https://farm.openzim.org
GNU General Public License v3.0

Filter tasks by offliner #823

Open benoit74 opened 1 year ago

benoit74 commented 1 year ago

At least at the API level in the /tasks endpoint, being able to filter tasks by offliner (wikihow, zimit, ifixit, ...) would help dev users analyzing the production behavior.

@Popolechien @RavanJAltaie do you think you could benefit from such a filter at the UI level in the https://farm.openzim.org/recipes page? Anywhere else?

@kelson42 @rgaudin your feedback is also welcome 😄
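To make the request concrete, here is a minimal sketch of the filtering being asked for, done client-side on a list of task records. The `offliner` key, the `_id` key, and the sample records are assumptions for illustration only; they are not taken from the actual /tasks response schema.

```python
from typing import Dict, Iterable, List


def filter_tasks_by_offliner(tasks: Iterable[Dict], offliner: str) -> List[Dict]:
    """Return only the tasks whose (hypothetical) "offliner" key matches."""
    return [task for task in tasks if task.get("offliner") == offliner]


# Hypothetical sample data; a real /tasks payload may be shaped differently.
sample_tasks = [
    {"_id": "a1", "offliner": "wikihow"},
    {"_id": "b2", "offliner": "zimit"},
    {"_id": "c3", "offliner": "wikihow"},
]

wikihow_tasks = filter_tasks_by_offliner(sample_tasks, "wikihow")
print([task["_id"] for task in wikihow_tasks])
```

A server-side filter would avoid fetching every task first, which is the point of the request; this sketch only shows the intended result.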

rgaudin commented 1 year ago

It's been discussed (UI filters, not just API) and rejected in the past. I think the reasoning was that offliner is a technical detail and the scenarios in which we need it were bad ones. Maybe also because we already have enough metadata. That's why we use tags for this, like the videos one.

I have no opinion.

Zimfarm development has always been driven by needs, so maybe you can elaborate on your need with a relatable example. “dev users analyzing the production behavior” is a bit vague and I'm afraid this might be a monitoring need (which doesn't make it any less valid!)

Popolechien commented 1 year ago

Well, since the creation of new recipes usually starts with the cloning of an existing one, if I understand the question correctly I'd say that yes, such a filter would come in handy. Of course a "Create recipe" button would also be nice, but I'm not sure this is the place (as opposed to the CMS).

benoit74 commented 1 year ago

I have two recent situations in mind:

rgaudin commented 1 year ago

> * I need to disable all wikihow recipes because I know the offliner is experiencing an issue

We've been using toggle_scraper.py for that. It's a good opportunity to switch it to psql 😉

> * I'm looking for an openedx recipe to check if the offliner is working or not + find settings used in production

I can relate to that. Not common though. In an ideal world, Content would let you know if there are issues with a scraper 😉. In this particular example, openedx is the sole contributor to /mooc so that tells you what to look for. You also have access to the DB 😅
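The two situations above boil down to "update every recipe of one offliner" and "list the recipes of one offliner". A rough sketch of both follows, using an in-memory SQLite database as a stand-in for PostgreSQL. The `recipe` table, its columns, and the recipe names are hypothetical; this is neither the real Zimfarm schema nor what toggle_scraper.py does.

```python
import sqlite3

# In-memory stand-in database with a hypothetical, simplified schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recipe (name TEXT PRIMARY KEY, offliner TEXT, enabled INTEGER)"
)
conn.executemany(
    "INSERT INTO recipe VALUES (?, ?, ?)",
    [
        ("wikihow_en", "wikihow", 1),
        ("wikihow_fr", "wikihow", 1),
        ("openedx_example", "openedx", 1),
    ],
)


def set_enabled_for_offliner(conn, offliner, enabled):
    """Enable or disable every recipe of one offliner; return rows changed."""
    cur = conn.execute(
        "UPDATE recipe SET enabled = ? WHERE offliner = ?",
        (int(enabled), offliner),
    )
    conn.commit()
    return cur.rowcount


def recipes_for_offliner(conn, offliner):
    """List (name, enabled) pairs for one offliner, sorted by name."""
    rows = conn.execute(
        "SELECT name, enabled FROM recipe WHERE offliner = ? ORDER BY name",
        (offliner,),
    )
    return rows.fetchall()


# Situation 1: disable all wikihow recipes.
disabled_count = set_enabled_for_offliner(conn, "wikihow", False)
# Situation 2: find the openedx recipes to inspect.
openedx_recipes = recipes_for_offliner(conn, "openedx")
print(disabled_count, openedx_recipes)
```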

rgaudin commented 6 months ago

Bumping this; had to connect to shell to query the DB 🙃

kelson42 commented 6 months ago

Late feedback, but "yes" I regularly miss this feature.

benoit74 commented 6 months ago

@kelson42 could you please put a prio label then, and do you wanna include it in Zimit2 project?

kelson42 commented 6 months ago

@benoit74 I believe we can live a few additional months without this feature, even if it's a bit annoying. That said, I believe this is quick to implement, so if you assess differently, fine with me.