Closed kelson42 closed 5 months ago
@Popolechien @RavanJAltaie
What do you want to do with these recipes which have the playlist mode enabled?
aimhi_playlists
dse_ladakh_lbj_playlists
keylearning_en
khan-videos_ar_playlists
khan-videos_bn_playlists
khan-videos_en_playlists
khan-videos_es_playlists
khan-videos_fr_playlists
khan-videos_tr_playlists
madrasa_ar_playlists
project-fuel
ruangguru_id_playlists
scienceinthebath_playlist
slam-out-loud_hi
tutorial-wikipedia
ubongo_sw
voa_learning_english_all_playlists
zenius_id_playlists
@benoit74 Let us go over them later today and get back to you.
Add Canadian prepper to the list of ZIM files that use multiple playlists and need updating, but basically we'll recreate them all with one playlist per recipe / ZIM file (except Khan, zenius and ruangguru).
Canadian prepper has no problem, it is creating one ZIM from a bunch of playlists. What we want to deactivate/remove is when you want to automatically create one ZIM per playlist in the channel / user. Except if it is indeed badly configured and what you wanted is multiple ZIMs, but that's another story.
How should we move forward on this? Should we wait for you to recreate all recipes, or can we simply delete them and you will recreate them on the fly (deleting the recipe will not remove the ZIMs anyway)? Do you need a configuration export so that it will be easier to recreate?
Do you plan to create all recipes manually?
What do you mean by "except Khan, zenius and ruangguru"? Do you plan to simply delete these recipes and not create any more ZIMs for them?
What we want to deactivate/remove is when you want to automatically create one ZIM per playlist in the channel / user.
Ok this I had misunderstood. If there is a way to easily export/duplicate the recipes then by all means let's do it, but otherwise we'll need to recreate them manually (and then only delete the original ones).
Khan, zenius and ruangguru will be deleted entirely (recipes AND zim files).
OK, thank you.
Export is easy, it will just be a "raw" copy of the configuration, just so that you have a reference of the old configuration. E.g. for aimhi_playlists (I redacted the secrets), I will export the configuration below and then delete the recipe:
{
"api-key": "**********",
"concurrency": 1,
"debug": true,
"format": "webm",
"id": "PLr5n3ojAJWjSVnG_EK1xF3rW1Lo0N2qwA,PLr5n3ojAJWjQEiRIuHlRoN7rBDKG6GbvH,PLr5n3ojAJWjRGQ1DnnIqDrIXKuuNydGid,PLr5n3ojAJWjTkqDW49ew1u7vsbVzM5Uub,PLr5n3ojAJWjSVh9mgusLb6npGNnFif_Sw,PLr5n3ojAJWjRSuu4s5Vu1CEN0rkgXZ-VA,PLr5n3ojAJWjRDmRmIVAsD4MSEt7wMSOGr,PLr5n3ojAJWjSNVp6jrlwXPz5MyFArv6oO,PLr5n3ojAJWjRiuPrUAAveWrrqnNPPNzYK",
"indiv-playlists": true,
"language": "eng",
"low-quality": true,
"main-color": "#FFFFFF",
"optimization-cache": "https://s3.us-west-1.wasabisys.com/?keyId=*****&secretAccessKey=*****&bucketName=org-kiwix-youtube",
"output": "/output",
"playlists-description": "The nature-first, curiosity-powered online school for ages 8-18",
"playlists-name": "aimhi_en_-{title}",
"playlists-title": "AimHi",
"playlists-zim-file": "aimhi_en_{slug}_{period}",
"tags": "aimhi",
"tmp-dir": "/output",
"type": "playlist"
}
Is this useful?
Duplicate is something you already have with the "Clone" button. But you still have to input everything else.
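For what it's worth, an export like the one above could be split into one configuration per playlist with a few lines of Python. This is only a rough sketch, not something Zimfarm provides: the function name `split_playlist_recipe` is made up, the mapping of `playlists-title` / `playlists-description` to `title` / `description` is an assumption about what a single-ZIM recipe expects, and `name` / `zim-file` are deliberately left out because they have to be chosen by hand for each new recipe.

```python
import copy
import json

# Keys that only make sense in playlist mode (taken from the export above).
PLAYLIST_MODE_KEYS = (
    "indiv-playlists",
    "playlists-description",
    "playlists-name",
    "playlists-title",
    "playlists-zim-file",
)


def split_playlist_recipe(config):
    """Return one configuration dict per playlist ID found in config["id"]."""
    playlist_ids = [pid.strip() for pid in config["id"].split(",") if pid.strip()]
    recipes = []
    for playlist_id in playlist_ids:
        recipe = copy.deepcopy(config)  # never modify the exported reference copy
        for key in PLAYLIST_MODE_KEYS:
            recipe.pop(key, None)
        recipe["id"] = playlist_id
        # Assumed mapping from playlist-mode keys to single-ZIM keys.
        if "playlists-title" in config:
            recipe["title"] = config["playlists-title"]
        if "playlists-description" in config:
            recipe["description"] = config["playlists-description"]
        recipes.append(recipe)
    return recipes


if __name__ == "__main__":
    exported = {
        "format": "webm",
        "id": "PLAYLIST_A,PLAYLIST_B",  # placeholder IDs, not real playlists
        "indiv-playlists": True,
        "language": "eng",
        "playlists-title": "AimHi",
        "type": "playlist",
    }
    for recipe in split_playlist_recipe(exported):
        print(json.dumps(recipe, indent=2, sort_keys=True))
```

The output would still have to be entered into new recipes by hand (or via the "Clone" button), so this only saves the copy/paste of the shared settings.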
From my PoV, this last remark emphasizes that:
I don't know if we should live with it, try some quick and dirty wins on some of these topics, or implement a real solution.
It is probably for @RavanJAltaie to decide how she wants to proceed, but I'm not sure the export is really useful, as neither she nor I have the skills to create the new recipes via script. Our last discussion was to clone existing recipes (in which case I'd delete the originals after the deed is done).
As for next steps, the quick and dirty tends to be somewhat permanent in this house and not exactly convenient for the non-dev end user either: I suggest we park it until this becomes a real project.
OK, so next steps before I can start to work on this issue are:
Correct? Note that I'm not speaking about the deletion of unwanted ZIMs, since there is no dependency AFAIK and we can do it at any time, at your own convenience.
I'm waiting for your GO to perform the last step, which consists of removing the ability to use the "playlists mode" in Zimfarm.
I realize that @RavanJAltaie was not on the thread and missed that part. I've assigned her now so she can confirm to you when all new recipes have been created
Now I'm confused: recipes with multiple playlists are OK? The only problem is deactivating playlist mode? @benoit74
@RavanJAltaie Yes, you are right. Recipes with multiple playlists in one ZIM are OK.
and yes we just want to get rid of the playlist mode
All fixed successfully!
Great, thank you!
I am reopening the issue because I still have my part of the job to do (removing the ability to create YouTube recipes that produce multiple ZIMs at once).
@RavanJAltaie I'm sorry but madrasa_ar_playlists is still using the Playlists mode, please fix it before I can proceed.
It's not clear from this ticket what happened exactly and what will happen:
aimhi_playlists in the zimfarm.
All fixed.
Again, I still have my part of the job to do.
@RavanJAltaie could you please detail recipe per recipe of https://github.com/openzim/zimfarm/issues/878#issuecomment-1851636439 what has been done?
I had a quick look and it seems that in many cases, you simply removed the playlist mode and created one big ZIM instead of many small ones, is this correct? The only exception is madrasa?
When you used this "create only one ZIM instead of many small ones" approach, it looks like you kept the old small ZIMs in the library, is this intentional? Content is evergreen, so we do not mind keeping them in the library and no longer updating them?
I'm not convinced by this strategy: usually there were only 5 or 6 playlists and it did not look like the number of playlists was frequently updated. Small ZIMs are usually more practical for our users. For https://farm.openzim.org/recipes/voa_learning_en_all for instance, we moved from ZIMs ranging from 59.48 MB to 12.72 GB to one enormous (from my perspective at least) 24.93 GB ZIM. But maybe users are always downloading all ZIMs, so the extra work to create individual ZIMs is not worth it. It is just that this decision is very opaque and has not been explained, so it feels a bit weird.
For madrasa I'm not convinced about the ZIM name / filename. For instance you chose madrasa_astronomy_ar_all while I consider it should be madrasa_ar_astronomy (project is madrasa, selection is astronomy, just like we have wikipedia_en_football, ...)
And for madrasa is there any reason to keep the two disabled recipes? Especially madrasa_ar_playlists which still uses the playlist mode?
@benoit74
I had a quick look and it seems that in many cases, you simply removed the playlist mode and created one big ZIM instead of many small ones, is this correct? The only exception is madrasa?
Yes that's correct, this is the decision made by @Popolechien & me after discussing the #878 issue.
I'm not convinced by this strategy: usually there were only 5 or 6 playlists and it did not look like the number of playlists was frequently updated. Small ZIMs are usually more practical for our users. For https://farm.openzim.org/recipes/voa_learning_en_all for instance, we moved from ZIMs ranging from 59.48 MB to 12.72 GB to one enormous (from my perspective at least) 24.93 GB ZIM. But maybe users are always downloading all ZIMs, so the extra work to create individual ZIMs is not worth it. It is just that this decision is very opaque and has not been explained, so it feels a bit weird.
That's the strategy followed in creating the madrasa playlists. For the few corrected playlists, we decided to keep them in one file, but I can re-discuss this with @Popolechien today and change it if agreed upon. Personally I don't think it's worth splitting the playlists.
For madrasa I'm not convinced about the ZIM name / filename. For instance you chose madrasa_astronomy_ar_all while I consider it should be madrasa_ar_astronomy (project is madrasa, selection is astronomy, just like we have wikipedia_en_football, ...)
I agree with you, I'll change the naming for all the files and apply this on new creations as well.
And for madrasa is there any reason to keep the two disabled recipes? Especially madrasa_ar_playlists which still uses the playlist mode?
No, no reason, I'll open an issue to delete them.
Also, as the convention clearly expresses, using the project name instead of the domain name should be exceptional. I have the feeling this rule is frequently abused. @Popolechien @RavanJAltaie please clarify this.
Also, as the convention clearly expresses, using the project name instead of the domain name should be exceptional. I have the feeling this rule is frequently abused. @Popolechien @RavanJAltaie please clarify this.
In this case, should the naming for madrasa be Youtube_ar_madrasa_astronomy?
madrasa.org_ar_astronomy for this one, but for all the YouTube-only recipes, a convention needs to be decided, documented and followed.
In this case, should the naming for madrasa be Youtube_ar_madrasa_astronomy?
We must reserve _ as separator in ZIM name and ZIM filename, i.e. project and selection must use only alphanumerics plus . and -. I will update the convention to make it clearer (speak up if I forgot a needed character).
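To make the rule concrete, here is a small sketch of how a name could be checked against it. It is only an illustration, not part of any openZIM tool: the helper `parse_zim_name` is hypothetical, the project_language_selection layout is inferred from the examples in this thread (madrasa_ar_astronomy, wikipedia_en_football), and restricting the language part to 2 or 3 lowercase letters is an assumption on my side.

```python
import re

# Project and selection: alphanumerics plus "." and "-"; "_" is reserved as separator.
PART = r"[A-Za-z0-9.-]+"
# Assumption: the language part is a 2- or 3-letter lowercase code.
LANG = r"[a-z]{2,3}"
ZIM_NAME = re.compile(rf"(?P<project>{PART})_(?P<lang>{LANG})_(?P<selection>{PART})")


def parse_zim_name(name):
    """Return the project/lang/selection parts of a ZIM name, or None if invalid."""
    match = ZIM_NAME.fullmatch(name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for candidate in (
        "madrasa_ar_astronomy",
        "madrasa.org_ar_astronomy",
        "madrasa_astronomy_ar_all",
        "Youtube_ar_madrasa_astronomy",
    ):
        print(candidate, "->", parse_zim_name(candidate))
```

With these assumptions the first two names are accepted, while madrasa_astronomy_ar_all and Youtube_ar_madrasa_astronomy are rejected because an extra _ ends up inside what should be a single part.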
That's the strategy followed in creating the madrasa playlists. For the few corrected playlists, we decided to keep them in one file, but I can re-discuss this with @Popolechien today and change it if agreed upon. Personally I don't think it's worth splitting the playlists.
No need to discuss it again if it has been agreed upon. It just would have been better to put these conclusions here earlier, so that everyone involved is aware of them and we keep a track record. I'm pretty sure we will have a question about it in a few months.
What about older ZIMs (per playlist), do we keep them in the library? For madrasa, since you are changing the name, you will probably also have to delete older ZIMs.
Let's finish this issue regarding the Zimfarm's ability to create multiple ZIMs per recipe for the youtube scraper.
Everything else can be discussed separately if needed (and is at least partially already an ongoing effort)
See https://github.com/openzim/youtube/issues/147.
Recipes using this configuration should be listed and a migration scenario should be decided first.
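As an illustration of the listing step, the sketch below filters a set of recipe configurations for the playlist-mode flag seen in the export (`indiv-playlists`). The input structure (a list of dicts with `name` and `config` keys) and the function name are hypothetical; how the configurations are actually pulled out of Zimfarm is not covered here.

```python
def recipes_in_playlist_mode(recipes):
    """Return the sorted names of recipes whose config enables "indiv-playlists"."""
    return sorted(
        recipe["name"]
        for recipe in recipes
        if recipe.get("config", {}).get("indiv-playlists") is True
    )


if __name__ == "__main__":
    # Hypothetical sample data; the flag values here are for illustration only.
    sample = [
        {"name": "madrasa_ar_playlists", "config": {"indiv-playlists": True}},
        {"name": "voa_learning_en_all", "config": {"type": "playlist"}},
    ]
    print(recipes_in_playlist_mode(sample))  # ['madrasa_ar_playlists']
```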