Closed benoit74 closed 10 months ago
The code that gathers the data is ready and, as discussed, I store only key hashes in the configuration to avoid centralizing all API keys in plain text in one file. This code will run once per month as a Kubernetes job.
@RavanJAltaie @Popolechien
Is it OK for you if we create one issue per month in zim-request which reports the API keys used and asks you to review them and adapt the configuration if needed? You can then close the issue on your own.
@kelson42 @rgaudin Are you OK with this design idea? Can we reuse the GitHub token already used by GitHub CI to set up the k8s job?
Yes, that seems appropriate.
We'll need a bit of briefing beforehand, but yeah, it should be manageable.
We have just discussed this with Stephane and Ravan and agreed that:
During the meeting, I agreed to investigate how feasible it would be to provide this information.
While writing this comment, I realized that this effort around API key usage and monitoring may be pointless, since we could get rid of these API keys once we implement https://github.com/openzim/youtube/issues/177
Rather than working on this issue (a monthly report of API key usage), I will probably take time to investigate the root cause (whether we could get rid of API keys). I will keep you informed.
I don't understand the last sentence. We can't use the current code without API keys. We can get rid of API keys by implementing the mentioned ticket. There's no technical challenge, but it's a piece of work.
I also don't think it's a wise use of time to fetch API key usage or the number of videos.
I think the distribution of recipes with API keys is a good start that's both easy to deploy and easy to assess.
In the last sentence I meant that I would spend my time more wisely on the mentioned issue (which would allow us to get rid of API keys) rather than fetching the number of videos per API key.
Ah ok 👍
I just tried to grab the number of videos manually and it is indeed quite fast: it took me 5 minutes to process 13 recipes. Since we have 134 recipes, it would take about 1 hour to process them all. It is not a very tedious task, and getting the full list is definitely not a full project.
So:
List of videos per recipe I explored:
- cest-pas-sorcier_fr_all: 1900 videos
- voa_learning_english_all_playlists: 5500 videos
- universcience-tv_fr_all: unknown (channel is down?)
- crashcourse_en_all: 1500 videos
- madrasa: 6100 videos
- mali-pour-les-nuls: 8 videos
- tedmed_en_all: 791 videos
- wikistage_mul_all: unknown (channel is down?)
- litterature-audiobooks-poetry_fr: 671 videos
- los_miserables_audiobook: unknown (channel is down?)
- 2021_ted_countdown_global: 49 videos
- teded_en_all: 2100 videos
- ubongo_sw: 573 videos
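As a rough check of the estimate above, the timing can be extrapolated linearly from the 13 recipes sampled to all 134, and the counts in the list can be tallied. This is only a sketch: it assumes every recipe takes about the same time to check, the function and variable names are made up for illustration, and the three recipes whose channel seems to be down are kept as `None` rather than guessed.

```python
# Video counts copied from the list above (several are rounded figures).
# None marks recipes whose channel could not be read.
video_counts = {
    "cest-pas-sorcier_fr_all": 1900,
    "voa_learning_english_all_playlists": 5500,
    "universcience-tv_fr_all": None,
    "crashcourse_en_all": 1500,
    "madrasa": 6100,
    "mali-pour-les-nuls": 8,
    "tedmed_en_all": 791,
    "wikistage_mul_all": None,
    "litterature-audiobooks-poetry_fr": 671,
    "los_miserables_audiobook": None,
    "2021_ted_countdown_global": 49,
    "teded_en_all": 2100,
    "ubongo_sw": 573,
}


def estimate_minutes(total_recipes, sample_recipes, sample_minutes):
    """Linear extrapolation, assuming each recipe takes the same time."""
    return total_recipes * sample_minutes / sample_recipes


known = {name: count for name, count in video_counts.items() if count is not None}
unknown = [name for name, count in video_counts.items() if count is None]

minutes = estimate_minutes(total_recipes=134, sample_recipes=13, sample_minutes=5)
print(f"Estimated time for all recipes: {minutes:.0f} minutes")
print(f"Videos counted: {sum(known.values())} across {len(known)} recipes")
print(f"Recipes to re-check: {', '.join(unknown)}")
```

The extrapolation comes out a little under the "about 1 hour" mentioned above, which leaves some margin for slower recipes.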
@RavanJAltaie @Popolechien: you are the two people who will be assigned the created issues, correct?
@Ravan primarily but yes, this is correct.
> The content team would like to have a monthly overview of which YouTube API keys are used by which recipes, to check everything is in order.
I would prefer to have a description of exactly what problem the content team faces, instead of jumping to the proposal they have in mind. This is the only way we can have an open discussion about the best solution to the problem.
AFAIK, the problem is that we keep having YouTube tasks failing because the scraper runs up against API key quota limits. Diagnosis is not easy because we have many API keys and it is not always the same recipe that fails (depending on the scheduling). I was pretty sure I had opened a ticket about this years ago, but I cannot find it anymore.
Anyway, IMO, this is not a problem for a human; it is a problem for a computer to solve. API keys should not be assigned to the recipe, but -dynamically- to the task or even to the scraper at runtime.
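To make the idea of dynamic assignment concrete, here is a minimal sketch of picking a key per task instead of per recipe. Everything in it is an assumption for illustration: the `ApiKey` class, the `pick_key` function, the quota figures, and the notion of an estimated cost per task are hypothetical and not part of the Zimfarm or scraper code.

```python
from dataclasses import dataclass


@dataclass
class ApiKey:
    name: str            # human-readable name, never the key value itself
    daily_quota: int     # hypothetical quota units available per day
    used_today: int = 0  # hypothetical units already consumed today

    @property
    def remaining(self) -> int:
        return self.daily_quota - self.used_today


def pick_key(pool: list[ApiKey], estimated_cost: int) -> ApiKey:
    """Return the key with the most remaining quota that can cover the task."""
    candidates = [key for key in pool if key.remaining >= estimated_cost]
    if not candidates:
        raise RuntimeError("no API key has enough quota left for this task")
    chosen = max(candidates, key=lambda key: key.remaining)
    chosen.used_today += estimated_cost  # reserve the quota for this task
    return chosen


# Made-up pool: key-a is almost exhausted, key-b still has room.
pool = [
    ApiKey("key-a", daily_quota=10_000, used_today=9_500),
    ApiKey("key-b", daily_quota=10_000, used_today=2_000),
]
print(pick_key(pool, estimated_cost=1_000).name)
```

With a scheme along these lines, a task would only fail on quota when every key in the pool is exhausted, rather than when the one key tied to its recipe is.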
> I was pretty sure I had opened a ticket about this years ago, but I cannot find it anymore.
This was https://github.com/openzim/zimfarm/issues/558; unfortunately I did not write down the rationale. My bad.
We've already discussed this; we think it's less work to use ytdlp than to implement this properly.
The content team would like to have a monthly overview of which YouTube API keys are used by which recipes, to check everything is in order.
API keys should probably not be displayed in the report but only referenced by name, and they should not be stored in code, so a map of {API key name => API key value hash} is probably the way to go.
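A minimal sketch of such a map, to illustrate how a key found in a recipe could be reported by name without storing its value anywhere. The choice of SHA-256, the key names, and the dummy key values are all assumptions; in a real configuration the map would hold literal hash strings, and they are computed here only to keep the example self-contained.

```python
import hashlib


def key_hash(api_key: str) -> str:
    """Hex SHA-256 digest of an API key value (the algorithm is an assumption)."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


# Hypothetical configuration: API key name => hash of the API key value.
KNOWN_KEY_HASHES = {
    "content-key-1": key_hash("dummy-key-value-1"),
    "content-key-2": key_hash("dummy-key-value-2"),
}


def key_name(api_key: str, known_hashes: dict[str, str]) -> str:
    """Map a key value found in a recipe to its configured name."""
    digest = key_hash(api_key)
    for name, known_digest in known_hashes.items():
        if digest == known_digest:
            return name
    return "unknown key"


print(key_name("dummy-key-value-2", KNOWN_KEY_HASHES))
print(key_name("some-key-nobody-declared", KNOWN_KEY_HASHES))
```

A recipe whose key resolves to "unknown key" is itself useful information for the report, since it points at a key that was never declared in the configuration.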
Report creation must be automated. How to push it to the content team is still to be discussed: it could be manual as a first step, or it could be an issue on GitHub.
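For illustration, the automated part could be as small as grouping recipes by key name and rendering a text body that can be sent manually or pasted into an issue. This is a sketch under stated assumptions: the `build_report` function, the report layout, and the recipe-to-key assignments in the example are invented, and only the recipe names come from this thread.

```python
from collections import defaultdict


def build_report(recipe_to_key_name: dict[str, str], month: str) -> str:
    """Render a plain-text report listing recipes under each API key name."""
    by_key = defaultdict(list)
    for recipe, name in recipe_to_key_name.items():
        by_key[name].append(recipe)

    lines = [f"YouTube API key usage for {month}", ""]
    for name in sorted(by_key):
        recipes = sorted(by_key[name])
        lines.append(f"{name} ({len(recipes)} recipe(s))")
        lines.extend(f"- {recipe}" for recipe in recipes)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Made-up assignments, only to show the shape of the output.
example = {
    "teded_en_all": "content-key-1",
    "ubongo_sw": "content-key-2",
    "crashcourse_en_all": "content-key-1",
}
print(build_report(example, month="2024-01"))
```

Only key names appear in the output, so the report can be shared without exposing any key value.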