openzim / zimfarm

Farm operated by bots to grow and harvest new zim files
https://farm.openzim.org
GNU General Public License v3.0

Provide a monthly report of youtube API keys #854

Closed benoit74 closed 10 months ago

benoit74 commented 10 months ago

The content team would like to have a monthly overview of which YouTube API keys are used by which recipes, to check that everything is in order.

API keys should probably not be displayed in the report but only referenced by name, and they should not be stored in code, so a map of {API key name => API key value hash} is probably the way to go.
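To illustrate the {API key name => API key value hash} idea, here is a minimal sketch, not the actual Zimfarm code. The key names, the dummy key values, the choice of SHA-256, and the `build_report` helper are all assumptions made for the example. In a real configuration only the digests would be stored; they are computed from dummy values here solely to keep the example self-contained.

```python
import hashlib
from collections import defaultdict


def hash_key(value: str) -> str:
    """Return the SHA-256 hex digest of an API key value (hash choice is an assumption)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Hypothetical configuration: key name -> digest of the key value.
# A real configuration would contain only the precomputed digests.
KNOWN_KEYS = {
    "content-team-key-1": hash_key("dummy-api-key-aaa"),
    "content-team-key-2": hash_key("dummy-api-key-bbb"),
}


def build_report(recipes: dict[str, str], known_keys: dict[str, str]) -> dict[str, list[str]]:
    """Group recipe names by the *name* of the API key they use.

    `recipes` maps a recipe name to the plain API key found in its settings.
    Keys whose digest is not in `known_keys` are grouped under "unknown key".
    """
    name_by_hash = {digest: name for name, digest in known_keys.items()}
    report = defaultdict(list)
    for recipe, api_key in recipes.items():
        key_name = name_by_hash.get(hash_key(api_key), "unknown key")
        report[key_name].append(recipe)
    return {name: sorted(found) for name, found in sorted(report.items())}
```

The report produced this way never contains a key value, only key names and recipe names, so it could be posted in an issue without exposing secrets.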

Report creation must be automated. Delivering it to the content team could be manual at first, or it could be posted as an issue on GitHub; to be discussed.

benoit74 commented 10 months ago

The code gathering the data is ready, and as discussed I store only key hashes in the configuration to avoid centralizing all API keys in plain text in one file. This code will run once per month, as a Kubernetes job.

@RavanJAltaie @Popolechien Would it be OK for you if we create one issue per month in zim-request that reports the API keys in use and asks you to review it, adapt the configuration if needed, and then close the issue yourselves?

@kelson42 @rgaudin Are you OK with this design idea? Can we reuse the GitHub token already used by GitHub CI to set up the k8s job?

rgaudin commented 10 months ago

Yes, that seems appropriate.

Popolechien commented 10 months ago

We'll need a bit of briefing beforehand, but yeah, it should be manageable.

benoit74 commented 10 months ago

We have just discussed this with Stephane and Ravan and agreed that:

During the meeting, I agreed to investigate how feasible it would be to provide this information.

While writing this comment, I realize that this effort around API key usage and monitoring may be pointless, since we might get rid of these API keys once we manage to implement https://github.com/openzim/youtube/issues/177

Rather than investigating this issue (having a monthly report of API keys usage) I will probably take time to investigate the root cause (could we get rid of API keys). I will keep you informed.

rgaudin commented 10 months ago

I don't understand the last sentence. We can't use the current code without API keys. We can get rid of API keys by implementing the mentioned ticket. There's no technical challenge but it's a piece of work.

I also don't think it's a wise use of time to fetch API key usage or the number of videos.

I think the distribution of recipes with API keys is a good start that's both easy to deploy and easy to assess.

benoit74 commented 10 months ago

In the last sentence I meant that I would spend my time more wisely on the mentioned issue (which would allow us to get rid of API keys) rather than on fetching the number of videos per API key.

rgaudin commented 10 months ago

Ah ok 👍

benoit74 commented 10 months ago

I just tried grabbing the number of videos manually, and it is indeed quite fast: it took me 5 minutes to process 13 recipes. Since we have 134 recipes, processing them all would take about 1 hour. It is not a very tedious task, and getting the full list is definitely not a full project.

So:

List of videos per recipe I explored:

- cest-pas-sorcier_fr_all: 1900 videos
- voa_learning_english_all_playlists: 5500 videos
- universcience-tv_fr_all: unknown (channel is down?)
- crashcourse_en_all: 1500 videos
- madrasa: 6100 videos
- mali-pour-les-nuls: 8 videos
- tedmed_en_all: 791 videos
- wikistage_mul_all: unknown (channel is down?)
- litterature-audiobooks-poetry_fr: 671 videos
- los_miserables_audiobook: unknown (channel is down?)
- 2021_ted_countdown_global: 49 videos
- teded_en_all: 2100 videos
- ubongo_sw: 573 videos

benoit74 commented 10 months ago

@RavanJAltaie @Popolechien: you are the two people who will be assigned the created issues, correct?

Popolechien commented 10 months ago

@Ravan primarily but yes, this is correct.

kelson42 commented 10 months ago

> The content team would like to have a monthly overview of which YouTube API keys are used by which recipes, to check that everything is in order.

I would prefer to have a description of exactly what problem the content team faces, instead of jumping to the proposal they have in mind. This is the only way we can have an open discussion about the best solution to tackle the problem.

AFAIK, the problem is that we keep having YouTube tasks fail because the scraper runs up against API key quota limits. The diagnosis is not easy because we have many API keys and it is not always the same recipe that fails (depending on the scheduling). I was pretty sure I had opened a ticket about this years ago, but I can no longer find it.

Anyway, IMO, this is not a problem for a human, this is a problem to solve for a computer. API keys should not be assigned to the recipe, but -dynamically- to the task or even to the scraper at runtime.
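To make the idea of assigning keys dynamically at runtime more concrete, here is a minimal sketch under stated assumptions. The `KeyState` and `KeyPool` names, the per-task cost estimate, and the "pick the key with the most remaining quota" policy are all hypothetical; nothing like this exists in Zimfarm or the scraper as far as this thread shows.

```python
from dataclasses import dataclass


@dataclass
class KeyState:
    """Hypothetical bookkeeping for one API key, referenced by name only."""
    name: str
    remaining_quota: int


class KeyPool:
    """Hand out the key with the most remaining quota when a task starts."""

    def __init__(self, keys: list[KeyState]) -> None:
        self._keys = {key.name: key for key in keys}

    def acquire(self, estimated_cost: int) -> str:
        """Reserve `estimated_cost` quota units and return the chosen key name."""
        candidates = [
            key for key in self._keys.values()
            if key.remaining_quota >= estimated_cost
        ]
        if not candidates:
            raise RuntimeError("no API key has enough quota left")
        best = max(candidates, key=lambda key: key.remaining_quota)
        best.remaining_quota -= estimated_cost
        return best.name


# Usage: two keys with made-up quota figures.
pool = KeyPool([KeyState("key-a", 100), KeyState("key-b", 300)])
first = pool.acquire(250)   # only key-b can cover 250 units
second = pool.acquire(80)   # key-a has 100 left, key-b has 50
```

With such a pool, a recipe would no longer carry a fixed key; the scheduler or the scraper would ask for one when the task starts, so a single exhausted key would not fail a specific recipe.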

kelson42 commented 10 months ago

> I was pretty sure I had opened a ticket about this years ago, but I can no longer find it.

This was https://github.com/openzim/zimfarm/issues/558; unfortunately, I didn't write down the rationale. My bad.

rgaudin commented 10 months ago

We've already discussed this; we think it's less work to use ytdlp than to implement this properly.