What categories do you want to synchronize?
The aim is to provide convenient dumps for each category in Category:Lingua_Libre_pronunciation. The largest ones are 60k files (ben) and 250k files (fra). Altogether, the 130 categories contain 700,000 files.
The point of WikiapiJS `.download()` here is scalability: the ability to handle such large categories with resilience and speed, both for their initial download and for their later periodic updates, ideally weekly.
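For illustration, a minimal sketch of such an egg, assuming `.download()` accepts a category title plus a `directory` option as in the pattern from #48; the exact category name and the `max_threads` option are assumptions on my side:

```js
// A minimal sketch, assuming `.download()` accepts a category title
// plus a target directory, per the pattern discussed in #48.
// The category name and the `max_threads` option are illustrative.
const Wikiapi = require('wikiapi');

(async () => {
  const wiki = new Wikiapi('commons');
  // Fetch every file of one per-language category into a local dump.
  await wiki.download('Category:Lingua Libre pronunciation-fra', {
    directory: './dumps/fra/',
    max_threads: 4,
  });
})();
```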
Well, it seems I need to do some work...
Nice!
This scale-up question is handled in two related issues:
Hi there, I'm using WikiapiJS to code a wikiapi-egg (script) which will download all Commons files from target categories. My 3 largest target categories currently have about 50k audio files each, each file being about 1.5 KB. Do you know:
- `cmlimit=500` for regular users, `cmlimit=5000` with the `apihighlimits` userright.
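For reference, a sketch of the raw `list=categorymembers` paging these limits apply to (`listCategoryFiles` is a hypothetical helper name, using Node 18+ global `fetch`):

```js
// Sketch of the raw MediaWiki API paging that the cmlimit limits apply to.
// `listCategoryFiles` is a hypothetical helper, not part of WikiapiJS.
const API = 'https://commons.wikimedia.org/w/api.php';

async function listCategoryFiles(category) {
  const titles = [];
  let cmcontinue;
  do {
    const params = new URLSearchParams({
      action: 'query',
      list: 'categorymembers',
      cmtitle: category,
      cmtype: 'file',
      cmlimit: '500', // 5000 with the apihighlimits userright
      format: 'json',
    });
    if (cmcontinue) params.set('cmcontinue', cmcontinue);
    const data = await (await fetch(`${API}?${params}`)).json();
    titles.push(...data.query.categorymembers.map((m) => m.title));
    // Follow the continuation token until the category is exhausted.
    cmcontinue = data.continue && data.continue.cmcontinue;
  } while (cmcontinue);
  return titles;
}
```

At 500 titles per request, listing the 250k-file category takes about 500 requests (about 50 with `apihighlimits`); the listing is cheap compared with the file downloads themselves.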
**Scale up**
It's to provide the public with direct and convenient dumps of LinguaLibre's audio assets on a per-language basis. We want to create periodic (weekly?) dumps on our Lili server.
We want to keep a local dump synchronized with Wikimedia Commons. We are talking about 700,000 files so far. Given the test durations above, the initial synchronization would take 21 days, which is acceptable. But an "update" one week later would again take about 15 days, even though only 1~2% of the files (7,000-15,000) would be new and actually require a download.
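One candidate optimization (an assumption on my side, not a documented WikiapiJS feature): `list=categorymembers` accepts `cmsort=timestamp` with `cmstart`, so a weekly job could list only the members added to the category since the last run and download just those:

```js
// Sketch of an incremental weekly update: list only members added to the
// category since the last run, then download just those files.
// Persisting `lastRunISO` between jobs is an assumption about the sync
// design, not existing WikiapiJS behaviour. Continuation handling is
// omitted for brevity (see the paging sketch above).
const API = 'https://commons.wikimedia.org/w/api.php';

async function newMembersSince(category, lastRunISO) {
  const params = new URLSearchParams({
    action: 'query',
    list: 'categorymembers',
    cmtitle: category,
    cmtype: 'file',
    cmsort: 'timestamp', // sort by time of inclusion in the category
    cmdir: 'newer',
    cmstart: lastRunISO, // e.g. '2023-01-01T00:00:00Z'
    cmlimit: '500',
    format: 'json',
  });
  const data = await (await fetch(`${API}?${params}`)).json();
  return data.query.categorymembers.map((m) => m.title);
}
```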
Do you have any possible optimization in sight?
WikiapiJS `.download()` worked on tiny categories (n = 12 files); see the code in #48. I'm currently reluctant to test further, for fear of being banned.
**`.download()` benchmark (1)**

Ok, I decided to test anyway on a category with n = 369.
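A rough sketch of how such a test can be timed, with an illustrative category name:

```js
// Rough timing harness for the n=369 test; the category is illustrative.
const Wikiapi = require('wikiapi');

(async () => {
  const wiki = new Wikiapi('commons');
  console.time('download n=369');
  await wiki.download('Category:Some 369-file category', {
    directory: './test/',
  });
  console.timeEnd('download n=369');
})();
```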