openzim / ted

Provide the best of TED.com for offline usage!
https://download.kiwix.org/zim/ted/
GNU General Public License v3.0

Filter and order languages in ZIM Language metadata #172

Closed: benoit74 closed this issue 2 months ago

benoit74 commented 3 months ago

When multiple languages are required but only some of them are found, the scraper still sets all of the required languages in the ZIM Language metadata (or at least it will once https://github.com/openzim/ted/pull/170 is merged).

The scraper should in fact:

This is going to be a bit tricky because TED language codes differ from ISO 639-3 codes, so we will have to be careful about that.

Note that once #171 is implemented, the exact list of languages will be dynamic.

rgaudin commented 3 months ago

We've been using scraperlib to get one from the other. We even have a mapping table for the codes that can't be matched automatically.
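
To make the discussion concrete, here is a minimal sketch of the filtering step, assuming the scraper knows both the requested TED codes and the TED codes it actually found. It does not use scraperlib: the `TED_TO_ISO639_3` table and the `zim_language_metadata` function are hypothetical names invented for illustration, and the table holds only a few sample entries. Keeping the requested order is also an assumption, since the ordering rule is not spelled out in this thread.

```python
# Hypothetical mapping from TED language codes to ISO 639-3 codes.
# Illustrative entries only; the real scraper relies on scraperlib
# plus its own table for the codes that cannot be matched.
TED_TO_ISO639_3 = {
    "en": "eng",
    "fr": "fra",
    "pt-br": "por",
    "zh-cn": "zho",
}


def zim_language_metadata(requested, found):
    """Build a comma-separated ISO 639-3 list for the languages actually found.

    `requested` and `found` are iterables of TED language codes.
    Languages that were requested but not found are dropped.
    The requested order is kept (an assumption, see the note above).
    """
    found_set = set(found)
    iso_codes = []
    for ted_code in requested:
        if ted_code not in found_set:
            continue  # requested but no content found: leave it out
        iso_code = TED_TO_ISO639_3.get(ted_code)
        if iso_code is None:
            # Fail loudly instead of writing a non-ISO code into the metadata.
            raise ValueError(f"no ISO 639-3 mapping for TED code {ted_code!r}")
        if iso_code not in iso_codes:
            # Two TED codes can map to the same ISO code; keep it once.
            iso_codes.append(iso_code)
    return ",".join(iso_codes)


if __name__ == "__main__":
    print(zim_language_metadata(["en", "fr", "pt-br"], ["pt-br", "en"]))
```

With `en`, `fr` and `pt-br` requested and only `pt-br` and `en` found, this yields `eng,por`: `fr` is dropped and the TED codes are replaced by their ISO 639-3 equivalents.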