kbatsuren / wiktra

Wiktra - Python tool of Wiktionary Transliteration modules for 514 languages and their 102 different scripts (orthographies)
GNU General Public License v2.0
27 stars, 5 forks

Update to cover all Wiktionary transliterations? #5

Closed by twardoch 3 years ago

twardoch commented 3 years ago

How feasible would it be to update this to cover all Wiktionary transliterations, and possibly expose additional functionality?

  1. Update to cover all transliteration modules: https://en.wiktionary.org/wiki/Category:Transliteration_modules

  2. Change the API so we can specify the input ISO 639 language code and/or ISO 15924 script code, possibly only the ISO script code without specifying the language.

  3. Change the API so we can optionally specify the output language and/or script code, so we can transliterate to other scripts/languages, not just Latin, using modules like https://en.wiktionary.org/wiki/Module:Newa-Deva-translit

  4. It would be best if language and script lookups were done using the native Lua modules (see dependencies below).

  5. Provide instructions or some script that automatically updates the Lua components from Wiktionary (I don’t know how this is done).
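
A minimal sketch of how the lookup in points 2 and 3 might behave, assuming a plain Python registry rather than the native Lua lookups proposed in point 4. `Route`, `REGISTRY` and `resolve_module` are hypothetical names, not part of Wiktra's current API, and the registry entries are placeholders apart from `Newa-Deva-translit`, which is the module linked above.

```python
from typing import Dict, NamedTuple, Optional


class Route(NamedTuple):
    """Hypothetical lookup key for a transliteration module."""
    lang: Optional[str]    # ISO 639 language code, or None for "any language"
    script: Optional[str]  # ISO 15924 source script code, or None for "any script"
    target: str            # ISO 15924 target script code


# Placeholder registry: a real one would be derived from the Wiktionary data modules.
REGISTRY: Dict[Route, str] = {
    Route("ru", "Cyrl", "Latn"): "ru-translit",         # illustrative entry
    Route(None, "Newa", "Deva"): "Newa-Deva-translit",  # script-to-script, no language
}


def resolve_module(
    lang: Optional[str] = None,
    script: Optional[str] = None,
    target: str = "Latn",
    registry: Dict[Route, str] = REGISTRY,
) -> str:
    """Return a module name, trying the most specific key first."""
    if lang is None and script is None:
        raise ValueError("specify at least a language code or a script code")
    candidates = [
        Route(lang, script, target),  # exact language + script match
        Route(lang, None, target),    # language only
        Route(None, script, target),  # script only
    ]
    for key in candidates:
        if key in registry:
            return registry[key]
    raise LookupError(f"no module for lang={lang!r} script={script!r} target={target!r}")
```

With this shape, `resolve_module(script="Newa", target="Deva")` would select the Newa-to-Devanagari module without naming a language, while the default `target="Latn"` keeps today's romanization behaviour.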

Possible dependencies

Integrate languages:

Integrate families:

Integrate scripts:

Integrate translit-redirect/data

Dependencies:

twardoch commented 3 years ago

Ps. Of course stub/replacement modules can be provided, as is already done.

twardoch commented 3 years ago

Ps. @tatuylonen has https://github.com/tatuylonen/wiktextract/ and https://github.com/tatuylonen/wikitextprocessor/ — I’ve opened an issue https://github.com/tatuylonen/wiktextract/issues/68 where I inquired about updating the Wiktionary transliteration modules.

I see that tatuylonen’s modules also use some "safe" workarounds to run Lua code. Perhaps your Wiktra could somehow integrate with wikitextprocessor/wiktextract so that the same techniques are used?

twardoch commented 3 years ago

I see that it’s possible to fetch a module’s raw source with https://en.wiktionary.org/w/index.php?title=Module:scripts&action=raw
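
For illustration, a small sketch of downloading a module's Lua source through that `action=raw` endpoint, using only the Python standard library. The function names and the User-Agent string are assumptions made for this example; this is not the code used by any of the tools mentioned in this thread.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

INDEX_URL = "https://en.wiktionary.org/w/index.php"


def raw_module_url(title: str) -> str:
    """Build the action=raw URL for a page title such as 'Module:scripts'."""
    return f"{INDEX_URL}?{urlencode({'title': title, 'action': 'raw'})}"


def fetch_module_source(title: str, timeout: float = 30.0) -> str:
    """Download the raw Lua source of a module (performs a network request)."""
    request = Request(
        raw_module_url(title),
        # Hypothetical User-Agent; a real tool should identify itself properly.
        headers={"User-Agent": "wiktra-update-sketch/0.1"},
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_module_source("Module:scripts")` would return the Lua text as a string, which could then be written into the package's Lua directory.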

twardoch commented 3 years ago

And https://en.wiktionary.org/w/api.php?action=query&list=categorymembers&cmtitle=Category%3ATransliteration_modules&cmlimit=500&format=json
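
A hedged sketch of walking that category listing from Python. Since `cmlimit=500` caps each response, the loop follows the API's `continue` block until it is absent. `http_get_json` and `iter_category_members` are names invented for this example, and the `fetch` parameter exists only so the loop can be exercised without network access.

```python
import json
from typing import Callable, Dict, Iterator
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wiktionary.org/w/api.php"


def http_get_json(params: Dict[str, str]) -> dict:
    """Perform one GET request against the MediaWiki API and parse the JSON."""
    request = Request(
        f"{API_URL}?{urlencode(params)}",
        # Hypothetical User-Agent; a real tool should identify itself properly.
        headers={"User-Agent": "wiktra-update-sketch/0.1"},
    )
    with urlopen(request, timeout=30) as response:
        return json.load(response)


def iter_category_members(
    category: str,
    fetch: Callable[[Dict[str, str]], dict] = http_get_json,
) -> Iterator[str]:
    """Yield page titles in a category, following continuation tokens."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": "500",
        "format": "json",
    }
    while True:
        data = fetch(params)
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in data:
            break
        # Merge the continuation values (e.g. cmcontinue) into the next request.
        params = {**params, **data["continue"]}
```

Usage would look like `list(iter_category_members("Category:Transliteration_modules"))`, giving the module titles to download one by one.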

twardoch commented 3 years ago

I’ve created a Python tool that downloads Wiktionary modules: https://github.com/twardoch/wiktra-update

twardoch commented 3 years ago

I made this into a pull request https://github.com/kbatsuren/wiktra/pull/4 from my own fork https://github.com/twardoch/wiktra/

kbatsuren commented 3 years ago

Thank you so much for the detailed suggestions. I will have a look at it as soon as I come back from my vacation.

twardoch commented 3 years ago

OK, I think I can close this now :)