imedadel closed 5 years ago
Unfortunately the way to add new languages is rather involved, and includes an external standards body. I recently wrote about it here: https://github.com/wooorm/franc/issues/74#issuecomment-490362308
I see. Well, it seems that UDHR translations are quite outdated (the one for Tamazight goes back to 1998, before the official recognition of the language and the standardization of its orthography).
So, could you please explain the process of making franc? I would like to make a similar project but using Bible translations instead of UDHR 😄 Thanks!
PS: the part I understand the least is the data.json files.
Fun! The Bible is a good idea as well. I believe there are fewer Bibles available (in Unicode), though it is a longer text.
wooorm/udhr crawls and generates documents in Unicode, wooorm/trigrams generates trigrams from them, and script/build.js generates data
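To make the trigram step more concrete, here is a rough sketch of counting trigrams in a text. It is not the code from wooorm/trigrams: the names `countTrigrams` and `topTrigrams` are hypothetical, and the normalization (lowercasing, collapsing whitespace, padding with a space on each side) is an assumption made for illustration.

```javascript
// Hypothetical helper: count overlapping 3-character windows in a text.
function countTrigrams(text) {
  // Assumed normalization: lowercase, collapse whitespace, pad with spaces
  // so that word boundaries show up in the trigrams.
  const clean = ' ' + text.toLowerCase().replace(/\s+/g, ' ').trim() + ' '
  // Split by code point rather than UTF-16 unit.
  const chars = Array.from(clean)
  const counts = new Map()
  for (let i = 0; i + 3 <= chars.length; i++) {
    const gram = chars.slice(i, i + 3).join('')
    counts.set(gram, (counts.get(gram) || 0) + 1)
  }
  return counts
}

// Hypothetical helper: the `limit` most frequent trigrams, most frequent first.
// Ties are broken alphabetically so the result is deterministic.
function topTrigrams(text, limit) {
  return [...countTrigrams(text).entries()]
    .sort((a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0))
    .slice(0, limit)
    .map(([gram]) => gram)
}
```

Running something like `topTrigrams(chapterText, 300)` on one translated chapter per language would give a per-language profile to compare against. The limit of 300 is only an example value.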
I see 😃 Bible.com hosts around 1300 translations (including multiple varieties of Arabic and Tamazight), although they are not all complete. So I'll look for the most translated chapter and use it.
Yeah, this is gonna be fun :D
Maghrebi Arabic is the variety of Arabic spoken (and written) across the Maghreb region. Depending on the situation, it can be written in Arabic script, Hebrew script, or Latin script.
For now, I would like to focus on the Arabic script.
Adding support for Maghrebi Arabic written in the Arabic script should be easy, since detecting any of the letters ڨ, ڥ, ڭ, or ڜ is enough. The letters ڢ and ڧ are also attested, though mostly in older or printed documents (rarely online).
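The letter check described above could look roughly like this. It is a minimal sketch, not part of franc: the function name `hasMaghrebiLetters` and the `includeRare` option are hypothetical, and the letters are copied directly from this comment.

```javascript
// Letters listed above as distinctive for Maghrebi Arabic in Arabic script.
const COMMON_LETTERS = /[ڨڥڭڜ]/u
// Letters attested mostly in older or printed documents.
const RARE_LETTERS = /[ڢڧ]/u

// Hypothetical helper: true if the text contains any of the listed letters.
// `includeRare` (an assumed option) also accepts the rarer letters.
function hasMaghrebiLetters(text, includeRare = false) {
  if (COMMON_LETTERS.test(text)) return true
  return includeRare && RARE_LETTERS.test(text)
}
```

A text without any of these letters returns `false`, which only means this check found nothing, so such input would still need to go through the regular trigram-based detection.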