scaife-viewer / beyond-translation-site

Site used to iterate on translation alignments within the Scaife Viewer ecosystem

Override transliterator #164

Open jacobwegner opened 12 months ago

jacobwegner commented 12 months ago

refs https://github.com/scaife-viewer/beyond-translation-site/issues/91#issuecomment-1238571736

@jchill-git mentioned that CAMeL Tools may offer a better approach to transliteration.

Here is the current transliterator applied to Arabic text:

عمر الى شَيْخْ حنته

https://icu4c-demos.unicode.org/icu-bin/translit


Building on #163, we may want to add a token-level field for transliteration.
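Here is one possible shape for that token-level field. It is a minimal sketch that assumes a hypothetical `Token` record (the model touched in #163 may look different) and accepts any str -> str transliterator, for example the bound `transliterate` method of a camel_tools `Transliterator`:

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Token:
    # Hypothetical token record; the real Scaife Viewer token model may differ.
    value: str
    transliteration: Optional[str] = None


def add_transliterations(
    tokens: Iterable[Token], transliterate: Callable[[str], str]
) -> List[Token]:
    """Return copies of `tokens` with the transliteration field populated.

    `transliterate` can be any str -> str callable, so the ICU-based
    transliterator and CAMeL's ar2safebw mapper are interchangeable here.
    """
    return [replace(t, transliteration=transliterate(t.value)) for t in tokens]


if __name__ == "__main__":
    # Stand-in transliterator (str.upper) just to show the plumbing.
    tokens = [Token("alpha"), Token("beta")]
    print([t.transliteration for t in add_transliterations(tokens, str.upper)])
    # ['ALPHA', 'BETA']
```

Passing the transliterator in as a parameter keeps the choice between ICU and CAMeL separate from the token data. Either output could then be stored, or the two could be compared, without changing the model.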

jacobwegner commented 11 months ago

Via the camel_transliterate command-line tool:

```shell
$ echo 'عمر الى شَيْخْ حنته' > sample.txt
$ camel_transliterate -s ar2safebw sample.txt
Emr AlY cayoxo Hnth
```

I need to install some additional dependencies to get camel_tools installed on macOS.

jacobwegner commented 11 months ago

Via their Python API:

```python
>>> from camel_tools.utils.charmap import CharMapper
>>> from camel_tools.utils.transliterate import Transliterator
>>> # Built-in character mapper: Arabic script -> Safe Buckwalter
>>> ar2safebw = CharMapper.builtin_mapper('ar2safebw')
>>> transliterator = Transliterator(ar2safebw)
>>> transliterator.transliterate('عمر الى شَيْخْ حنته')
'Emr AlY cayoxo Hnth'
```
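Because CharMapper works character by character, a plain lookup table can reproduce the output above. The sketch below uses a hypothetical, partial table that covers only the 15 characters in the sample sentence. It is not CAMeL's actual mapping table, and characters outside the table pass through unchanged:

```python
# Hypothetical, partial subset of the Safe Buckwalter scheme: only the
# characters that occur in the sample sentence. Not CAMeL's real table.
SAFE_BW_SUBSET = {
    "\u0627": "A",  # ARABIC LETTER ALEF
    "\u062A": "t",  # ARABIC LETTER TEH
    "\u062D": "H",  # ARABIC LETTER HAH
    "\u062E": "x",  # ARABIC LETTER KHAH
    "\u0631": "r",  # ARABIC LETTER REH
    "\u0634": "c",  # ARABIC LETTER SHEEN ("$" in classic Buckwalter)
    "\u0639": "E",  # ARABIC LETTER AIN
    "\u0644": "l",  # ARABIC LETTER LAM
    "\u0645": "m",  # ARABIC LETTER MEEM
    "\u0646": "n",  # ARABIC LETTER NOON
    "\u0647": "h",  # ARABIC LETTER HEH
    "\u0649": "Y",  # ARABIC LETTER ALEF MAKSURA
    "\u064A": "y",  # ARABIC LETTER YEH
    "\u064E": "a",  # ARABIC FATHA
    "\u0652": "o",  # ARABIC SUKUN
}

# str.translate uses a table keyed by code point; maketrans builds it.
_TABLE = str.maketrans(SAFE_BW_SUBSET)


def transliterate(text: str) -> str:
    """Map each character through the table; unmapped characters are kept."""
    return text.translate(_TABLE)


# The sample sentence, spelled with escapes to avoid bidi/encoding surprises.
SAMPLE = (
    "\u0639\u0645\u0631 "
    "\u0627\u0644\u0649 "
    "\u0634\u064E\u064A\u0652\u062E\u0652 "
    "\u062D\u0646\u062A\u0647"
)

if __name__ == "__main__":
    print(transliterate(SAMPLE))  # Emr AlY cayoxo Hnth
```

Spaces pass through unchanged in this scheme, so transliterating each token separately gives the same strings as transliterating the whole line. That would fit the token-level field proposed above.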