adding language: Esperanto

Hi there, are you still interested in working on this? If so, I'd be happy to collaborate.

We can model the Esperanto parser on the FinnishLanguage class in finnish.py:

class FinnishLanguage(Language):
    pronunciation_dictionary_filename = os.path.join(PATH_DICTS,'en','english.tsv')
    lang = 'fi'
    cache_fn = 'finnish_wordtypes'

    @cache
    def get(self, token):
        token=token.strip()
        Annotation = make_annotation(token)
        syllables=[]
        wordbroken=False
        for ij in range(len(Annotation.syllables)):
            try:
                sylldat=Annotation.split_sylls[ij]
            except IndexError:
                sylldat=["","",""]

            syllStr=""
            onsetStr=sylldat[0].strip().replace("'","").lower()
            nucleusStr=sylldat[1].strip().replace("'","").lower()
            codaStr=sylldat[2].strip().replace("'","").lower()

            for x in [onsetStr,nucleusStr,codaStr]:
                x=x.strip()
                if not x: continue
                if (not x in orth2phon):
                    for y in x:
                        y=y.strip()
                        if not y: continue
                        if (not y in orth2phon):
                            wordbroken=True
                        else:
                            syllStr+="".join(orth2phon[y])
                else:
                    syllStr+="".join(orth2phon[x])
            syllables.append(syllStr)

        wordforms=[]
        sylls_text=[syll for syll in Annotation.syllables]
        for stress in Annotation.stresses:
            sylls_ipa = [stress2stroke[stress[i]]+syllables[i] for i in range(len(syllables))]
            wf=WordForm(
                token, 
                sylls_ipa=sylls_ipa, 
                sylls_text=sylls_text,
            )
            wordforms.append(wf)
        wordtype = WordType(token, children=wordforms, lang=self.lang)
        return wordtype

All we need is a .get(token) method that takes an arbitrary word string and returns a WordType object built from the syllabified data (phonemes + orthography).

It should then work like this:

In [10]: from prosodic.langs.finnish import Finnish

In [11]: word = Finnish().get('kalevala')

In [12]: for syll in word.syllables:
    ...:     print(syll)
    ...: 
Syllable(ipa="'kɑ", num=1, txt='ka', is_stressed=True, is_heavy=False, is_strong=True, is_weak=False)
Syllable(ipa='le', num=2, txt='le', is_stressed=False, is_heavy=False, is_strong=False, is_weak=True)
Syllable(ipa='`vɑ', num=3, txt='va', is_stressed=True, is_heavy=False, is_strong=True, is_weak=False)
Syllable(ipa='lɑ', num=4, txt='la', is_stressed=False, is_heavy=False, is_strong=False, is_weak=True)

Let me know if you have thoughts. It's great that Esperanto's stress is rule-based: it seems doable to incorporate!
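To make the rule-based part concrete, here is a rough, standalone sketch of what the Esperanto side could compute before handing things to WordForm. It doesn't touch prosodic at all: the names ORTH2IPA, syllabify, to_ipa and stressed_ipa are made up for illustration, the syllable-boundary rule (the last consonant of a cluster starts the next syllable) is a deliberately simple heuristic, and leaving monosyllables unmarked is my assumption. Poetic elision (kor' for koro), where stress stays on what becomes the final syllable, is not handled.

```python
import unicodedata

VOWELS = set("aeiou")  # j and ŭ are treated as consonants (glides)

# Hypothetical letter-to-IPA table: only letters whose IPA symbol
# differs from the letter itself. Affricates are written without tie bars.
ORTH2IPA = {
    "c": "ts", "ĉ": "tʃ", "ĝ": "dʒ", "ĥ": "x",
    "ĵ": "ʒ", "ŝ": "ʃ", "ŭ": "w",
}


def syllabify(word):
    """Split a word into orthographic syllables, one per vowel.

    Heuristic (an assumption, not an official rule): when consonants
    stand between two vowels, the last one opens the next syllable
    and the rest close the previous one.
    """
    word = unicodedata.normalize("NFC", word.strip().lower())
    letters = [ch for ch in word if ch.isalpha()]
    nuclei = [i for i, ch in enumerate(letters) if ch in VOWELS]
    if not nuclei:
        return ["".join(letters)] if letters else []
    sylls, start = [], 0
    for this_v, next_v in zip(nuclei, nuclei[1:]):
        has_consonants = next_v - this_v > 1
        boundary = next_v - 1 if has_consonants else next_v
        sylls.append("".join(letters[start:boundary]))
        start = boundary
    sylls.append("".join(letters[start:]))
    return sylls


def to_ipa(syll):
    """Map one orthographic syllable to IPA, letter by letter."""
    return "".join(ORTH2IPA.get(ch, ch) for ch in syll)


def stressed_ipa(word):
    """IPA syllables with "'" on the penultimate one.

    Monosyllables are left unmarked (an assumption).
    """
    sylls = syllabify(word)
    stress_at = len(sylls) - 2 if len(sylls) > 1 else None
    return [("'" if i == stress_at else "") + to_ipa(s)
            for i, s in enumerate(sylls)]


if __name__ == "__main__":
    for w in ["esperanto", "hodiaŭ", "ĉiuj", "la"]:
        print(w, syllabify(w), stressed_ipa(w))
```

The two lists this produces line up with the sylls_text and sylls_ipa arguments that the Finnish .get() passes to WordForm, so an Esperanto .get() could probably be little more than a thin wrapper around something like this.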

quadrismegistus / prosodic #36