Open niru86 opened 2 years ago
Hi there, are you still interested in working on this? I'd be happy to collaborate if so.
We can model an Esperanto parser on the FinnishLanguage object in finnish.py:
    class FinnishLanguage(Language):
        pronunciation_dictionary_filename = os.path.join(PATH_DICTS,'en','english.tsv')
        lang = 'fi'
        cache_fn = 'finnish_wordtypes'

        @cache
        def get(self, token):
            token=token.strip()
            Annotation = make_annotation(token)
            syllables=[]
            wordbroken=False

            for ij in range(len(Annotation.syllables)):
                try:
                    sylldat=Annotation.split_sylls[ij]
                except IndexError:
                    sylldat=["","",""]

                syllStr=""
                onsetStr=sylldat[0].strip().replace("'","").lower()
                nucleusStr=sylldat[1].strip().replace("'","").lower()
                codaStr=sylldat[2].strip().replace("'","").lower()

                for x in [onsetStr,nucleusStr,codaStr]:
                    x=x.strip()
                    if not x: continue
                    if (not x in orth2phon):
                        for y in x:
                            y=y.strip()
                            if not y: continue
                            if (not y in orth2phon):
                                wordbroken=True
                            else:
                                syllStr+="".join(orth2phon[y])
                    else:
                        syllStr+="".join(orth2phon[x])

                syllables.append(syllStr)

            wordforms=[]
            sylls_text=[syll for syll in Annotation.syllables]
            for stress in Annotation.stresses:
                sylls_ipa = [stress2stroke[stress[i]]+syllables[i] for i in range(len(syllables))]
                wf=WordForm(
                    token,
                    sylls_ipa=sylls_ipa,
                    sylls_text=sylls_text,
                )
                wordforms.append(wf)
            wordtype = WordType(token, children=wordforms, lang=self.lang)
            return wordtype
All we need is a .get(token) method that can take an arbitrary word string and return a WordType object composed of the syllabified data (phonemes + orthography).
It should then work like this:
    In [10]: from prosodic.langs.finnish import Finnish
    In [11]: word = Finnish().get('kalevala')
    In [12]: for syll in word.syllables:
        ...:     print(syll)
        ...:
    Syllable(ipa="'kɑ", num=1, txt='ka', is_stressed=True, is_heavy=False, is_strong=True, is_weak=False)
    Syllable(ipa='le', num=2, txt='le', is_stressed=False, is_heavy=False, is_strong=False, is_weak=True)
    Syllable(ipa='`vɑ', num=3, txt='va', is_stressed=True, is_heavy=False, is_strong=True, is_weak=False)
    Syllable(ipa='lɑ', num=4, txt='la', is_stressed=False, is_heavy=False, is_strong=False, is_weak=True)
Let me know if you have thoughts. It's great that Esperanto is rule-based in its stress: seems doable to incorporate!
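For the syllabified orthography that a .get(token) method would need, here is a rough standalone sketch, not prosodic code. The function name syllabify is hypothetical, and the splitting rule is a simplifying assumption: every vowel letter (a, e, i, o, u) is a nucleus, and when consonants sit between two vowels, only the last one opens the next syllable. Esperanto syllable boundaries are not strictly standardized, so clusters like "tr" may deserve different treatment.

```python
VOWELS = set("aeiou")  # j and ŭ are treated as consonants (glides), not nuclei


def syllabify(word: str) -> list[str]:
    """Split an Esperanto word into orthographic syllables (heuristic sketch)."""
    word = word.strip().lower()
    # Each vowel letter is one syllable nucleus.
    nuclei = [i for i, ch in enumerate(word) if ch in VOWELS]
    if not nuclei:
        return [word] if word else []

    # Find the cut point between each pair of adjacent nuclei.
    cuts = []
    for left, right in zip(nuclei, nuclei[1:]):
        n_consonants = right - left - 1
        # Hiatus: cut right before the second vowel.
        # Otherwise: the last consonant of the cluster starts the next syllable.
        cuts.append(right if n_consonants == 0 else right - 1)

    syllables = []
    start = 0
    for cut in cuts:
        syllables.append(word[start:cut])
        start = cut
    syllables.append(word[start:])
    return syllables


if __name__ == "__main__":
    for w in ["abelo", "esperanto", "antaŭ", "abel'"]:
        print(w, syllabify(w))
```

The resulting list would play the role of sylls_text in the Finnish code above; an elision apostrophe simply stays attached to the last syllable.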
I'm trying to adapt prosodic to Esperanto. Its stress is always paroxytonic, e.g. abelo (en. bee) [a.'be.lo], but in poetry there can be elision, and the word then becomes oxytonic: abel'.
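To make the stress rule concrete, here is a minimal sketch in plain Python, independent of prosodic. The name stress_pattern is hypothetical. It assumes that each vowel letter is one syllable, that a trailing apostrophe marks elision, and that a monosyllable counts as stressed (in real verse, function words such as "la" may well be unstressed).

```python
VOWELS = set("aeiou")
ELISION_MARKS = ("'", "\u2019")  # ASCII apostrophe and typographic apostrophe


def stress_pattern(word: str) -> list[bool]:
    """Return one flag per syllable: True where the stress falls."""
    word = word.strip().lower()
    n_syllables = sum(ch in VOWELS for ch in word)
    if n_syllables == 0:
        return []

    elided = word.endswith(ELISION_MARKS)
    if elided or n_syllables == 1:
        # Elided form (abel') or monosyllable: stress the last syllable.
        stressed = n_syllables - 1
    else:
        # Regular case: stress the penultimate syllable.
        stressed = n_syllables - 2
    return [i == stressed for i in range(n_syllables)]


if __name__ == "__main__":
    print(stress_pattern("abelo"))  # penultimate of three syllables
    print(stress_pattern("abel'"))  # last of two syllables
```

Note that the stress stays on the same vowel in both forms; only its position relative to the end of the word changes.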
Esperanto's spelling is as phonemic as Finnish's, so I decided to use the orth feature, but I'm puzzled by LANG_stress.py because I don't understand its code :( Could you help me? I want to use prosodic for my MA research.
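Because the spelling is phonemic, the letter-to-sound step can be a plain lookup table. The sketch below is standalone Python, not part of prosodic. The names ORTH2PHON and to_ipa are hypothetical, and the IPA values are a broad transcription chosen for illustration. Unknown characters are collected rather than silently dropped, similar in spirit to the wordbroken flag in finnish.py.

```python
import unicodedata

# One IPA value per letter of the 28-letter Esperanto alphabet (broad transcription).
ORTH2PHON = {
    "a": "a", "b": "b", "c": "ts", "ĉ": "tʃ", "d": "d", "e": "e",
    "f": "f", "g": "ɡ", "ĝ": "dʒ", "h": "h", "ĥ": "x", "i": "i",
    "j": "j", "ĵ": "ʒ", "k": "k", "l": "l", "m": "m", "n": "n",
    "o": "o", "p": "p", "r": "r", "s": "s", "ŝ": "ʃ", "t": "t",
    "u": "u", "ŭ": "w", "v": "v", "z": "z",
}
ELISION_MARKS = "'\u2019"


def to_ipa(text: str) -> tuple[str, list[str]]:
    """Map Esperanto orthography to IPA; also return characters that had no mapping."""
    # NFC turns "c" + combining circumflex into the single letter "ĉ".
    text = unicodedata.normalize("NFC", text.strip().lower())
    phones, unknown = [], []
    for ch in text:
        if ch in ORTH2PHON:
            phones.append(ORTH2PHON[ch])
        elif ch in ELISION_MARKS:
            continue  # the elision apostrophe has no sound of its own
        else:
            unknown.append(ch)
    return "".join(phones), unknown


if __name__ == "__main__":
    print(to_ipa("ĉevalo"))
    print(to_ipa("abel'"))
```

Applied syllable by syllable, the returned string would fill the same role as syllStr in the Finnish get() method.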