hipster-philology / nlp-pie-taggers

Extension for pie to include taggers with their models and pre/postprocessors
Mozilla Public License 2.0

Lasla capital "V" #7

Closed emanjavacas closed 4 years ago

emanjavacas commented 4 years ago

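The preprocessing seems to fail for a capital "V": "Valete" ends up with the treated token "valete" and the lemma "alleo", while lowercase "valete" gets the treated token "ualete" and the lemma "ualeo".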

model = get_model("lasla")
tagger = get_tagger("lasla")
s = "audivimus et laudantem eum qui est mirabilis in sanctis suis . Valete ."
i, f = model.get_iterator_and_formatter()
print(tagger.tag_str(s, i, f))

form       lemma      POS         morph                                             treated_token
audivimus  audio      VER         Numb=Plur|Mood=Ind|Tense=Perf|Voice=Act|Person=1  audiuimus
et         et2        CONcoo      MORPH=empty                                       et
laudantem  laudo      VER         Case=Acc|Numb=Sing|Mood=Par|Tense=Pres|Voice=Act  laudantem
eum        is         PROdem      Case=Acc|Numb=Sing                                eum
qui        qui1       PROrel      Case=Nom|Numb=Sing                                qui
est        sum1       VER         Numb=Sing|Mood=Ind|Tense=Pres|Voice=Act|Person=3  est
mirabilis  mirabilis  ADJqua      Case=Nom|Numb=Sing|Deg=Pos                        mirabilis
in         in         PRE         MORPH=empty                                       in
sanctis    sanctus    ADJqua      Case=Abl|Numb=Plur|Deg=Pos                        sanctis
suis       suus       PROpos.ref  Case=Abl|Numb=Plur                                suis
.          .          PUNC        MORPH=empty                                       .
Valete     alleo      VER         Numb=Plur|Mood=Imp|Tense=Pres|Voice=Act|Person=2  valete
.          .          PUNC        MORPH=empty                                       .

Whereas with lowercase "valete":

model = get_model("lasla")
tagger = get_tagger("lasla")
s = "audivimus et laudantem eum qui est mirabilis in sanctis suis . valete ."
i, f = model.get_iterator_and_formatter()
print(tagger.tag_str(s, i, f))

form       lemma      POS         morph                                             treated_token
audivimus  audio      VER         Numb=Plur|Mood=Ind|Tense=Perf|Voice=Act|Person=1  audiuimus
et         et2        CONcoo      MORPH=empty                                       et
laudantem  laudo      VER         Case=Acc|Numb=Sing|Mood=Par|Tense=Pres|Voice=Act  laudantem
eum        is         PROdem      Case=Acc|Numb=Sing                                eum
qui        qui1       PROrel      Case=Nom|Numb=Sing                                qui
est        sum1       VER         Numb=Sing|Mood=Ind|Tense=Pres|Voice=Act|Person=3  est
mirabilis  mirabilis  ADJqua      Case=Nom|Numb=Sing|Deg=Pos                        mirabilis
in         in         PRE         MORPH=empty                                       in
sanctis    sanctus    ADJqua      Case=Abl|Numb=Plur|Deg=Pos                        sanctis
suis       suus       PROpos.ref  Case=Abl|Numb=Plur                                suis
.          .          PUNC        MORPH=empty                                       .
valete     ualeo      VER         Numb=Plur|Mood=Imp|Tense=Pres|Voice=Act|Person=2  ualete
.          .          PUNC        MORPH=empty                                       .

PonteIneptique commented 4 years ago

Looks like an easy fix most probably :D
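
For illustration only, the two outputs in the report are consistent with a normalization step that maps lowercase "v" to "u" before the token is lowercased, so a capital "V" slips through. That cause is an assumption, not something confirmed in this thread, and the sketch below is not the library's code: `normalize_suspected` and `normalize_case_aware` are hypothetical names.

```python
def normalize_suspected(token: str) -> str:
    """Hypothetical reconstruction of the reported behaviour (an assumption).

    Only lowercase "v" is replaced, and lowercasing happens afterwards,
    so a capital "V" becomes "v" instead of "u".
    """
    return token.replace("v", "u").lower()


def normalize_case_aware(token: str) -> str:
    """Hypothetical fix: lowercase first, then map "v" to "u"."""
    return token.lower().replace("v", "u")


if __name__ == "__main__":
    for form in ("audivimus", "valete", "Valete"):
        print(form, normalize_suspected(form), normalize_case_aware(form))
```

With this ordering, "Valete" and "valete" both normalize to "ualete", matching the treated token reported for the lowercase sentence.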