@eroux what do you mean by "contraction"?
I mean the case the rule was written for, like པགི་ > པག་གི་
to be a bit more explicit: པགི་ cannot be "normal" Tibetan, it's necessarily པག་གི་, so the rule works in that case. OTOH, དགི can be regular Tibetan and doesn't necessarily represent དག་གི, and I'm not sure it's a good idea to apply the rule in that case. (I'm also not sure it's a bad idea)
both དགི and འགི can be regular Tibetan. Let me have a deeper look at it and I will let you know.
yes, དགི and འགི (and བགི and མགི) can be regular Tibetan, that's why I think the rule should be adjusted. The rule works on པགི་ though, which cannot be regular Tibetan
Hi @eroux, I am back to finalizing the normalisation grammar. At the moment I am working on the OT Ramayana, and I have added new rules to handle some cases in the new text. I want to resolve the issues that you raised as well, including these contractions with the genitive. བགི and མགི cannot be regular Tibetan. དགི and འགི can: in all of OTDO there are no cases of དགི and only a few cases of འགི, which I don't think are contractions (and which do not appear in our texts). I would modify the rule as follows:
SPLITCOHORT ( "<$1>"v "$1ག་"v σ "<$2>"v "ག$4"v σ )("<([^དའ])((\u0F42)([\u0F72\u0F80]་?))>"r)
What do you think?
looks good, thanks!
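For reference, the effect of the modified pattern can be sketched in Python. This is not CG-3 and not part of the grammar: it is a rough transcription of the regex from the rule above, assuming the rule is applied to one syllable-sized wordform at a time. The function name `split_contraction` is hypothetical, and the `<...>` wordform delimiters are dropped.

```python
import re

# Transcription of the pattern in the modified rule (assumption: one wordform
# per call, so the pattern is matched against the whole string).
#   group 1: any first letter except U+0F51 (DA) and U+0F60 (-A)
#   group 3: U+0F42 (GA)
#   group 4: vowel sign U+0F72 or U+0F80, optionally followed by tsheg U+0F0B
GENITIVE_CONTRACTION = re.compile(
    "([^\u0F51\u0F60])((\u0F42)([\u0F72\u0F80]\u0F0B?))"
)

TSHEG = "\u0F0B"


def split_contraction(wordform):
    """Return the two cohorts the rule would produce, or the input unchanged."""
    m = GENITIVE_CONTRACTION.fullmatch(wordform)
    if m is None:
        # Excluded first letter, or not a contraction shape: leave as is.
        return [wordform]
    first, _, ga, vowel_and_tsheg = m.groups()
    # Mirrors "$1 + GA + tsheg" and "GA + $4" in the rule.
    return [first + ga + TSHEG, ga + vowel_and_tsheg]


if __name__ == "__main__":
    for w in ["\u0F54\u0F42\u0F72\u0F0B",  # PA + GA + i + tsheg
              "\u0F51\u0F42\u0F72",        # DA + GA + i (excluded)
              "\u0F60\u0F42\u0F72",        # -A + GA + i (excluded)
              "\u0F56\u0F42\u0F72"]:       # BA + GA + i
        print(w, "->", " ".join(split_contraction(w)))
```

With this pattern the forms starting with ད and འ are left alone, while པ, བ and མ forms are still split.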
in:
that means you're also going to split the following:
དགི* -> དག་གི*
བགི* -> བག་གི*
མགི* -> མག་གི*
འགི* -> འག་གི*
but in these 4 cases it can be analyzed as
instead of the contraction... so maybe
could be more accurate? Or maybe in some cases it's more likely to be a contraction? wdyt?