unicode-org / inflection

code, data and documentation related to handling inflection problems

Support bound morphemes through annotation #8

Open grhoten opened 4 months ago

grhoten commented 4 months ago

This topic covers cases where part of a word must be reinflected. Some words are built from bound morphemes. Bound morphemes are units of meaning that cannot stand alone and attach to another word (typically without a space).

For example, Spanish has the word "dámelo". The equivalent English phrase is "give it to me" or "give me it". The "da" is the verb "give", "me" is "me" in English, and "lo" is the masculine singular form of "it". The "lo" changes depending on the gender and number of the object it refers to. The word "dánoslo" means "give it to us", and "dámelos" means "give them to me" (plural object).
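As a rough illustration of what such an annotation could look like, here is a minimal Python sketch. The `AnnotatedWord` class, its field names, and the `DIRECT_OBJECT_CLITICS` table are hypothetical names made up for this example and are not part of any existing API in this repository. The stem is stored with the standard written accent ("dá") that the verb takes once the clitics attach; a real implementation would have to derive that.

```python
from dataclasses import dataclass, replace

# Hypothetical lookup table: Spanish third-person direct-object clitics,
# keyed by the gender and number of the object being referred to.
DIRECT_OBJECT_CLITICS = {
    ("masculine", "singular"): "lo",
    ("feminine", "singular"): "la",
    ("masculine", "plural"): "los",
    ("feminine", "plural"): "las",
}


@dataclass(frozen=True)
class AnnotatedWord:
    """A word kept as a stem plus annotations for its bound morphemes."""

    stem: str             # verb form, already accented for clitic attachment
    indirect_object: str  # bound pronoun for the recipient, e.g. "me" or "nos"
    gender: str           # grammatical gender of the direct object
    number: str           # grammatical number of the direct object

    def surface(self) -> str:
        # Pick the direct-object clitic that agrees with the annotation,
        # then join all morphemes without spaces.
        clitic = DIRECT_OBJECT_CLITICS[(self.gender, self.number)]
        return self.stem + self.indirect_object + clitic


word = AnnotatedWord(stem="dá", indirect_object="me",
                     gender="masculine", number="singular")
print(word.surface())                                  # dámelo
print(replace(word, number="plural").surface())        # dámelos
print(replace(word, indirect_object="nos").surface())  # dánoslo
```

The point is only that the caller changes an annotation (gender, number, or recipient) and the bound morpheme is chosen to agree, rather than the caller editing the string directly.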

In other languages, like Hebrew or Arabic, possessive pronouns are suffixes on nouns. The form of "your messages" depends on the gender of the person being addressed in the second person singular. The suffix differs between the masculine and feminine forms, which matters most when the word for "your" must be pronounced. In written text you can sometimes get by with omitting the optional diacritics, but that's not possible when the word must be spoken.
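To make the diacritics point concrete, here is a small Python sketch using Arabic. The names `STEM`, `SUFFIX`, `your_messages`, and `strip_diacritics` are hypothetical and exist only for this example. It is deliberately simplified: the case vowel on the noun is left out, and only the vowel mark on the possessive suffix is modeled.

```python
import unicodedata

# "messages" (rasa'il), written without case vowels for simplicity.
STEM = "\u0631\u0633\u0627\u0626\u0644"

# Second person singular possessive suffix: the letter kaf plus a vowel mark.
# Masculine takes fatha (U+064E), feminine takes kasra (U+0650).
SUFFIX = {
    "masculine": "\u0643\u064E",
    "feminine": "\u0643\u0650",
}


def your_messages(gender: str) -> str:
    """Attach the possessive suffix that agrees with the addressee's gender."""
    return STEM + SUFFIX[gender]


def strip_diacritics(text: str) -> str:
    """Drop nonspacing marks (category Mn), which covers the vowel marks."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


masculine = your_messages("masculine")
feminine = your_messages("feminine")

# With diacritics, as needed for pronunciation, the two forms differ.
print(masculine != feminine)  # True

# Without diacritics, the written forms are identical.
print(strip_diacritics(masculine) == strip_diacritics(feminine))  # True
```

So display text can sometimes share one undiacritized string, while anything that feeds text-to-speech needs the gender annotation to select the right suffix.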

I don't recommend breaking words down too far. For example, "football" should remain as is. The word "unstoppable" has the bound morphemes "un-" and "-able" attached to the main word "stop", but decomposing it is not the goal here. This request is mostly concerned with grammatical agreement. In Spanish, Arabic, Hebrew and other languages, I may choose a different bound morpheme depending on the grammatical properties of the object or person being referred to.