hltdi / HornMorpho

Morphological processing for languages of the Horn of Africa
GNU General Public License v3.0

question on lemma #4

Open yosiasz opened 4 months ago

yosiasz commented 4 months ago

First off, I am a big fan of your work. Thank you so much for your work with the Global South.

My question is about the following:

(26)

w1 = hm.anal('a', 'ይሰማሉ')
w1[0]['lemma']
'ሰማ'
w1[1]['lemma']
'ተሰማ'

When lemmatizing the English words changed, changing, changer, and changes, the root is chang. In Amharic, would the root be the unconjugated verb መስማት?

Another question, on the CoNLL-U format: is the following valid CoNLL-U that can be dumped into a .conllu file?

ጫላ ጩቤዬን ጨብጧል ።
ጫላ PROPN SG ጫላ
ጩቤዬን N ACC;PSS1S;SG ጩቤ
ጨብጧል V 3;MASC;PRF;SG ጨበጠ
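For comparison, CoNLL-U expects ten tab-separated columns per token (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), with `_` for unspecified fields, features written as `Name=Value` pairs joined by `|`, and a blank line after each sentence. Below is a minimal sketch of how the rows above might be laid out that way. The tag mapping and the function names are hypothetical and cover only the tags in this example; HEAD and DEPREL are left as `_` because the example carries no syntactic annotation, so the result is a layout illustration rather than a validated treebank entry.

```python
# Hypothetical mapping from the UniMorph-style tags in the example to
# UD feature pairs; it covers only the tags that appear in this thread.
UNIMORPH_TO_UD = {
    "SG": ["Number=Sing"],
    "ACC": ["Case=Acc"],
    "PSS1S": ["Number[psor]=Sing", "Person[psor]=1"],
    "3": ["Person=3"],
    "MASC": ["Gender=Masc"],
    "PRF": ["Aspect=Perf"],
}

# Hypothetical mapping from the POS labels in the example to UPOS tags.
POS_TO_UPOS = {"N": "NOUN", "V": "VERB", "PROPN": "PROPN", "PUNCT": "PUNCT"}


def to_ud_feats(tags):
    """Turn 'ACC;PSS1S;SG' into a sorted 'Name=Value|...' string, or '_'."""
    if not tags:
        return "_"
    feats = []
    for tag in tags.split(";"):
        feats.extend(UNIMORPH_TO_UD[tag])
    return "|".join(sorted(feats, key=str.lower))


def to_conllu(text, tokens):
    """Format (form, pos, tags, lemma) tuples as one CoNLL-U sentence."""
    lines = [f"# text = {text}"]
    for index, (form, pos, tags, lemma) in enumerate(tokens, start=1):
        columns = [
            str(index),         # ID
            form,               # FORM
            lemma,              # LEMMA
            POS_TO_UPOS[pos],   # UPOS
            "_",                # XPOS (not given in the example)
            to_ud_feats(tags),  # FEATS
            "_",                # HEAD (no syntactic annotation here)
            "_",                # DEPREL
            "_",                # DEPS
            "_",                # MISC
        ]
        lines.append("\t".join(columns))
    # A blank line terminates the sentence.
    return "\n".join(lines) + "\n\n"


tokens = [
    ("ጫላ", "PROPN", "SG", "ጫላ"),
    ("ጩቤዬን", "N", "ACC;PSS1S;SG", "ጩቤ"),
    ("ጨብጧል", "V", "3;MASC;PRF;SG", "ጨበጠ"),
    ("።", "PUNCT", "", "።"),
]
print(to_conllu("ጫላ ጩቤዬን ጨብጧል ።", tokens), end="")
```

Each token ends up on its own line with its lemma in the third column, which is the main structural difference from the space-separated rows in the question.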

Thanks!

yosiasz commented 4 months ago

@megasser This might be in your wheelhouse, if you are interested in chiming in:

https://github.com/UniversalDependencies/UD_Amharic-Inku/issues/2

megasser commented 4 months ago

Hi. Thanks for the supportive words.

Basically, what counts as a lemma for a given POS in a given language is a convention. For English verbs, this is the stem (infinitive minus 'to'). For verbs in other European languages, like German or French, it's the infinitive.

For Semitic languages (at least the ones I'm familiar with), it's the third person singular masculine perfective (or past). So when you look up verbs in an Arabic, Amharic, or Tigrinya dictionary, it's this form that you look for, not the infinitive (ሰማ rather than መስማት). It could have been the infinitive, but it isn't, possibly because, though unconjugated, the infinitive is also a derived form (it has a prefix and a particular form of the root) and because it behaves in many ways like a noun rather than a verb.
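In terms of the output quoted in the question, this means each analysis carries its citation form under the 'lemma' key, and one word can have more than one candidate lemma (here ሰማ and ተሰማ). A minimal sketch of collecting them follows. The helper name is hypothetical, and the list of dictionaries is a stand-in that mirrors only the 'lemma' values shown in the thread, so it runs without HornMorpho installed; the real result of `hm.anal` presumably carries more keys.

```python
def unique_lemmas(analyses):
    """Return the distinct lemmas from a list of analysis dicts, in order."""
    seen = []
    for analysis in analyses:
        lemma = analysis.get("lemma")
        if lemma is not None and lemma not in seen:
            seen.append(lemma)
    return seen


# Stand-in for the result of hm.anal('a', 'ይሰማሉ') as quoted in the question;
# only the 'lemma' values shown in the thread are reproduced here.
analyses = [{"lemma": "ሰማ"}, {"lemma": "ተሰማ"}]

print(unique_lemmas(analyses))  # ['ሰማ', 'ተሰማ']
```

Both results are third person singular masculine perfective forms, in line with the dictionary convention described above.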

I hope this helps. There isn't a simple way to get the infinitive from the root of an Amharic verb in HornMorpho, but I'll add that in version 5.2.

Michael



megasser commented 4 months ago

Thanks for this. I would have missed it.

Michael

