skroutz / greek_stemmer

A simple Greek stemming library
MIT License

"φανελακι" does not have the same stem as "φανελα" #7

Open chief opened 9 years ago

greenonion commented 9 years ago

@chief I think this is true for many (all?) diminutives. For example, «ΠΑΙΧΝΙΔΑΚΙ» is stemmed to «ΠΑΙΧΝΙΔΑΚ», while «ΠΑΙΧΝΙΔΙ» is stemmed to «ΠΑΙΧΝΙΔ». I will look into the handling of the suffix «ΑΚΙ» and try to fix it.
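To make the mismatch concrete, here is a minimal toy sketch. It is not the greek_stemmer algorithm: `toy_stem` and `toy_stem_with_diminutive` are hypothetical helpers that only strip trailing vowels, and the «ΑΚΙ» rule is just one possible shape a fix could take. Input is assumed to be uppercase and unaccented.

```python
# Toy illustration only; this is NOT how greek_stemmer works internally.
VOWELS = "ΑΕΗΙΟΥΩ"


def toy_stem(word):
    """Drop trailing vowels from an uppercase, unaccented Greek word."""
    return word.rstrip(VOWELS)


def toy_stem_with_diminutive(word):
    """Hypothetical variant: strip the diminutive suffix ΑΚΙ first."""
    # The length guard keeps very short words from being reduced to nothing.
    if word.endswith("ΑΚΙ") and len(word) > 5:
        return toy_stem(word[:-3])
    return toy_stem(word)


for w in ("ΠΑΙΧΝΙΔΙ", "ΠΑΙΧΝΙΔΑΚΙ"):
    print(w, toy_stem(w), toy_stem_with_diminutive(w))
```

Without the extra rule the two words end up with different stems («ΠΑΙΧΝΙΔ» vs «ΠΑΙΧΝΙΔΑΚ»); with it, both reduce to «ΠΑΙΧΝΙΔ».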

astathopoulos commented 9 years ago

Diminutive forms are not stemmed, by choice. I can't recall the reason, but if you check the stemming samples you can see which words would get stemmed wrongly.

greenonion commented 9 years ago

@astathopoulos hmm I was afraid of that - ok I'll look into it, thanks!

astathopoulos commented 9 years ago

@greenonion Try searching skroutz for "φουρνος" and "φουρνακι". You don't want to get the same results for these searches.

greenonion commented 9 years ago

@astathopoulos Yep, I see. So you think it will lead to overstemming.

astathopoulos commented 9 years ago

Yep! There are some cases where the stemming process is subjective.

greenonion commented 9 years ago

So should we treat it on a per-case basis or ignore it in general? For example, maybe we want «φανέλα» and «φανελάκι» to have the same stem, but I'm not sure.
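A rough sketch of what the per-case option could look like, assuming a hand-maintained allowlist. Everything here is hypothetical: `MERGE_DIMINUTIVES`, `strip_ending` and `stem` are illustrative names, the ending lists are toy ones, and none of this reflects the library's real rules. Input is assumed to be uppercase and unaccented.

```python
# Hypothetical allowlist of diminutives that should share a stem with
# their base word. Anything not listed keeps its diminutive stem.
MERGE_DIMINUTIVES = {"ΦΑΝΕΛΑΚΙ"}

# Toy ending list, far smaller than what a real stemmer handles.
DEFAULT_ENDINGS = ("ΟΣ", "Α", "Ι")


def strip_ending(word, endings):
    """Remove the first matching ending, leaving at least three letters."""
    for ending in endings:
        if word.endswith(ending) and len(word) > len(ending) + 2:
            return word[: -len(ending)]
    return word


def stem(word):
    """Stem an uppercase word, merging only allowlisted diminutives."""
    if word in MERGE_DIMINUTIVES:
        return strip_ending(word, ("ΑΚΙ",))
    return strip_ending(word, DEFAULT_ENDINGS)


for w in ("ΦΑΝΕΛΑ", "ΦΑΝΕΛΑΚΙ", "ΦΟΥΡΝΟΣ", "ΦΟΥΡΝΑΚΙ"):
    print(w, stem(w))
```

With this allowlist «ΦΑΝΕΛΑ» and «ΦΑΝΕΛΑΚΙ» both give «ΦΑΝΕΛ», while «ΦΟΥΡΝΟΣ» and «ΦΟΥΡΝΑΚΙ» stay apart («ΦΟΥΡΝ» vs «ΦΟΥΡΝΑΚ»). The trade-off is that someone has to maintain the list.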