no-plagiarism / pymorphy3

Morphological analyzer / inflection engine for Russian and Ukrainian languages.
https://pymorphy2.readthedocs.io/
MIT License
62 stars, 7 forks

Adjectives are treated as last names #1

Open DSLituiev opened 1 year ago

DSLituiev commented 1 year ago

Many adjectives are parsed as surnames (e.g., "зелена"). For those parses the normal form is the feminine form (e.g., "зелена") rather than the masculine form (e.g., "зелений") that is conventionally used as the lemma of an adjective. The behaviour does not depend on capitalisation.

This issue was discovered while integrating a pull request into spaCy that ported earlier pymorphy2 code to pymorphy3.

>>> analyser.parse("зеленої")
[Parse(word='зеленої', tag=OpencorporaTag('NOUN,Surn,femn,anim gent'), normal_form='зелена', score=1.0, methods_stack=((DictionaryAnalyzer(), 'зеленої', 27, 1),)),
 Parse(word='зеленої', tag=OpencorporaTag('ADJF femn,gent,compb'), normal_form='зелений', score=1.0, methods_stack=((DictionaryAnalyzer(), 'зеленої', 2674, 11),))]
>>> analyser.parse("чорної")
[Parse(word='чорної', tag=OpencorporaTag('NOUN,Surn,femn,anim gent'), normal_form='чорна', score=1.0, methods_stack=((DictionaryAnalyzer(), 'чорної', 27, 1),)),
 Parse(word='чорної', tag=OpencorporaTag('NOUN,femn,inan gent'), normal_form='чорна', score=1.0, methods_stack=((DictionaryAnalyzer(), 'чорної', 356, 1),)),
 Parse(word='чорної', tag=OpencorporaTag('ADJF femn,gent,compb'), normal_form='чорний', score=1.0, methods_stack=((DictionaryAnalyzer(), 'чорної', 2425, 11),)),
 Parse(word='чорної', tag=OpencorporaTag('NOUN,anim femn,gent'), normal_form='чорний', score=1.0, methods_stack=((DictionaryAnalyzer(), 'чорної', 4456, 9),))]

Kowalski0805 commented 1 year ago

Hi @DSLituiev, thanks for pointing this out! I suppose it may be related to the LT3OpenCorpora package, as pymorphy3-dicts and pymorphy3 weren't changed much from the original, but it will probably take more time to investigate.
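
Until the root cause is found, a caller-side workaround might be to re-rank the candidate parses instead of taking the first one. The sketch below is only an illustration, not something pymorphy3 provides: `pick_parse` and `FakeParse` are hypothetical names, the sample data is copied from the output above (with `methods_stack` dropped), and it assumes that `str()` of a tag yields the same grammeme string shown in that output. Preferring the `ADJF` reading is a context-free heuristic, so it will be wrong whenever the word really is a surname.

```python
from collections import namedtuple

# Hypothetical stand-in for pymorphy3 Parse objects so the sketch runs
# without the library; only .tag and .normal_form are used below.
FakeParse = namedtuple("FakeParse", ["word", "tag", "normal_form", "score"])


def grammemes(tag):
    """Split a tag such as 'NOUN,Surn,femn,anim gent' into a set of grammemes."""
    return set(str(tag).replace(" ", ",").split(","))


def pick_parse(parses, preferred_pos="ADJF"):
    """Pick a parse, preferring `preferred_pos`, then any non-surname reading."""
    if not parses:
        return None
    for parse in parses:
        if preferred_pos in grammemes(parse.tag):
            return parse
    for parse in parses:
        if "Surn" not in grammemes(parse.tag):
            return parse
    # Every candidate is a surname: keep the analyzer's first choice.
    return parses[0]


# Sample candidates copied from the issue report.
zelenoi = [
    FakeParse("зеленої", "NOUN,Surn,femn,anim gent", "зелена", 1.0),
    FakeParse("зеленої", "ADJF femn,gent,compb", "зелений", 1.0),
]
chornoi = [
    FakeParse("чорної", "NOUN,Surn,femn,anim gent", "чорна", 1.0),
    FakeParse("чорної", "NOUN,femn,inan gent", "чорна", 1.0),
    FakeParse("чорної", "ADJF femn,gent,compb", "чорний", 1.0),
    FakeParse("чорної", "NOUN,anim femn,gent", "чорний", 1.0),
]

print(pick_parse(zelenoi).normal_form)  # зелений
print(pick_parse(chornoi).normal_form)  # чорний
```

With the real library, the list returned by `analyser.parse(...)` would be passed in place of the sample lists, assuming the analyser was created for Ukrainian (e.g. `pymorphy3.MorphAnalyzer(lang="uk")` with the Ukrainian dictionaries installed).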