nytud / hunlp-GATE

Lang_Hungarian - a GATE plugin containing Hungarian NLP tools as GATE processing resources
GNU General Public License v3.0

Hfst wrapper analyzer erroneously suppresses empty surface forms #26

Open dlazesz opened 6 years ago

dlazesz commented 6 years ago

Here: https://github.com/dlt-rilmta/hunlp-GATE/blob/master/Lang_Hungarian/resources/hfst/hfst-wrapper/src/hu/nytud/hfst/Analyzer.java#L27

The analyzer suppresses the empty surface form of the pronoun ők in helyettük. Input:

:ő :k :[/N|Pro] h:h e:e l:l y:y e:e t:t t:t :[/Post] ü:ü k:k :[3Pl]

Output:

ők[/N|Pro]+helyett[/Post]=helyett+ük[3Pl]=ük

Expected (the added '=' after ők[/N|Pro] marks the pronoun's empty surface form):

ők[/N|Pro]=+helyett[/Post]=helyett+ük[3Pl]=ük

As a result, the stemmer erroneously generates:

helyettük ők[/N|Pro]+helyett[/Post]=helyett+ük[3Pl]=ük őkhelyett [/Post][3Pl]

instead of the expected:

helyettük ők[/N|Pro]+helyett[/Post]=helyett+ük[3Pl]=ük helyett [/Post][3Pl]

Proposed solution: do not treat an empty surface form as a special case. Always print the '=' separator, followed by the (possibly empty) surface form.
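To illustrate the proposed fix, here is a minimal, hypothetical Java sketch. The class `SurfaceFormDemo` and the method `formatAnalysis` are invented for illustration and are not the wrapper's actual code. The sketch assumes that each input symbol pair has the form `surface:lexical`, as the input above suggests, and that every bracketed tag closes a morpheme printed as `lexical[tag]=surface`, with morphemes joined by '+'. The `suppressEmptySurface` flag reproduces the reported behaviour; turning it off applies the fix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the reported bug and the proposed fix;
// NOT the actual code of hu.nytud.hfst.Analyzer.
public class SurfaceFormDemo {

    /**
     * Builds an analysis string from space-separated "surface:lexical"
     * symbol pairs (assumed format).
     *
     * @param pairs                symbol pairs, e.g. ":ő :k :[/N|Pro] ..."
     * @param suppressEmptySurface true reproduces the bug (no '=' when the
     *                             surface is empty); false applies the fix
     */
    static String formatAnalysis(String pairs, boolean suppressEmptySurface) {
        List<String> morphemes = new ArrayList<>();
        StringBuilder lexical = new StringBuilder();
        StringBuilder surface = new StringBuilder();

        for (String pair : pairs.trim().split("\\s+")) {
            // Assumption: the first ':' separates surface (left) from lexical (right).
            int colon = pair.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Not a symbol pair: " + pair);
            }
            String surf = pair.substring(0, colon);
            String lex = pair.substring(colon + 1);
            surface.append(surf);

            if (lex.startsWith("[") && lex.endsWith("]")) {
                // A tag closes the current morpheme: lexical[tag]=surface
                StringBuilder m = new StringBuilder().append(lexical).append(lex);
                // The bug: skipping '=' when the surface is empty.
                // The fix: always append '=' plus the (possibly empty) surface.
                if (!(suppressEmptySurface && surface.length() == 0)) {
                    m.append('=').append(surface);
                }
                morphemes.add(m.toString());
                lexical.setLength(0);
                surface.setLength(0);
            } else {
                lexical.append(lex);
            }
        }
        // Symbols after the last tag are ignored in this sketch.
        return String.join("+", morphemes);
    }

    public static void main(String[] args) {
        String input = ":ő :k :[/N|Pro] h:h e:e l:l y:y e:e t:t t:t :[/Post] ü:ü k:k :[3Pl]";
        // prints: ők[/N|Pro]+helyett[/Post]=helyett+ük[3Pl]=ük
        System.out.println(formatAnalysis(input, true));
        // prints: ők[/N|Pro]=+helyett[/Post]=helyett+ük[3Pl]=ük
        System.out.println(formatAnalysis(input, false));
    }
}
```

With the issue's input, the first call yields the reported (buggy) analysis and the second yields the expected one, so every morpheme, including the pronoun with an empty surface, ends up with the same `lexical[tag]=surface` shape.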