hfst / hfst-optimized-lookup

HFST optimized-lookup standalone library and command line tool

hfst_lookup.py gets stuck #2

Open hfst-importer opened 11 years ago

hfst-importer commented 11 years ago

If I use the attached script to analyse the attached tokenised news text from hs.fi, the lookup gets stuck at:

    {u'POS': [u'ADVERB'], u'WORD_ID': [u'ennen']}
    {u'CASE': [u'NOM', u'PAR'], u'GUESS': [u'COMPOUND'], u'ALLO': [u'IA'], u'POS': [u'NOUN', u'NOUN'], u'NUM': [u'SG', u'PL'], u'BOUNDARY': [u'COMPOUND'], u'WORD_ID': [u'kerta', u'er\xe4']}
    {u'CASE': [u'NOM'], u'NUM': [u'SG'], u'SUBCAT': [u'CARD'], u'POS': [u'NUMERAL'], u'WORD_ID': [u'155']}
    {u'CASE': [u'ILL'], u'ALLO': [u'VN'], u'NUM': [u'SG'], u'POS': [u'NOUN'], u'WORD_ID': [u'miljoona']}
    {u'CASE': [u'ILL'], u'ALLO': [u'VN'], u'NUM': [u'SG'], u'POS': [u'NOUN'], u'WORD_ID': [u'euro']}

That is: "ennen kertaeriä 155 miljoonaan euroon loka-joulukuussa."

The Omorfi used is the googlecode git master, built with default settings.
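
The printed analyses stop after "euroon", so "loka-joulukuussa" looks like a plausible suspect, but that is only a guess from the output. One way to narrow it down would be to feed the tokens to the lookup one at a time with a timeout. A minimal sketch, assuming the lookup can be run as a command that reads one token per line from stdin; the function name `find_stuck_tokens` and the `command` argument are hypothetical and not part of hfst_lookup.py:

```python
import subprocess
import sys


def find_stuck_tokens(tokens, command, timeout=5.0):
    """Run `command` once per token and return the tokens that time out.

    `command` is an argument list for whatever lookup invocation is in use
    (assumption: it reads one token per line from stdin and then exits).
    """
    stuck = []
    for token in tokens:
        try:
            subprocess.run(
                command,
                input=(token + "\n").encode("utf-8"),
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,  # the child is killed when this expires
            )
        except subprocess.TimeoutExpired:
            # The lookup did not finish for this token within the limit.
            stuck.append(token)
    return stuck
```

Calling it with the six words of the sentence above and the local lookup command would list the tokens that never return. The command arguments depend on the local setup (script and transducer paths), so they are left out here.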

Reported by: flammie

hfst-importer commented 11 years ago

The tokenised HS text, since sf.net of course does not have the ability to upload two files at once :-\

Original comment by: flammie