Open hfst-importer opened 11 years ago
If I try to use the attached script to analyse the attached tokenised piece of news from hs.fi, the lookup gets stuck at:
{u'POS': [u'ADVERB'], u'WORD_ID': [u'ennen']}
{u'CASE': [u'NOM', u'PAR'], u'GUESS': [u'COMPOUND'], u'ALLO': [u'IA'], u'POS': [u'NOUN', u'NOUN'], u'NUM': [u'SG', u'PL'], u'BOUNDARY': [u'COMPOUND'], u'WORD_ID': [u'kerta', u'er\xe4']}
{u'CASE': [u'NOM'], u'NUM': [u'SG'], u'SUBCAT': [u'CARD'], u'POS': [u'NUMERAL'], u'WORD_ID': [u'155']}
{u'CASE': [u'ILL'], u'ALLO': [u'VN'], u'NUM': [u'SG'], u'POS': [u'NOUN'], u'WORD_ID': [u'miljoona']}
{u'CASE': [u'ILL'], u'ALLO': [u'VN'], u'NUM': [u'SG'], u'POS': [u'NOUN'], u'WORD_ID': [u'euro']}
That is: "ennen kertaeriä 155 miljoonaan euroon loka-joulukuussa."
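To match the printed analyses against the words of the sentence, the dictionaries can be parsed back from their printed form. This is only a minimal sketch, assuming the analyses are pasted as text exactly as printed in the report; the names `printed` and `lemmas` are illustrative and are not taken from the attached script.

```python
import ast

# Analyses copied verbatim from the report. They are Python 2 reprs;
# the u'' prefix and the \xe4 escape also parse under Python 3.
printed = [
    r"{u'POS': [u'ADVERB'], u'WORD_ID': [u'ennen']}",
    r"{u'CASE': [u'NOM', u'PAR'], u'GUESS': [u'COMPOUND'], u'ALLO': [u'IA'], "
    r"u'POS': [u'NOUN', u'NOUN'], u'NUM': [u'SG', u'PL'], "
    r"u'BOUNDARY': [u'COMPOUND'], u'WORD_ID': [u'kerta', u'er\xe4']}",
    r"{u'CASE': [u'NOM'], u'NUM': [u'SG'], u'SUBCAT': [u'CARD'], "
    r"u'POS': [u'NUMERAL'], u'WORD_ID': [u'155']}",
    r"{u'CASE': [u'ILL'], u'ALLO': [u'VN'], u'NUM': [u'SG'], "
    r"u'POS': [u'NOUN'], u'WORD_ID': [u'miljoona']}",
    r"{u'CASE': [u'ILL'], u'ALLO': [u'VN'], u'NUM': [u'SG'], "
    r"u'POS': [u'NOUN'], u'WORD_ID': [u'euro']}",
]

# literal_eval only evaluates Python literals, so it is safe on pasted text.
analyses = [ast.literal_eval(text) for text in printed]

# One list of lemmas (WORD_ID values) per analysed token.
lemmas = [analysis["WORD_ID"] for analysis in analyses]

for token_lemmas in lemmas:
    print("+".join(token_lemmas))
```

Each printed line corresponds to one analysed token, and compound analyses carry more than one `WORD_ID` entry.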
The Omorfi used is the googlecode git master with default settings.
Reported by: flammie
Here is the tokenised HS text, attached separately, as of course sf.net does not have the ability to upload two files at once :-\
Original comment by: flammie