voikko / corevoikko

Libvoikko and essential linguistic resources

Acronyms in a zhfst lexicon are not recognised, but are offered as suggestions instead #21

Closed: snomos closed this issue 8 years ago

snomos commented 8 years ago

To reproduce, download the smj.zhfst file from http://divvun.no/static_files/zhfsts/smj.zhfst and run:

    $ echo NRK | voikkospell -s -d smj -p ./ ignore_dot=1
    W: NRK
    S: NRK
    S: NOK
    S: NSR
    S: BNRK
    S: ERK
    $ echo NRK | voikkospell -s -d smj -p ./
    W: NRK
    S: NRK
    S: NOK
    S: NSR
    S: BNRK
    S: ERK

As seen above, the input word is not recognised (it is reported with W:), yet the identical string is returned as the first suggestion. The HFST tools behave differently and accept the word:

    $ echo "NRK" | hfst-ospell smj.zhfst
    "NRK" is in the lexicon...

    $ echo "5 NRK" | hfst-ospell-office smj.zhfst
    @@ hfst-ospell-office is alive
    *

(* = the word is correct/recognised.)

hatapitk commented 8 years ago

This should be fixed now. Unfortunately we have no integration or regression tests for the HFST backend, so wider testing of this change would be welcome. I don't expect it to break anything, though.