hfst / hfst-ospell

HFST spell checker library and command line tool
Apache License 2.0
13 stars 9 forks source link

hfst-ospell gives different results with varying -n values, no speed boost #17

Closed hfst-importer closed 8 years ago

hfst-importer commented 9 years ago

To repeat:

  1. download divvun.no/static_files/sma-hfst-ospell-n-bug-2015_09_15.zhfst
  2. run the following command:
$ time echo Zealaantesnie | hfst-ospell -S -n 15 sma-hfst-ospell-n-bug-2015_09_15.zhfst ; time  echo Zealaantesnie | hfst-ospell -S -n 5 sma-hfst-ospell-n-bug-2015_09_15.zhfst

Result:

ospell: warning: symbol <CORR> not present in lexicon
"Zealaantesnie" is NOT in the lexicon:
Corrections for "Zealaantesnie":
Zealaantesne    7.007812
Zaandamesne    10.007963
Zealandesne    10.007963
Zhenjiangesne    10.007963
Zugspitzesne    10.007963
Zhejiangesne    10.007963
Zhanjiangesne    10.007963
Zernichowesne    10.007963
Zermattesne    10.007963
Zeelandesne    10.007963
Zakopanesne    10.007963
Zeebruggesne    10.007963
Zapadnajesne    10.007963
Zagorskesne    10.007963
Zamboangesne    10.007963

real    1m53.472s
user    1m52.900s
sys 0m0.281s
ospell: warning: symbol <CORR> not present in lexicon
"Zealaantesnie" is NOT in the lexicon:
Corrections for "Zealaantesnie":
Zaandamesne    10.007963
Zakopanesne    10.007963
Zapadnajesne    10.007963
Zagorskesne    10.007963
Zamboangesne    10.007963

real    1m51.617s
user    1m50.903s
sys 0m0.372s

Notice how the expected suggestion «Zealaantesne« is the first suggestion (ie lowest weight) when -n=15, whereas it is not suggested at all when -n=5. Also note that there is no real speed difference between the two values of -n (and also not compared to no -n at all).

Expected behavior:

  1. The five first suggestions should be identical for the two values of -n
  2. There should be a noticable speed boost of -n=5 compared to -n=15 compared to no -n

Reported by: snomos

hfst-importer commented 9 years ago

Sorry, the link was not turned into a real link. Here it is: https://divvun.no/static_files/sma-hfst-ospell-n-bug-2015_09_15.zhfst

Original comment by: snomos

hfst-importer commented 9 years ago

Verified bug. This appears to be due to me brain-deadedly misunderstanding std::priority_queue. Should be fixed soon when I get the opportunity.

Original comment by: Traubert

hfst-importer commented 9 years ago

The following comment was just posted on IRC:

Regarding hfst-ospell r4440, std::list should be avoided at almost all costs. If you need something that doesn't invalidate iterators on insert, use std::deque - otherwise, default to std::vector.

Original comment by: snomos

hfst-importer commented 9 years ago

I wonder is that due to the frequently true assumption that lists are usually slow? In this case my tests indicate that the list implementation is a little bit faster.

Original comment by: Traubert

hfst-importer commented 9 years ago

Original comment by: Traubert