hfst / hfst-ospell

HFST spell checker library and command line tool
Apache License 2.0

hfst-ospell segfaults with errmodel alpha outside lang model #14

Closed by hfst-importer 8 years ago

hfst-importer commented 10 years ago

Two tests in the current test suite fail with a segfault:

ospell: warning: symbol a not present in lexicon
ospell: warning: symbol b not present in lexicon
ospell: warning: symbol c not present in lexicon
ospell: warning: symbol d not present in lexicon
ospell: warning: symbol f not present in lexicon
ospell: warning: symbol g not present in lexicon
ospell: warning: symbol h not present in lexicon
ospell: warning: symbol j not present in lexicon
ospell: warning: symbol k not present in lexicon
ospell: warning: symbol m not present in lexicon
ospell: warning: symbol n not present in lexicon
ospell: warning: symbol p not present in lexicon
ospell: warning: symbol q not present in lexicon
ospell: warning: symbol r not present in lexicon
ospell: warning: symbol x not present in lexicon
ospell: warning: symbol y not present in lexicon
ospell: warning: symbol z not present in lexicon
"olut" is in the lexicon (but correcting anyways)

./basic-edit1.sh: line 10: 1370 Done                    cat $srcdir/test.strings
  1371 Segmentation fault      | ./hfst-ospell speller_edit1.zhfst
FAIL: basic-edit1.sh

[...]

ospell: warning: symbol a not present in lexicon
ospell: warning: symbol b not present in lexicon
ospell: warning: symbol c not present in lexicon
ospell: warning: symbol d not present in lexicon
ospell: warning: symbol f not present in lexicon
ospell: warning: symbol g not present in lexicon
ospell: warning: symbol h not present in lexicon
ospell: warning: symbol j not present in lexicon
ospell: warning: symbol k not present in lexicon
ospell: warning: symbol m not present in lexicon
ospell: warning: symbol n not present in lexicon
ospell: warning: symbol p not present in lexicon
ospell: warning: symbol q not present in lexicon
ospell: warning: symbol r not present in lexicon
ospell: warning: symbol x not present in lexicon
ospell: warning: symbol y not present in lexicon
ospell: warning: symbol z not present in lexicon
"olut" is in the lexicon (but correcting anyways)

Analyses for "olut":
./analyse-spell.sh: line 10: 1507 Done                    cat $srcdir/test.strings
  1508 Segmentation fault      | ./hfst-ospell -a speller_analyser.zhfst
FAIL: analyse-spell.sh

Both tests pair an edit1 error model covering a-z with a small lexicon, so most of the error model's alphabet is missing from the language model. Valgrind:

$ cat test.strings | libtool --mode=execute valgrind hfst-ospell speller_edit1.zhfst 
==4520== Memcheck, a memory error detector
==4520== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==4520== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==4520== Command: /home/flammie/Koodit/hfst-ospell/.libs/hfst-ospell speller_edit1.zhfst
==4520== 
ospell: warning: symbol a not present in lexicon
ospell: warning: symbol b not present in lexicon
ospell: warning: symbol c not present in lexicon
ospell: warning: symbol d not present in lexicon
ospell: warning: symbol f not present in lexicon
ospell: warning: symbol g not present in lexicon
ospell: warning: symbol h not present in lexicon
ospell: warning: symbol j not present in lexicon
ospell: warning: symbol k not present in lexicon
ospell: warning: symbol m not present in lexicon
ospell: warning: symbol n not present in lexicon
ospell: warning: symbol p not present in lexicon
ospell: warning: symbol q not present in lexicon
ospell: warning: symbol r not present in lexicon
ospell: warning: symbol x not present in lexicon
ospell: warning: symbol y not present in lexicon
ospell: warning: symbol z not present in lexicon
"olut" is in the lexicon (but correcting anyways)

==4520== Invalid read of size 2
==4520==    at 0x4E7E096: hfst_ol::IndexTable::input_symbol(unsigned int) const (hfst-ol.cc:673)
==4520==    by 0x4E85C82: hfst_ol::Transducer::has_transitions(unsigned int, unsigned short) const (ospell.cc:494)
==4520==    by 0x4E8529B: hfst_ol::Speller::mutator_epsilons() (ospell.cc:329)
==4520==    by 0x4E8648A: hfst_ol::Speller::correct(char*, int) (ospell.cc:614)
==4520==    by 0x4E94843: hfst_ol::ZHfstOspeller::suggest(std::string const&) (ZHfstOspeller.cc:169)
==4520==    by 0x4036D4: do_spell(hfst_ol::ZHfstOspeller&, std::string const&) (main.cc:164)
==4520==    by 0x403AC9: zhfst_spell(char*) (main.cc:281)
==4520==    by 0x403F21: main (main.cc:372)
==4520==  Address 0x9eada60 is not stack'd, malloc'd or (recently) free'd
==4520== 
==4520== Invalid read of size 2
==4520==    at 0x4E7E096: hfst_ol::IndexTable::input_symbol(unsigned int) const (hfst-ol.cc:673)
==4520==    by 0x4E85C82: hfst_ol::Transducer::has_transitions(unsigned int, unsigned short) const (ospell.cc:494)
==4520==    by 0x4E85793: hfst_ol::Speller::consume_input() (ospell.cc:385)
==4520==    by 0x4E86753: hfst_ol::Speller::correct(char*, int) (ospell.cc:641)
==4520==    by 0x4E94843: hfst_ol::ZHfstOspeller::suggest(std::string const&) (ZHfstOspeller.cc:169)
==4520==    by 0x4036D4: do_spell(hfst_ol::ZHfstOspeller&, std::string const&) (main.cc:164)
==4520==    by 0x403AC9: zhfst_spell(char*) (main.cc:281)
==4520==    by 0x403F21: main (main.cc:372)
==4520==  Address 0x9eada60 is not stack'd, malloc'd or (recently) free'd
==4520== 

Neither Valgrind nor the other platforms segfault here, so it's probably an off-by-one read, or one that otherwise happens to end harmlessly there. GDB:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7b9b096 in hfst_ol::IndexTable::input_symbol (this=0x647238, 
i=65536) at hfst-ol.cc:673
673                    (indices + TransitionIndex::SIZE * i)); 
(gdb) bt
#0  0x00007ffff7b9b096 in hfst_ol::IndexTable::input_symbol (this=0x647238, 
i=65536) at hfst-ol.cc:673
#1  0x00007ffff7ba2c83 in hfst_ol::Transducer::has_transitions (this=0x647130, 
i=1, symbol=65535) at ospell.cc:494
#2  0x00007ffff7ba229c in hfst_ol::Speller::mutator_epsilons (this=0x649eb0)
at ospell.cc:329
#3  0x00007ffff7ba348b in hfst_ol::Speller::correct (this=0x649eb0, 
line=0x63c990 "olut", nbest=0) at ospell.cc:614
#4  0x00007ffff7bb1844 in hfst_ol::ZHfstOspeller::suggest (
this=0x7fffffffd840, wordform="olut") at ZHfstOspeller.cc:169
#5  0x00000000004036d5 in do_spell (speller=..., str="olut") at main.cc:164
#6  0x0000000000403aca in zhfst_spell (
zhfst_filename=0x7fffffffdf27 "speller_edit1.zhfst") at main.cc:281
#7  0x0000000000403f22 in main (argc=2, argv=0x7fffffffdb58) at main.cc:372

Reported by: flammie

hfst-importer commented 10 years ago

This was fixed in r3881.

Now that alphabets are translated even when the error source contains symbols not found in the lexicon, the lexicon has to be prepared to be asked whether it has transitions with NO_SYMBOL. Before the fix, it just crashed.

Original comment by: Traubert
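
For readers following along, here is a minimal sketch of the kind of guard described above. It is not the actual r3881 change. The type `ToyIndexTable`, its layout, and the `demo` function are hypothetical stand-ins. The slot arithmetic (state plus symbol) is only inferred from the backtrace, where `has_transitions(i=1, symbol=65535)` ends up calling `input_symbol(i=65536)`, and treating 65535 as the NO_SYMBOL value is likewise an assumption based on that frame.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-ins for the hfst-ospell types (assumed for illustration).
using SymbolNumber = std::uint16_t;
using TransitionTableIndex = std::uint32_t;

// Assumed sentinel for "error-model symbol with no counterpart in the lexicon".
// 65535 matches symbol=65535 in the GDB backtrace.
constexpr SymbolNumber NO_SYMBOL = std::numeric_limits<SymbolNumber>::max();

struct ToyIndexTable {
    // One input symbol per index slot; a toy replacement for the real table.
    std::vector<SymbolNumber> input_symbols;

    // Guarded lookup: answer "no transitions" for NO_SYMBOL and for any slot
    // past the end of the table, instead of reading out of bounds.
    bool has_transitions(TransitionTableIndex state, SymbolNumber symbol) const {
        if (symbol == NO_SYMBOL) {
            return false;  // the lexicon cannot have transitions on a symbol it lacks
        }
        const std::size_t slot = static_cast<std::size_t>(state) + symbol;
        if (slot >= input_symbols.size()) {
            return false;  // defensive bounds check
        }
        return input_symbols[slot] == symbol;
    }
};

// Usage example: slot 3 (state 1 + symbol 2) holds symbol 2.
inline void demo() {
    ToyIndexTable table{{0, 0, 0, 2, 0}};
    assert(table.has_transitions(1, 2));           // matching slot
    assert(!table.has_transitions(1, 3));          // slot 4 holds 0, not 3
    assert(!table.has_transitions(1, NO_SYMBOL));  // would be slot 65536 unguarded
}
```

The early return on NO_SYMBOL is the part that corresponds to the comment above. The bounds check is an extra precaution in this sketch and may or may not exist in the real code.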
