tvondra / shared_ispell

Shared ispell dictionary (stored in shared segment, used by multiple connections)

Problems #1

Closed (luben closed 12 years ago)

luben commented 12 years ago

Great work,

I have tried to use it, but I have a problem with the Bulgarian dictionaries from hunspell: they work with the "ispell" template but not with "shared_ispell". Here is a transcript of a session that shows the problem:

psql91 (9.1.2)
Type "help" for help.

-- create normal ispell dict

db=> DROP TEXT SEARCH DICTIONARY IF EXISTS bulgarian_ispell;
DROP TEXT SEARCH DICTIONARY
Time: 1,540 ms
db=>
db=> CREATE TEXT SEARCH DICTIONARY bulgarian_ispell (
db(>     TEMPLATE = ispell,
db(>     DictFile = bg_bg,
db(>     AffFile = bg_bg,
db(>     StopWords = bulgarian
db(> );
CREATE TEXT SEARCH DICTIONARY
Time: 438,533 ms

-- shared ispell dictionary

db=> DROP TEXT SEARCH DICTIONARY IF EXISTS bulgarian_ispell_shared;
NOTICE: text search dictionary "bulgarian_ispell_shared" does not exist, skipping
DROP TEXT SEARCH DICTIONARY
Time: 1,577 ms
db=>
db=> CREATE TEXT SEARCH DICTIONARY bulgarian_ispell_shared (
db(>     TEMPLATE = shared_ispell,
db(>     DictFile = bg_bg,
db(>     AffFile = bg_bg,
db(>     StopWords = bulgarian
db(> );
CREATE TEXT SEARCH DICTIONARY
Time: 1,908 ms
db=> commit;
COMMIT
Time: 1,372 ms
db=> select shared_ispell_reset();

shared_ispell_reset

(1 row)

Time: 124,997 ms

-- tests

db=> SELECT ts_lexize('bulgarian_ispell', 'КНИГИ');

ts_lexize

{книга}
(1 row)

Time: 511,633 ms
db=> SELECT ts_lexize('bulgarian_ispell_shared', 'КНИГИ');

ts_lexize

(1 row)

Time: 457,093 ms
db=> select shared_ispell_mem_used();

shared_ispell_mem_used

            9130256

(1 row)

-- end

I have also tried the Russian dictionaries and they work fine, so the Cyrillic alphabet is not to blame.

In postgresql.conf, I have these GUCs related to ispell:

shared_preload_libraries = 'shared_ispell'   # (change requires restart)
custom_variable_classes = 'shared_ispell'    # list of custom variable class names
shared_ispell.max_size = 209715200           # 200MB

How can I debug the problem? I could send you the dictionaries so you can try them yourself.

Thanks in advance,
luben

tvondra commented 12 years ago

Thanks for reporting the issue! Yes, if you can provide the dictionaries (or a link), that would be great. I've found some Bulgarian dictionaries at http://lasr.cs.ucla.edu/geoff/ispell-dictionaries.html but it'd be nice to have the same version.

tvondra commented 12 years ago

I think I've found the bug: one of the affix fields was not copied properly, so the behavior was quite random. I've tested it with the Bulgarian dictionaries from the ucla.edu site and the shared dictionary now returns {книга}, just like the plain ispell dictionary. Can you check that it works for you?
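
One way to run such a check is to lexize the same words with both dictionaries side by side. This is only a minimal sketch: it assumes the two dictionaries from the transcript above (bulgarian_ispell and bulgarian_ispell_shared) still exist, and the column aliases are made up for illustration. Only the word 'КНИГИ' comes from the original report; any further test words would have to be added by hand.

```sql
-- Compare the plain and the shared dictionary on the same input words.
-- Dictionary names are taken from the transcript above; the column
-- aliases (plain_result, shared_result, same) are illustrative only.
SELECT word,
       ts_lexize('bulgarian_ispell', word)        AS plain_result,
       ts_lexize('bulgarian_ispell_shared', word) AS shared_result,
       -- IS NOT DISTINCT FROM treats two NULL results as equal
       ts_lexize('bulgarian_ispell', word)
           IS NOT DISTINCT FROM
       ts_lexize('bulgarian_ispell_shared', word) AS same
FROM unnest(ARRAY['КНИГИ']) AS t(word);
```

If the two templates agree, the last column should be true for every word in the array; a false value points at a word worth reporting.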

luben commented 12 years ago

It works now without problems. Thanks a lot!