Closed nickalcock closed 3 years ago
Noted. Please try again with the latest git.
Yep, caching the transliterator works much better and the crash is gone! It's about 50% faster at bulk indexes of mostly-smallish emails now, too, even with the decode2text snail script slowing everything down, and even given that spinning rust is involved. I suspect a profile would have shown about 90% of the thing's time was being spent creating transliterators, or that creating transliterators slows as O(n^2) as more are created, or something like that :)
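For readers wondering what "caching the transliterator" amounts to, here is a minimal sketch of the create-once, reuse-afterwards pattern. It is not the plugin's actual code: `FakeTransliterator`, `create_transliterator`, `cached_transliterator`, `g_creations` and the ID string `"Example-Id"` are all hypothetical stand-ins, used so the snippet builds without ICU. In real code the factory would be ICU's `icu::Transliterator::createInstance`.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for an object that is expensive to construct,
// such as an ICU transliterator.
struct FakeTransliterator {
    std::string id;
    // Placeholder: returns the text unchanged.
    std::string apply(const std::string& text) const { return text; }
};

// Counts how many times the expensive factory actually runs.
static int g_creations = 0;

// Hypothetical stand-in for the expensive factory call.
std::unique_ptr<FakeTransliterator> create_transliterator(const std::string& id) {
    ++g_creations;
    return std::unique_ptr<FakeTransliterator>(new FakeTransliterator{id});
}

// Builds the transliterator on first use and returns the same instance on
// every later call. Function-local statics are initialised exactly once,
// and that initialisation is thread-safe since C++11.
const FakeTransliterator& cached_transliterator() {
    static const std::unique_ptr<FakeTransliterator> instance =
        create_transliterator("Example-Id");
    return *instance;
}
```

Calling `cached_transliterator().apply(word)` for every indexed term then pays the construction cost once per process rather than once per term, which is consistent with the speedup described above.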
Also, holy crap is this thing a lot faster than lucene. Like, instantaneous. Well worth paying the 50% index size increase...
This is with the tip of the Xapian RELEASE/1.4 branch, with what seems to be the recommended configuration, or close to it, upon doing an initial index of my INBOX:
plugin {
  fts = xapian
  fts_xapian = partial=3 full=20 verbose=0
  fts_enforced = body
  fts_languages = en
  fts_language_config = /usr/share/libexttextcat/dovecot.conf
  fts_decoder = decode2text
}

service decode2text {
  executable = script /usr/libexec/dovecot/decode2text.sh
  user = dovecot
  unix_listener decode2text {
    mode = 0666
  }
}
Backtrace:
0 XNGram::add (this=this@entry=0x558e1a1b9c20, d=d@entry=0x7fff4f7cfce0)
1 0x00007f8a9f48c495 in XNGram::add (s=0x558d9e4ba614 "insanely", this=0x558e1a1b9c20)
2 fts_backend_xapian_index_text (backend=, uid=, field=, data=)
3 0x00007f8a9f48d875 in fts_backend_xapian_update_build_more (_ctx=0x558d33540570, data=, size=)
4 0x00007f8aa2831d70 in fts_build_full_words (last=false, size=,
5 fts_build_data (ctx=0x7fff4f7d0020, data=, size=, last=)
6 0x00007f8aa2832577 in fts_build_body_block (last=false, block=0x7fff4f7cffb0, ctx=0x7fff4f7d0020)
7 fts_build_mail_real (may_need_retry_r=0x7fff4f7cff53, retriable_err_msg_r=0x7fff4f7cff60, mail=0x7fff4f7cff90, update_ctx=0x558d33540570)
8 fts_build_mail (update_ctx=0x558d33540570, mail=mail@entry=0x558d3353fb78)
9 0x00007f8aa2837f3e in fts_mail_index (_mail=0x558d3353fb78) at /usr/src/dovecot/x86_64-loom/src/plugins/fts/fts-storage.c:550
10 fts_mail_precache (_mail=0x558d3353fb78) at /usr/src/dovecot/x86_64-loom/src/plugins/fts/fts-storage.c:571
11 0x00007f8aa33213ce in mail_precache (mail=0x558d3353fb78) at /usr/src/dovecot/x86_64-loom/src/lib-storage/mail.c:453
12 0x0000558d32e9ba1a in master_connection_input ()
13 0x00007f8aa315c0b8 in io_loop_call_io (io=0x558d334aa2f0) at /usr/src/dovecot/x86_64-loom/src/lib/ioloop.c:714
14 0x00007f8aa315d722 in io_loop_handler_run_internal (ioloop=ioloop@entry=0x558d3349e250)
15 0x00007f8aa315c161 in io_loop_handler_run (ioloop=0x558d3349e250) at /usr/src/dovecot/x86_64-loom/src/lib/ioloop.c:766
16 0x00007f8aa315c320 in io_loop_run (ioloop=0x558d3349e250) at /usr/src/dovecot/x86_64-loom/src/lib/ioloop.c:739
17 0x00007f8aa30d1623 in master_service_run (service=0x558d3349e0b0, callback=)
18 0x0000558d32e9b3cd in main ()
(gdb) print accentsConverter
$1 = (icu_69::Transliterator *) 0x0
(gdb) print status
$3 = U_INVALID_ID
At the very least you should be checking to see whether the transliterator could be created.
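A minimal sketch of the kind of check meant here, under stated assumptions. ICU's `icu::Transliterator::createInstance(id, UTRANS_FORWARD, status)` returns a null pointer and sets `status` to an error code when it fails, which matches the gdb output above. To keep the snippet buildable without ICU, the types below (`Status`, `FakeTransliterator`, `create_transliterator`, `transliterate_checked`) are hypothetical stand-ins that only mimic that shape. With ICU itself the test would be `U_FAILURE(status)` plus a null check.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for ICU's UErrorCode.
enum class Status { Ok, InvalidId };

// Hypothetical stand-in for icu::Transliterator.
struct FakeTransliterator {
    // Placeholder: returns the text unchanged.
    std::string apply(const std::string& text) const { return text; }
};

// Hypothetical factory mimicking the failure mode in the report: on error it
// returns a null pointer and reports the reason through `status`.
std::unique_ptr<FakeTransliterator> create_transliterator(const std::string& id,
                                                          Status& status) {
    if (id.empty()) {
        status = Status::InvalidId;
        return nullptr;
    }
    status = Status::Ok;
    return std::unique_ptr<FakeTransliterator>(new FakeTransliterator());
}

// Returns false and leaves `out` untouched when no transliterator could be
// created, instead of dereferencing a null pointer.
bool transliterate_checked(const std::string& id, const std::string& in,
                           std::string& out) {
    Status status = Status::Ok;
    std::unique_ptr<FakeTransliterator> t = create_transliterator(id, status);
    if (status != Status::Ok || !t) {
        return false;  // caller can log the error and index the raw text
    }
    out = t->apply(in);
    return true;
}
```

What the caller does on `false` is a design choice. Indexing the untransliterated term and logging the error is one option, and it would at least turn the crash into a degraded search result.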
This might very well be a bug in ICU, but I do wonder why it would have succeeded probably thousands of times and only now failed. Perhaps calling createInstance this often (rather than createInstance once and getInstance in future) causes some sort of resource exhaustion?