grosjo / fts-xapian

Dovecot FTS plugin based on Xapian
GNU Lesser General Public License v2.1
91 stars 19 forks source link

[1.7.14] Core Dumps #169

Open Philippe23 opened 1 month ago

Philippe23 commented 1 month ago

I'm seeing some core dumps from 1.7.14.

#0  0x00007f4272d8c981 in ?? () from /lib64/libc.so.6
#1  0x00007f4270552e07 in XNGram::add_stem(icu::UnicodeString*) () from /usr/lib64/dovecot/lib21_fts_xapian_plugin.so
#2  0x00007f4270568e4a in XNGram::add(icu::UnicodeString*) () from /usr/lib64/dovecot/lib21_fts_xapian_plugin.so
#3  0x00007f4270568cf6 in XNGram::add(icu::UnicodeString*) () from /usr/lib64/dovecot/lib21_fts_xapian_plugin.so
#4  0x00007f4270568ceb in XNGram::add(icu::UnicodeString*) () from /usr/lib64/dovecot/lib21_fts_xapian_plugin.so
#5  0x00007f4270568ceb in XNGram::add(icu::UnicodeString*) () from /usr/lib64/dovecot/lib21_fts_xapian_plugin.so
     [...]
#68 0x00007f4270568ceb in XNGram::add(icu::UnicodeString*) () from /usr/lib64/dovecot/lib21_fts_xapian_plugin.so
#69 0x00007f4270568ceb in XNGram::add(icu::UnicodeString*) () from /usr/lib64/dovecot/lib21_fts_xapian_plugin.so
#70 0x00007f4270569325 in XDoc::populate_stems(long, char const*) () from /usr/lib64/dovecot/lib21_fts_xapian_plugin.so
#71 0x00007f4270569d2b in XDocsWriter::worker() () from /usr/lib64/dovecot/lib21_fts_xapian_plugin.so
#72 0x00007f42724c46a3 in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/13/libstdc++.so.6
#73 0x00007f4272cc2652 in ?? () from /lib64/libc.so.6
#74 0x00007f4272d3e8ac in ?? () from /lib64/libc.so.6

Logs that reference the PID from the core:

Jul 27 16:03:38 myhostname dovecot[3473]: indexer-worker(USER1)<10528><kIsuDj4eX+omTRZ9:EN/bB5pSpWYgKQAA0J78UA>: FTS Xapian: Waiting for all pending documents to be processed (Sleep5)
Jul 27 16:03:38 myhostname dovecot[3473]: indexer-worker(USER1)<10528><kIsuDj4eX+omTRZ9:EN/bB5pSpWYgKQAA0J78UA>: FTS Xapian: Waiting for all pending documents to be processed (Sleep5)
Jul 27 16:03:38 myhostname dovecot[3473]: indexer-worker(USER1)<10528><kIsuDj4eX+omTRZ9:EN/bB5pSpWYgKQAA0J78UA>: FTS Xapian: Waiting for all pending documents to be processed (Sleep5)
Jul 27 16:08:26 myhostname xapian-docswriter[10528]: DW #260 (Holding.Order Information,/home/USER1/.maildir/xapian-indexes/db_23392717807b795b6d200000d09efc50) - Can't add document2 : std::exception
Jul 27 16:08:26 myhostname xapian-docswriter[10528]: DW #260 (Holding.Order Information,/home/USER1/.maildir/xapian-indexes/db_23392717807b795b6d200000d09efc50) -  Retrying (/home/USER1/.maildir/xapian-indexes/db_23392717807b795b6d200000d09efc50) from std::exception
Jul 27 16:08:27 myhostname xapian-docswriter[10528]: DW #260 (Holding.Order Information,/home/USER1/.maildir/xapian-indexes/db_23392717807b795b6d200000d09efc50) -  Retrying (/home/USER1/.maildir/xapian-indexes/db_23392717807b795b6d200000d09efc50) from
Jul 27 16:12:22 myhostname dovecot[3473]: indexer-worker(USER1)<10528><kIsuDj4eX+omTRZ9:YL6JG2BTpWYgKQAA0J78UA>: Fatal: master: service(indexer-worker): child 10528 killed with signal 11 (core dumped)

A different trace:

Program terminated with signal SIGABRT, Aborted.
#0  0x00007fd44cd2935c in ?? () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7fd44950c6c0 (LWP 6456))]
(gdb) bt
#0  0x00007fd44cd2935c in ?? () from /lib64/libc.so.6
#1  0x00007fd44ccd91f6 in raise () from /lib64/libc.so.6
#2  0x00007fd44ccc18f7 in abort () from /lib64/libc.so.6
#3  0x00007fd44c4e5f22 in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/13/libstdc++.so.6
#4  0x00007fd44c4f897c in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/13/libstdc++.so.6
#5  0x00007fd44c4f89e7 in std::terminate() () from /usr/lib/gcc/x86_64-pc-linux-gnu/13/libstdc++.so.6
#6  0x00007fd44c4f8c47 in __cxa_throw () from /usr/lib/gcc/x86_64-pc-linux-gnu/13/libstdc++.so.6
#7  0x00007fd44c4e59cc in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/13/libstdc++.so.6
#8  0x00007fd44a3a1985 in ?? () from /usr/lib64/libxapian.so.30
#9  0x00007fd44a39fdf8 in Xapian::Document::Internal::add_term(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int) () from /usr/lib64/libxapian.so.30
#10 0x00007fd44a5baf75 in XDoc::create_document(long, char const*) () from /usr/lib64/dovecot/lib21_fts_xapian_plugin.so
#11 0x00007fd44a5cedb9 in XDocsWriter::worker() () from /usr/lib64/dovecot/lib21_fts_xapian_plugin.so
#12 0x00007fd44c5296a3 in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/13/libstdc++.so.6
#13 0x00007fd44cd27652 in ?? () from /lib64/libc.so.6
#14 0x00007fd44cda38ac in ?? () from /lib64/libc.so.6

Logs:

Jul 27 15:40:52 myhostname dovecot[3473]: indexer-worker(USER1)<5462><kIsuDj4eX+omTRZ9:ujDOBEFNpWZWFQAA0J78UA>: FTS Xapian: Waiting for all pending documents to be processed (Sleep5)
Jul 27 15:40:52 myhostname dovecot[3473]: indexer-worker(USER1)<5462><kIsuDj4eX+omTRZ9:ujDOBEFNpWZWFQAA0J78UA>: FTS Xapian: Waiting for all pending documents to be processed (Sleep5)
Jul 27 15:45:05 myhostname dovecot[3473]: indexer-worker(USER1)<5462><kIsuDj4eX+omTRZ9:GA1eHYVNpWZWFQAA0J78UA>: Fatal: master: service(indexer-worker): child 5462 killed with signal 6 (core dumped)

Another one:

Core was generated by `/usr/libexec/dovecot/indexer-worker'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fda4d056385 in ?? () from /usr/lib64/libxapian.so.30
[Current thread is 1 (Thread 0x7fda4ba476c0 (LWP 5288))]
(gdb) bt
#0  0x00007fda4d056385 in ?? () from /usr/lib64/libxapian.so.30
#1  0x00007fda4d052b86 in ?? () from /usr/lib64/libxapian.so.30
#2  0x00007fda4d048124 in ?? () from /usr/lib64/libxapian.so.30
#3  0x00007fda4d03f340 in ?? () from /usr/lib64/libxapian.so.30
#4  0x00007fda4d02fe6f in ?? () from /usr/lib64/libxapian.so.30
#5  0x00007fda4d030a75 in ?? () from /usr/lib64/libxapian.so.30
#6  0x00007fda4cf88f61 in Xapian::WritableDatabase::commit() () from /usr/lib64/libxapian.so.30
#7  0x00007fda4d1bd0aa in XDocsWriter::worker() () from /usr/lib64/dovecot/lib21_fts_xapian_plugin.so
#8  0x00007fda4f1176a3 in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/13/libstdc++.so.6
#9  0x00007fda4f915652 in ?? () from /lib64/libc.so.6
#10 0x00007fda4f9918ac in ?? () from /lib64/libc.so.6

Logs:

Jul 27 15:40:05 myhostname xapian-docswriter[4727]: DW #260 (Holding.Order Information,/home/USER1/.maildir/xapian-indexes/db_23392717807b795b6d200000d09efc50) - Can't add document2 : std::exception
Jul 27 15:40:05 myhostname xapian-docswriter[4727]: DW #260 (Holding.Order Information,/home/USER1/.maildir/xapian-indexes/db_23392717807b795b6d200000d09efc50) -  Retrying (/home/USER1/.maildir/xapian-indexes/db_23392717807b795b6d200000d09efc50) from std::exception
Jul 27 15:40:05 myhostname kernel: indexer-worker[5288]: segfault at 9 ip 00007fda4d056385 sp 00007fda4ba46ad0 error 4 in libxapian.so.30.12.6[7fda4cf41000+1ba000] likely on CPU 1 (core 0, socket 0)
Jul 27 15:40:05 myhostname kernel: Code: b7 d2 29 d0 48 83 c4 08 5b 5d 41 5c 41 5d c3 66 0f 1f 44 00 00 f3 0f 1e fa 41 57 41 56 41 55 41 54 49 89 fc 55 53 48 83 ec 28 <0f> b7 47 09 48 89 74 24 10 66 c1 c0 08 0f b7 c0 89 44 24 04 83 fa
Jul 27 15:40:48 myhostname dovecot[3473]: indexer-worker(USER1)<4727><kIsuDj4eX+omTRZ9:Bmx2Ir5MpWZ3EgAA0J78UA>: Fatal: master: service(indexer-worker): child 4727 killed with signal 11 (core dumped)

I also saw this once in my logs, but didn't find a core for the specific process:

Jul 27 15:43:26 myhostname dovecot[3473]: indexer-worker: Error: terminate called after throwing an instance of 'std::bad_alloc'
Jul 27 15:43:26 myhostname dovecot[3473]: indexer-worker: Error:   what():  std::bad_alloc
Philippe23 commented 1 month ago

Unfortunately, I didn't have debug symbols being installed, so I just have the callstacks. I think I've installed debug symbols for the program at this point, so if/when it happens again, I may have some more info.

Philippe23 commented 1 month ago

Callstack:

Core was generated by `/usr/libexec/dovecot/indexer-worker'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007ffb1aa8735c in ?? () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7ffb17e006c0 (LWP 12050))]
(gdb) bt
#0  0x00007ffb1aa8735c in ?? () from /lib64/libc.so.6
#1  0x00007ffb1aa371f6 in raise () from /lib64/libc.so.6
#2  0x00007ffb1aa1f8f7 in abort () from /lib64/libc.so.6
#3  0x00007ffb1a243f22 in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/13/libstdc++.so.6
#4  0x00007ffb1a25697c in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/13/libstdc++.so.6
#5  0x00007ffb1a2569e7 in std::terminate() () from /usr/lib/gcc/x86_64-pc-linux-gnu/13/libstdc++.so.6
#6  0x00007ffb1a256c47 in __cxa_throw () from /usr/lib/gcc/x86_64-pc-linux-gnu/13/libstdc++.so.6
#7  0x00007ffb1a2439cc in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/13/libstdc++.so.6
#8  0x00007ffb180ff985 in ?? () from /usr/lib64/libxapian.so.30
#9  0x00007ffb180fddf8 in Xapian::Document::Internal::add_term(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int) () from /usr/lib64/libxapian.so.30
#10 0x00007ffb18318f75 in XDoc::create_document (this=this@entry=0x555c4fdf9010, verbose=0,
    title=0x555c4fc2fb60 "DW #267 (Holding.Order Information,/home/USER1/.maildir/xapian-indexes/db_23392717807b795b6d200000d09efc50) - ") at /usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/basic_string.tcc:242
#11 0x00007ffb1832cdc9 in XDocsWriter::worker (this=0x555c4fc3d130)
    at /usr/src/debug/net-mail/dovecot-fts-xapian-1.7.14/fts-xapian-1.7.14/src/fts-backend-xapian-functions.cpp:843
#12 0x00007ffb1a2876a3 in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/13/libstdc++.so.6
#13 0x00007ffb1aa85652 in ?? () from /lib64/libc.so.6
#14 0x00007ffb1ab018ac in ?? () from /lib64/libc.so.6
#11 0x00007ffb1832cdc9 in XDocsWriter::worker (this=0x555c4fc3d130)
    at /usr/src/debug/net-mail/dovecot-fts-xapian-1.7.14/fts-xapian-1.7.14/src/fts-backend-xapian-functions.cpp:843
843                             doc->create_document(verbose,title);
(gdb) info locals
start_time = 1722164636509
n = 3
err = false
i = 2
newdoc = 0
doc = 0x555c4fdf9010

(gdb) print *this
$9 = {dbpath = 0x555c4fbd5a50 "/home/USER1/.maildir/xapian-indexes/db_23392717807b795b6d200000d09efc50",
  docs = 0x555c4fc1ed00, m = 0x555c4fbd0690, terminated = false, dbw = 0x555c4fb43280, verbose = 0,
  batch = 0x555c4fbd06c8, totaldocs = 0x555c4fbd06c0, t = 0x555c4fc1de40,
  title = 0x555c4fc2fb60 "DW #267 (Holding.Order Information,/home/USER1/.maildir/xapian-indexes/db_23392717807b795b6d200000d09efc50) - ", pos = 11}

(gdb) print *doc
$1 = {data = 0x7ffb17382010, strings = 0x555c4fdce990, headers = 0x555c4fb2a250, uid = 11964, size = 15,
  stems = 50001, tsize = 9878, uterm = 0x555c4fbc1bb0 "Q11964", xdoc = 0x7ffadae9c7b0}

strings & headers are both empty std::vectors.

(gdb) list
838                             if(verbose>0) syslog(LOG_INFO,"%sProcessing #%ld (%ld/%ld) %s",title,doc->uid,i+1,n,doc->getSummary().c_str());
839                             pos=10;
840                             doc->populate_stems(verbose,title);
841                             if(verbose>0) syslog(LOG_INFO,"%sCreating doc #%ld (%ld/%ld) %s",title,doc->uid,i+1,n,doc->getSummary().c_str());
842                             pos=11;
843                             doc->create_document(verbose,title);
844                             if(verbose>0) syslog(LOG_INFO,"%sPushing Doc %ld (%ld/%ld) with %ld strings and %ld stems",title,doc->uid,i+1,n,doc->size,doc->stems);
845                             if(doc->stems > 0)
846                             {
847                                     pos=12;

Logs:

Jul 28 06:48:27 myhostname dovecot[2176]: indexer-worker: Error: cannot allocate memory for thread-local data: ABORT
Jul 28 06:48:27 myhostname dovecot[2176]: indexer-worker(USER1)<8623><84I/VkwedsMmTRZ9:INRmDpohpmavIQAA0J78UA>: Fatal: master: service(indexer-worker): child 8623 returned error 127
Jul 28 06:48:30 myhostname dovecot[2176]: indexer-worker(USER1)<8886><84I/VkwedsMmTRZ9:aD3YLPshpma2IgAA0J78UA>: FTS Xapian: Waiting for all pending documents to be processed (Sleep5)
Jul 28 07:05:11 myhostname dovecot[2176]: indexer-worker(USER1)<11426><84I/VkwedsMmTRZ9:mFZODjolpmaiLAAA0J78UA>: Fatal: master: service(indexer-worker): child 11426 killed with signal 6 (core dumped)

Footnote: There seem to be a few different PIDs. Not sure if that's expected, or if I'm grabbing a few issues, but only one's creating a core. (?)

I'm realizing I don't have Xapian's debug symbols either. Gonna' rebuild that as well with detached debug symbols.

I should point out that I've got 11.5 gigs of swap free.

And I'm linking against Xapian v1.4.25.

Philippe23 commented 1 month ago
Core was generated by `/usr/libexec/dovecot/indexer-worker'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f3c0af2c35c in ?? () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7f3c0785b6c0 (LWP 27298))]
(gdb) bt
#0  0x00007f3c0af2c35c in ?? () from /lib64/libc.so.6
#1  0x00007f3c0aedc1f6 in raise () from /lib64/libc.so.6
#2  0x00007f3c0aec48f7 in abort () from /lib64/libc.so.6
#3  0x00007f3c0a6e8f22 in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/13/libstdc++.so.6
#4  0x00007f3c0a6fb97c in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/13/libstdc++.so.6
#5  0x00007f3c0a6fb9e7 in std::terminate() () from /usr/lib/gcc/x86_64-pc-linux-gnu/13/libstdc++.so.6
#6  0x00007f3c0a6fbc47 in __cxa_throw () from /usr/lib/gcc/x86_64-pc-linux-gnu/13/libstdc++.so.6
#7  0x00007f3c0a6e89cc in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/13/libstdc++.so.6
#8  0x00007f3c085a49a5 in std::__new_allocator<std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, OmDocumentTerm> > >::allocate (this=<optimized out>, __n=1)
    at /usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/new_allocator.h:151
#9  std::allocator_traits<std::allocator<std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, OmDocumentTerm> > > >::allocate (__n=1, __a=...)
    at /usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/alloc_traits.h:482
#10 std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, OmDocumentTerm>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, OmDocumentTerm> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, OmDocumentTerm> > >::_M_get_node (this=0x7f3bd7bd8880) at /usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/stl_tree.h:563
#11 std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, OmDocumentTerm>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, OmDocumentTerm> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, OmDocumentTerm> > >::_M_create_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, OmDocumentTerm> > (this=0x7f3bd7bd8880) at /usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/stl_tree.h:613
#12 std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, OmDocumentTerm>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, OmDocumentTerm> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, OmDocumentTerm> > >::_Auto_node::_Auto_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, OmDocumentTerm> > (__t=..., this=<synthetic pointer>)
    at /usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/stl_tree.h:1637
#13 std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, OmDocumentTerm>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, OmDocumentTerm> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, OmDocumentTerm> > >::_M_emplace_hint_unique<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, OmDocumentTerm> > (this=this@entry=0x7f3bd7bd8880, __pos=__pos@entry={...})
    at /usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/stl_tree.h:2462
#14 0x00007f3c085a2e18 in std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, OmDocumentTerm, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, OmDocumentTerm> > >::emplace_hint<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, OmDocumentTerm> > (__pos=..., this=0x7f3bd7bd8880)
    at /usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/stl_map.h:640
#15 std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, OmDocumentTerm, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, OmDocumentTerm> > >::insert<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, OmDocumentTerm> > (__x=..., this=0x7f3bd7bd8880)
    at /usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/stl_map.h:860
#16 Xapian::Document::Internal::add_term (this=0x7f3bd7bd8810, tname=..., wdfinc=<optimized out>) at api/omdocument.cc:502
#17 0x00007f3c085a2f3b in Xapian::Document::add_term (this=<optimized out>, tname=..., wdfinc=<optimized out>) at api/omdocument.cc:146
#18 0x00007f3c087bdf75 in XDoc::create_document (this=this@entry=0x5650a507bec0, verbose=0, title=0x5650a500c330 "DW #253 (Holding.Order Information,/home/USER1/.maildir/xapian-indexes/db_23392717807b795b6d200000d09efc50) - ")
    at /usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/basic_string.tcc:242
#19 0x00007f3c087d1dc9 in XDocsWriter::worker (this=0x5650a51c1b30) at /usr/src/debug/net-mail/dovecot-fts-xapian-1.7.14/fts-xapian-1.7.14/src/fts-backend-xapian-functions.cpp:843
#20 0x00007f3c0a72c6a3 in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/13/libstdc++.so.6
#21 0x00007f3c0af2a652 in ?? () from /lib64/libc.so.6
#22 0x00007f3c0afa68ac in ?? () from /lib64/libc.so.6
(gdb) up
#16 Xapian::Document::Internal::add_term (this=0x7f3bd7bd8810, tname=..., wdfinc=<optimized out>) at api/omdocument.cc:502
502             terms.insert(make_pair(tname, std::move(newterm)));
(gdb) list
497         map<string, OmDocumentTerm>::iterator i;
498         i = terms.find(tname);
499         if (i == terms.end()) {
500             ++termlist_size;
501             OmDocumentTerm newterm(wdfinc);
502             terms.insert(make_pair(tname, std::move(newterm)));
503         } else {
504             if (i->second.increase_wdf(wdfinc))
505                 ++termlist_size;
506         }

(gdb) print terms
$4 = std::map with 40230 elements = {["Q11923"]  **snip**
#8  0x00007f3c085a49a5 in std::__new_allocator<std::_Rb_tree_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, OmDocumentTerm> > >::allocate (this=<optimized out>, __n=1)
    at /usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/new_allocator.h:151
151             return static_cast<_Tp*>(_GLIBCXX_OPERATOR_NEW(__n * sizeof(_Tp)));
(gdb) print __n
$12 = 1
(gdb) print sizeof(_Tp)
$13 = 96

So it's got a good size document going (40k of terms), but the allocation isn't huge, just 96 bytes.

grosjo commented 1 month ago

Weird. For your last comment : 96 bytes for 40k elements ? Meaning most, if not all, are empty ?

Philippe23 commented 1 month ago

96 bytes for 40k elements ? Meaning most, if not all, are empty ?

No, the 96 bytes appears to be just a node in a data structure (red-black tree). So it's a straw-that-broke-the-camel's-back situation.

My config has vsz_limit = 1G, so I'm guessing that dovecot proper enforces this memory cap, and we're hitting that cap. (Also, this is why having all that free swap at a system-level doesn't matter.)

Since I don't have any 1G messages, perhaps there's a memory leak?

I'll also try raising vsz_limit a little and see if it keeps happening.

But feel free to suggest other things to try to help track down what's going on.

Also, I'm not sure if there's a way for you to catch the allocation exception and recover from it.

grosjo commented 1 month ago

1G for 40k terms, this seems to much, as each term can not be more than than 245 chars

grosjo commented 1 month ago

I found a memory leaj. Can you try latest git ?

Philippe23 commented 1 month ago

Upgraded to latest, and returned to the 1G limit I had before. I'll keep an eye on it for a few days and let you know if I see it again or not.

Philippe23 commented 1 month ago

Not sure if it's related or a new crash, but I just got this twice:

Logs:

Aug  6 15:58:18 myhostname xapian-docswriter[23673]: DW #264 (Archived Trash.2024,/home/USER1/.maildir/xapian-indexes/db_c3bf2b05772d8066180f0000d09efc50) - Can't add document2 : std::exception
Aug  6 15:58:18 myhostname xapian-docswriter[23673]: DW #264 (Archived Trash.2024,/home/USER1/.maildir/xapian-indexes/db_c3bf2b05772d8066180f0000d09efc50) -  Retrying (/home/USER1/.maildir/xapian-indexes/db_c3bf2b05772d8066180f0000d09efc50) from std::exception
Aug  6 15:58:18 myhostname kernel: indexer-worker[23957]: segfault at 9 ip 00007fb8036b3d28 sp 00007fb8028f8ae0 error 4 in libxapian.so.30.12.6[7fb8035a4000+1ba000] likely on CPU 1 (core 0, socket 0)
Aug  6 15:58:18 myhostname kernel: Code: 5c e9 bc 12 ef ff 0f 1f 40 00 5b 5d 41 5c c3 90 66 90 f3 0f 1e fa 41 57 49 89 ff 41 56 41 55 41 54 49 89 f4 55 53 48 83 ec 18 <0f> b7 6f 09 48 89 4c 24 08 66 c1 c5 08 0f b7 ed 83 fa ff 0f 84 ff
Aug  6 15:58:59 myhostname dovecot[10243]: indexer-worker(USER1)<23673><yjsoLwkfA8EmTRZ9:a6d9Hw2AsmZ5XAAA0J78UA>: Fatal: master: service(indexer-worker): child 23673 killed with signal 11 (core dumped)

Backtrace:

Core was generated by `/usr/libexec/dovecot/indexer-worker'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fb8036b3d28 in do_bswap (value=<optimized out>) at ./common/wordaccess.h:47
47          return __builtin_bswap16(value);
[Current thread is 1 (Thread 0x7fb8028f96c0 (LWP 23957))]
(gdb) bt
#0  0x00007fb8036b3d28 in do_bswap (value=<optimized out>) at ./common/wordaccess.h:47
#1  do_unaligned_read<unsigned short> (ptr=0x9 <error: Cannot access memory at address 0x9>) at ./common/wordaccess.h:122
#2  unaligned_read2 (ptr=0x9 <error: Cannot access memory at address 0x9>) at ./common/wordaccess.h:165
#3  Glass::DIR_END (b=0x0) at backends/glass/glass_table.h:114
#4  GlassTable::find_in_leaf (p=0x0, item=..., c=11, exact=exact@entry=@0x7fb8028f8b37: false) at backends/glass/glass_table.cc:428
#5  0x00007fb8036b5d24 in GlassTable::find (this=this@entry=0x5645fa45c530, C_=C_@entry=0x5645fa45c5f8) at backends/glass/glass_table.h:175
#6  0x00007fb8036b5d85 in GlassTable::key_exists (this=this@entry=0x5645fa45c530, key=<incomplete sequence \340>) at backends/glass/glass_table.cc:1505
#7  0x00007fb8036ab274 in GlassPostListTable::merge_doclen_changes (this=this@entry=0x5645fa45c530, doclens=std::map with 766 elements = {...}) at backends/glass/glass_postlist.cc:1149
#8  0x00007fb8036a2470 in Inverter::flush_doclengths (table=..., this=0x5645fa45d130) at backends/glass/glass_inverter.cc:158
#9  Inverter::flush (this=this@entry=0x5645fa45d130, table=...) at backends/glass/glass_inverter.cc:218
#10 0x00007fb803692f9f in GlassWritableDatabase::flush_postlist_changes (this=0x5645fa45c160) at backends/glass/glass_database.cc:1064
#11 0x00007fb803693ba5 in GlassWritableDatabase::commit (this=0x5645fa45c160) at backends/glass/glass_database.cc:1043
#12 0x00007fb8035ebf81 in Xapian::WritableDatabase::commit (this=0x5645fa3e00c0) at api/omdatabase.cc:864
#13 0x00007fb80382005a in XDocsWriter::worker (this=0x5645fa513450) at /usr/src/debug/net-mail/dovecot-fts-xapian-9999/dovecot-fts-xapian-9999/src/fts-backend-xapian-functions.cpp:908
#14 0x00007fb80577a6a3 in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/13/libstdc++.so.6
#15 0x00007fb805f78652 in ?? () from /lib64/libc.so.6
#16 0x00007fb805ff48ac in ?? () from /lib64/libc.so.6

Code for frame #13 + data I can find from the locals:

(gdb) list
903                                     {
904                                             pos=16;
905                                             syslog(LOG_WARNING,"%s Retrying (%s) from %s",title,dbpath,err_s.c_str());
906                                             try
907                                             {
908                                                     dbw->commit();
909                                                     pos=17;
910                                                     dbw->replace_document(doc->uterm,*(doc->xdoc));
911                                                     (*totaldocs)++;
912                                             }
913                                             catch(Xapian::Error e)
914                                             {       
915                                                     syslog(LOG_ERR,"%sCan't add document3 : %s - %s %s",title,e.get_type(),e.get_msg().c_str(),e.get_error_string());
916                                             }
917                                             catch(std::exception e)
918                                             {       
919                                                     syslog(LOG_ERR,"%sCan't add document4 : %s",title,e.what());
920                                             }
921                                             pos=18;
922                                     }
(gdb) info locals
lck = {_M_device = 0x5645fa40f400, _M_owns = true}
err_s = "std::exception"
start_time = 1722974297079
n = 3
err = <optimized out>
i = <optimized out>
newdoc = 2
doc = 0x5645fa5a80f0
(gdb) print *doc
$1 = {data = 0x0, strings = 0x5645fa5e1900, headers = 0x5645fa46b910, uid = 769, size = 17, stems = 28517, tsize = 9819, uterm = 0x5645fa54fdb0 "Q769", xdoc = 0x5645fa51d520}
gdb) print *doc->strings
$2 = std::vector of length 0, capacity 32
(gdb) print *doc->headers
$3 = std::vector of length 0, capacity 32

Let me know if you want me to open a different issue for this one.

Oh, and if it's relevant, I'm linked against Xapian 1.4.25.

grosjo commented 1 month ago

This seems to be either another memory leak, or a low memory on your server.

How much memory yo allocated to the indexer ?

grosjo commented 1 month ago

Can you provide the few lines before "Can't add document2" ?

grosjo commented 1 month ago

please also use latest git

Philippe23 commented 1 month ago

Can you provide the few lines before "Can't add document2" ?

 867 #pragma GCC diagnostic push
 868 #pragma GCC diagnostic ignored "-Wdeprecated-declarations"
 869                         while(!(lck.try_lock_for(std::chrono::milliseconds(1000 + std::rand() % 1000))))
 870                         {
 871                                 if(verbose>1) syslog(LOG_INFO,"%sMutex : Waiting unlock... (thread)",title);
 872                         }
 873 #pragma GCC diagnostic pop
 874                         if(verbose>1) syslog(LOG_INFO,"%sMutex : Lock acquired (thread)",title);
 875                 std::string err_s; err_s.clear();
 876                                 try
 877                                 {
 878                     pos=13;
 879                                         dbw->replace_document(doc->uterm,*(doc->xdoc));
 880                     (*totaldocs)++;
 881                     (*batch)++;
 882                     if((*batch) > XAPIAN_WRITING_CACHE)
 883                     {
 884                         syslog(LOG_INFO,"%s Committing %ld docs (vs %ld limit)",title,(*batch),XAPIAN_WRITING_CACHE     );
 885                         pos=15;
 886                         dbw->commit();
 887                         *batch  = 0;
 888                     }
 889                                 }
 890                                 catch(Xapian::Error e)
 891                                 {                                                                                   892                                         syslog(LOG_WARNING,"%sCan't add document1 : %s - %s %s",title,e.get_type(),     e.get_msg().c_str(),e.get_error_string());
 893                                         err=true;
 894                     err_s.append(e.get_type());
 895                                 }
 896                                 catch(std::exception e)
 897                                 {
 898                                         syslog(LOG_WARNING,"%sCan't add document2 : %s",title,e.what());
 899                                         err=true;
 900                     err_s.append(e.what());
 901                                 }
 902                 if(err)
 903                 {
 904                     pos=16;
 905                     syslog(LOG_WARNING,"%s Retrying (%s) from %s",title,dbpath,err_s.c_str());
 906                     try
 907                     {
 908                         dbw->commit();
 909                         pos=17;
 910                                             dbw->replace_document(doc->uterm,*(doc->xdoc));
 911                                             (*totaldocs)++;
 912                     }
 913                     catch(Xapian::Error e)
 914                                     {                                                                               915                                             syslog(LOG_ERR,"%sCan't add document3 : %s - %s %s",title,e.get_type(),     e.get_msg().c_str(),e.get_error_string());
 916                                     }
 917                                     catch(std::exception e)
 918                                     {
 919                                             syslog(LOG_ERR,"%sCan't add document4 : %s",title,e.what());
 920                                     }
 921                     pos=18;
 922                 }
 923                         }
 924             pos=19;
 925             delete(doc);
 926                         newdoc++;
 927                 }                                                                                                   928                 if(verbose>0) syslog(LOG_INFO,"%sWrote %ld new docs in %ld ms",title,newdoc,fts_backend_xapian_curr     ent_time() - start_time);

The additional context you requested before I upgrade to latest.

grosjo commented 1 month ago

I meant, can you redo your test with the latest git ?

grosjo commented 3 weeks ago

Any crash ?

Philippe23 commented 3 weeks ago

I haven't had a crash since August 6th. I've been running the latest since August 13th. (August 13th is also the last time I've restarted my dovecot process.)

Philippe23 commented 3 weeks ago
Core was generated by `/usr/libexec/dovecot/indexer-worker'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007efe631dcd28 in do_bswap (value=<optimized out>)
    at ./common/wordaccess.h:47
47          return __builtin_bswap16(value);
[Current thread is 1 (Thread 0x7efe617116c0 (LWP 5949))]
(gdb) bt
#0  0x00007efe631dcd28 in do_bswap (value=<optimized out>) at ./common/wordaccess.h:47
#1  do_unaligned_read<unsigned short> (ptr=0x9 <error: Cannot access memory at address 0x9>) at ./common/wordaccess.h:122
#2  unaligned_read2 (ptr=0x9 <error: Cannot access memory at address 0x9>) at ./common/wordaccess.h:165
#3  Glass::DIR_END (b=0x0) at backends/glass/glass_table.h:114
#4  GlassTable::find_in_leaf (p=0x0, item=..., c=-1, exact=exact@entry=@0x7efe61710a87: false) at backends/glass/glass_table.cc:428
#5  0x00007efe631ded24 in GlassTable::find (this=0x7efe54046090, C_=0x7efe28bac598) at backends/glass/glass_table.h:175
#6  0x00007efe631b7488 in GlassCursor::find_entry (this=0x7efe2908e230, key="Q4238") at backends/glass/glass_cursor.cc:164
#7  0x00007efe631d0c1d in GlassPostList::init (this=this@entry=0x7efe5c1777a0) at /usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/unique_ptr.h:199
#8  0x00007efe631d1064 in GlassPostList::GlassPostList (this=this@entry=0x7efe5c1777a0, this_db_=..., term_="Q4238", keep_reference=keep_reference@entry=true) at backends/glass/glass_postlist.cc:708
#9  0x00007efe631ba827 in GlassWritableDatabase::open_post_list (this=0x7efe54045cc0, tname="Q4238") at ./include/xapian/intrusive_ptr.h:155
#10 0x00007efe6314f7ad in Xapian::Database::Internal::replace_document (this=0x7efe54045cc0, unique_term=..., document=...) at backends/database.cc:197
#11 0x00007efe63348f6a in XDocsWriter::worker (this=0x7efe5c0f0340) at /usr/src/debug/net-mail/dovecot-fts-xapian-9999/dovecot-fts-xapian-9999/src/fts-backend-xapian-functions.cpp:1025
#12 0x00007efe652a36a3 in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/13/libstdc++.so.6
#13 0x00007efe65aa1652 in ?? () from /lib64/libc.so.6
#14 0x00007efe65b1d8ac in ?? () from /lib64/libc.so.6
Aug 24 07:31:55 myhostname dovecot[4322]: indexer-worker(USER7)<5932><Er+FJaXEyWYoFwAA0J78UA:6JAaNaXEyWYsFwAA0J78UA>: FTS Xapian: Waiting for all pending documents (1) to be processed (Sleep5) with 2 threads
Aug 24 07:34:12 myhostname xapian-docswriter[5932]: DW #2 (Sent,/home/USER1/.maildir/xapian-indexes/db_90751f2b357a795b611b0000d09efc50) - Can't write doc2 : std::exception
Aug 24 07:34:12 myhostname xapian-docswriter[5932]: FTS Xapian: Can not allocate ICU translator (2)
Aug 24 07:34:12 myhostname kernel: indexer-worker[5949]: segfault at 9 ip 00007efe631dcd28 sp 00007efe61710a30 error 4 in libxapian.so.30.12.6[7efe630cd000+1ba000] likely on CPU 1 (core 0, socket 0)
Aug 24 07:34:12 myhostname kernel: Code: 5c e9 bc 12 ef ff 0f 1f 40 00 5b 5d 41 5c c3 90 66 90 f3 0f 1e fa 41 57 49 89 ff 41 56 41 55 41 54 49 89 f4 55 53 48 83 ec 18 <0f> b7 6f 09 48 89 4c 24 08 66 c1 c5 08 0f b7 ed 83 fa ff 0f 84 ff
Aug 24 07:34:51 myhostname dovecot[4322]: indexer-worker(USER1)<5932><6Lv6JGwgWs8mTRZ9:0D63G7vEyWYsFwAA0J78UA>: Fatal: master: service(indexer-worker): child 5932 killed with signal 11 (core dumped)

I suspect this one is an upstream bug.

grosjo commented 2 weeks ago

This looks like a memory leak

Philippe23 commented 1 week ago

Could be -- certainly happened when memory was low, but it doesn't look like you're directly causing the crash from what I see. All the pointers you're passing in look like they're non-NULL from what I see. So my gut says that it's upstream not handling an allocation failure properly. I guess, "why not both?" could apply here.

grosjo commented 5 days ago

Kindly set verbose=1 and use latest git

Philippe23 commented 4 days ago

Latest has a ton of warnings like this:

fts-backend-xapian-functions.cpp:951:83: warning: format not a string literal and no format arguments [-Wformat-security]
  951 |           if(verbose>0) { s=title; s.append("No-op"); syslog(LOG_INFO,s.c_str()); }
      |                                                       ~~~~~~^~~~~~~~~~~~~~~~~~~~

You should do these like this 👇 if you're just trying to output a code-built string:

syslog(LOG_INFO, "%s", s.c_str());
//               ^^^^

Otherwise, you could have a % format character in the string you're trying to output and the library will assume there's an arg passed that contains what it should read (but there isn't).

With the "%s" format string you're saying, output this string arg as-is.

Philippe23 commented 2 days ago

Made a PR to fix 'em for you: https://github.com/grosjo/fts-xapian/pull/171