pazz / alot

Terminal-based Mail User Agent
GNU General Public License v3.0

Alot aborted after tagging three times. #1460

Open hsanson opened 4 years ago

hsanson commented 4 years ago

Describe the bug

In the search buffer, if I run this command three times, alot aborts:

untag --no-flush inbox; untag --no-flush spam; tag killed

Software Versions

To Reproduce

Steps to reproduce the behaviour:

  1. open search buffer
  2. run the command :untag --no-flush inbox; untag --no-flush spam; tag killed three times
  3. Alot closes and an "Aborted" message appears in the terminal.

Error Log

Log output is below, but there do not seem to be any errors in it.

DEBUG:__init__:cmd parms {'flush': False, 'allmessages': False, 'tags': 'spam', 'action': 'remove'}
DEBUG:search:all? False
DEBUG:search:q: (tag:inbox AND NOT tag:killed AND NOT tag:spam AND NOT tag:replied) AND thread:0000000000037292
DEBUG:ui:search command string: " tag killed"
DEBUG:__init__:mode:search got commandline " tag killed"
DEBUG:__init__:ARGS: ['tag', 'killed']
DEBUG:__init__:cmd parms {'flush': 'True', 'allmessages': False, 'tags': 'killed', 'action': 'add'}
DEBUG:search:all? False
DEBUG:search:q: (tag:inbox AND NOT tag:killed AND NOT tag:spam AND NOT tag:replied) AND thread:0000000000037292
DEBUG:manager:write-out item: ('untag', None, '(tag:inbox AND NOT tag:killed AND NOT tag:spam AND NOT tag:replied) AND thread:0000000000037292', ['inbox'])
DEBUG:manager:cmd created
DEBUG:manager:got write lock
DEBUG:manager:got atomic
DEBUG:manager:ended atomic
DEBUG:manager:ended atomic
DEBUG:manager:closed db
DEBUG:manager:flush finished
DEBUG:manager:write-out item: ('untag', None, '(tag:inbox AND NOT tag:killed AND NOT tag:spam AND NOT tag:replied) AND thread:0000000000037292', ['spam'])
DEBUG:manager:cmd created
DEBUG:manager:got write lock
DEBUG:manager:got atomic
DEBUG:manager:ended atomic
DEBUG:manager:ended atomic
DEBUG:manager:closed db
DEBUG:manager:flush finished
DEBUG:manager:write-out item: ('tag', None, '(tag:inbox AND NOT tag:killed AND NOT tag:spam AND NOT tag:replied) AND thread:0000000000037292', ['killed'])
DEBUG:manager:cmd created
DEBUG:manager:got write lock
DEBUG:manager:got atomic
DEBUG:manager:ended atomic
DEBUG:manager:ended atomic
DEBUG:manager:closed db
DEBUG:manager:flush finished
DEBUG:search:remove thread from result list: thread:0000000000037292: PROBLEM: Clustering not runnig on dev-xmpp03
hsanson commented 4 years ago

After more testing, just calling :tag killed three times in sequence crashes alot. This happens on any search I have, using any tag.

Most likely some version issue with the Xapian database, but I am unable to debug this. I tried using python3-gdb but I get no symbols or backtrace to see what is going on. Any tips on how to debug this would be appreciated.

 [~]>  gdb /usr/bin/python3.6-dbg core
GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/python3.6-dbg...done.

warning: core file may not match specified executable file.
[New LWP 17123]
Core was generated by `/usr/bin/python3 /home/ryujin/.local/bin/alot'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f76cf7dbe97 in ?? ()
(gdb) py-bt
Unable to locate python frame
(gdb) bt
#0  0x00007f76cf7dbe97 in ?? ()
#1  0x0000000000000000 in ?? ()
(gdb)
hsanson commented 4 years ago

After more investigation, by adding debug messages all over the place, I found that the crash happens at this line: alot/ui.py#L655.

After setting a tag in a search buffer that causes the thread to be removed from the buffer, alot proceeds like this (see the sketch below):

  1. Call the TagCommand apply() method.
  2. After applying the tags, it invokes FlushCommand with an inner refresh() function as callback.
  3. Since the tagged message no longer matches the search query, the refresh() method removes it by invoking searchbuffer.threadlist.remove() on the thread.
  4. The refresh() callback finishes by calling ui.update(), which crashes (aborts) when it reaches draw_screen() in the MainLoop object.

If the thread is not removed from the search buffer, the crash does not occur. This can be tested by commenting out the searchbuffer.threadlist.remove() call.

This happens inside a urwid MainLoop, which is outside my debugging skills. The same code path works fine twice, but the third time it always aborts. Most probably there is some race condition or similar going on inside that MainLoop due to the async/await being used.

For reference I tried this with urwid 2.0.1 and 2.1.0 versions and it crashes with both.
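
To make the sequence above concrete, here is a self-contained toy sketch (all names are hypothetical; the real pieces are alot's TagCommand, FlushCommand and SearchBuffer plus urwid's MainLoop). It mimics what later comments in this thread pin down: the search buffer is fed by a lazy generator, and redrawing after the removal pulls the next thread from a source that the flush has already invalidated.

# Toy model only -- hypothetical names, not alot's actual code.

def thread_source(db):
    """Lazily yield thread ids, the way the search buffer is populated."""
    for tid in list(db["threads"]):
        if db["closed"]:
            # In the real program this is not a Python exception but an
            # abort inside the C library.
            raise RuntimeError("reading from a closed database")
        yield tid

db = {"threads": ["t1", "t2", "t3", "t4"], "closed": False}
source = thread_source(db)
visible = [next(source), next(source)]    # one "screenful" of threads

db["closed"] = True           # steps 1-2: tag + flush write to the DB and close it
visible.remove("t1")          # step 3: refresh() drops the no-longer-matching thread
visible.append(next(source))  # step 4: redraw pulls the next thread -> boom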

pazz commented 4 years ago

Thanks a lot for the debugging effort! It sounds like a problem that I had as well, but I was not able to reproduce it reliably. I have the impression that it has something to do with the underlying notmuch library being called in an unsafe way, because no Python library would simply segfault in this way without any error logs.

I'll try to look into it over the holiday break. Cheers, P

Quoting Horacio Sanson (2019-12-12 14:04:24)

After more investigation by setting debug messages all over the place I found the crash happens in this line: alot/ui.py#L655.

After setting a tag in a search buffer that causes that thread to be removed from the buffer, alot goes like this:

  1. Call TagCommand apply() method.
  2. After applying the tags it invokes FlushCommand with an inner refresh() function as callback.
  3. Since the message tagged no longer matches the search query, the refresh() method removes it by invoking searchbuffer.threadlist.remove() on the thread.
  4. The refresh() callback finishes by calling ui.update() that crashes (aborts) when it reaches draw_screen() in the Mainloop object.

If the thread is not removed from the search buffer then the crash does not occur. This can be tested by commenting out the searchbuffer.threadlist.remove() call.

This is a urwid MainLoop that is outside of my debugging skills. This same program path works fine twice but the third time always aborts. Most probably some race condition or similar going on inside that MainLoop due to the async/await being used.

For reference I tried this with urwid 2.0.1 and 2.1.0 versions and it crashes with both.


elenril commented 4 years ago

I can reproduce this as well, bisecting points at commit e7e0c52db9093a9ecd9dcaa0766e66515a546a75 as the culprit.

lenormf commented 4 years ago

I keep hitting the problem as well, on v0.9.1. It's hard to reproduce reliably, especially when debugging is enabled, but I always hit the issue when toggling a tag (the s binding).

The last seemingly relevant lines in the log are:

DEBUG:globals:flush complete
DEBUG:ui:Got key (['s'], [115])
DEBUG:ui:cmdline: 'toggletags unread'
DEBUG:ui:search command string: "toggletags unread"
DEBUG:__init__:mode:search got commandline "toggletags unread"
DEBUG:__init__:ARGS: ['toggletags', 'unread']
DEBUG:__init__:cmd parms {'flush': 'True', 'tags': 'unread', 'action': 'toggle'}
DEBUG:search:all? False
DEBUG:search:q: (tag:unread) AND thread:0000000000002773
DEBUG:manager:write-out item: ('untag', <function Thread.remove_tags.<locals>.myafterwards at 0x7fa4d15855e0>, 'thread:0000000000002773', ['unread'])
DEBUG:manager:cmd created
DEBUG:manager:got write lock
DEBUG:manager:got atomic
DEBUG:manager:ended atomic
DEBUG:manager:ended atomic
DEBUG:manager:closed db
DEBUG:manager:<function Thread.remove_tags.<locals>.myafterwards at 0x7fa4d15855e0>
DEBUG:manager:called callback
DEBUG:manager:flush finished
DEBUG:manager:write-out item: ('tag', <function Thread.add_tags.<locals>.myafterwards at 0x7fa4d1585790>, 'thread:0000000000002773', [])
DEBUG:manager:cmd created
DEBUG:manager:got write lock
DEBUG:manager:got atomic
DEBUG:manager:ended atomic
DEBUG:manager:ended atomic
DEBUG:manager:closed db
DEBUG:manager:<function Thread.add_tags.<locals>.myafterwards at 0x7fa4d1585790>
DEBUG:search:remove thread from result list: thread:0000000000002773: [RSS][hacker-news][11/28] SIGGRAPH 2020 Technical Paper: N-Dimensional Rigid Body Dynamics

Let me know what I can do to narrow down the issue, or whether there's a hotfix to try out.

HTH.

pazz commented 4 years ago

So you also see an abrupt segfault of the underlying python interpreter, correct? I'd really like to know what causes this or even how to debug this. Again, I'm quite sure that it is down to the notmuch bindings...

elenril commented 4 years ago

Quoting Patrick Totzke (2020-05-08 21:36:39)

Again, I'm quite sure that it is down to the notmuch bindings...

Most likely. The memory safety of the python binding is highly suspect. In my local alot fork, this goes away by switching to notmuch2 bindings.

-- Anton Khirnov

lenormf commented 4 years ago

@elenril do you have a patch of your fix I could try, please?

elenril commented 4 years ago

Quoting Frank LENORMAND (2020-05-09 07:49:30)

@elenril do you have a patch of your fix I could try, please?

My branch has diverged from master quite considerably, so testing just this fix is not possible. Also, I am becoming less sure of my previous comment - this might be a problem in notmuch itself, not the bindings.

As far as I can tell, the crash is an unhandled Xapian::DatabaseModifiedError, triggered by the database being modified while a query is being read.

But you can simply revert e7e0c52.

-- Anton Khirnov

pazz commented 4 years ago

Could you report your findings to the notmuch mailing list, @lenormf? I'd much rather fix the bindings than revert to this arcane workaround in separate processes.

lenormf commented 4 years ago

I will, if I have information not already reported here.

I see that users in this thread are involved in notmuch's CFFI bindings. What is the state of those? Are they currently in a good enough state that they could be used instead of the regular bindings?

Could it be a better move to have a new alot branch that uses notmuch2 bindings, which allegedly don't have the crash issue? We would only have to live with the crash for as long as it took to get notmuch2 up to par (as far as I understand, they would deprecate the old ones), and this could bolster improvements and more generally upstream contributions to notmuch2.

elenril commented 4 years ago

Quoting Frank LENORMAND (2020-05-09 10:48:33)

Could it be a better move to have a new alot branch that uses notmuch2 bindings, which allegedly don't have the crash issue? We would only have to live with the crash for as long as it took to get notmuch2 up to par (as far as I understand, they would deprecate the old ones), and this could bolster improvements and more generally upstream contributions to notmuch2.

I retract my comment from above - the problem seems NOT to be in the python bindings, so changing those would not fix anything. I can actually reproduce the same crash (at least I think it's the same crash) just with the notmuch tool in the following way:

The first crash will abort. IIUC it is caused by Xapian throwing the DatabaseModifiedError exception, which is not handled by libnotmuch. The solution is fixing libnotmuch to handle it. That might require API changes, since I think there is no way for a threads or messages iterator to report an error. The caller (i.e. alot in this case) would also have to re-execute the query.

-- Anton Khirnov
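
For illustration only (this is not notmuch's or alot's actual code), the handling described above can be sketched with the Xapian Python bindings: catch DatabaseModifiedError, reopen the database to pick up the writer's changes, and re-run the query. libnotmuch would need the C equivalent, plus a way for its thread/message iterators to surface the error to callers such as alot.

import xapian

def search_ids(db_path, term, retries=3):
    """Query a Xapian database, reopening and retrying if a writer modified
    it while we were reading (the DatabaseModifiedError case above)."""
    db = xapian.Database(db_path)
    for _ in range(retries):
        try:
            enquire = xapian.Enquire(db)
            enquire.set_query(xapian.Query(term))
            # Materialise the results inside the try block: the exception can
            # be raised lazily while the match set is being read.
            return [match.docid for match in enquire.get_mset(0, db.get_doccount())]
        except xapian.DatabaseModifiedError:
            db.reopen()  # pick up the writer's changes, then re-run the query
    raise RuntimeError("database kept changing while it was being read")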

lenormf commented 4 years ago

I ran gdb to try to get a more precise stacktrace, this is what I got:

#0  0x00007ffff788ace5 in raise () at /usr/lib/libc.so.6
#1  0x00007ffff7874857 in abort () at /usr/lib/libc.so.6
#2  0x00007ffff534a0ff in  () at /usr/lib/libtalloc.so.2
#3  0x00007ffff534a0f7 in  () at /usr/lib/libtalloc.so.2
#4  0x00007ffff54615ab in notmuch_thread_destroy () at /usr/lib/libnotmuch.so.5
#5  0x00007ffff6183a8d in  () at /usr/lib/libffi.so.7
#6  0x00007ffff618301b in  () at /usr/lib/libffi.so.7
[Python interpreter calls…]
#184 0x00007ffff7cd3b38 in PyRun_SimpleFileExFlags (fp=0x5555555b7240, filename=<optimized out>, closeit=1, flags=0x7fffffffe498) at Python/pythonrun.c:428
#185 0x00007ffff7e3e7a8 in pymain_run_file (cf=0x7fffffffe498, config=0x55555557de50) at Modules/main.c:381
#186 pymain_run_python (exitcode=0x7fffffffe490) at Modules/main.c:565
#187 Py_RunMain () at Modules/main.c:644
#188 0x00007ffff7e3e899 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:698
#189 0x00007ffff7876023 in __libc_start_main () at /usr/lib/libc.so.6
#190 0x000055555555507e in _start

notmuch_thread_destroy(thread) simply calls talloc_free(thread), which raises SIGABRT (corrupted memory? double free? race?).

With @elenril's post, I think there's now enough information to close this issue and handle the crash on the notmuch side.

elenril commented 4 years ago

Quoting Frank LENORMAND (2020-05-09 12:45:09)

I ran gdb to try to get a more precise stacktrace, this is what I got:

#0  0x00007ffff788ace5 in raise () at /usr/lib/libc.so.6
#1  0x00007ffff7874857 in abort () at /usr/lib/libc.so.6
#2  0x00007ffff534a0ff in  () at /usr/lib/libtalloc.so.2
#3  0x00007ffff534a0f7 in  () at /usr/lib/libtalloc.so.2
#4  0x00007ffff54615ab in notmuch_thread_destroy () at /usr/lib/libnotmuch.so.5
#5  0x00007ffff6183a8d in  () at /usr/lib/libffi.so.7
#6  0x00007ffff618301b in  () at /usr/lib/libffi.so.7
[Python interpreter calls…]
#184 0x00007ffff7cd3b38 in PyRun_SimpleFileExFlags (fp=0x5555555b7240, filename=<optimized out>, closeit=1, flags=0x7fffffffe498) at Python/pythonrun.c:428
#185 0x00007ffff7e3e7a8 in pymain_run_file (cf=0x7fffffffe498, config=0x55555557de50) at Modules/main.c:381
#186 pymain_run_python (exitcode=0x7fffffffe490) at Modules/main.c:565
#187 Py_RunMain () at Modules/main.c:644
#188 0x00007ffff7e3e899 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:698
#189 0x00007ffff7876023 in __libc_start_main () at /usr/lib/libc.so.6
#190 0x000055555555507e in _start

notmuch_thread_destroy(thread) simply calls talloc_free(thread), which raises SIGABRT (corrupted memory? double free? race?).

With @elenril's post, I think there's now enough information to close this issue and handle the crash on the notmuch side.

That seems like a different problem though. The one I am seeing is triggered in xapian, whereas in your backtrace the abort() comes from libtalloc, which is used by libnotmuch itself. That actually could be caused by the bindings managing memory incorrectly.

-- Anton Khirnov

lenormf commented 4 years ago

Correct, but in any case this is a notmuch issue, and doesn't need to be handled here.

elenril commented 4 years ago

Quoting Frank LENORMAND (2020-05-09 13:14:41)

Correct, but in any case this is a notmuch issue, and doesn't need to be handled here.

Not necessarily. The current Python bindings are quite finicky about being used in exactly the right manner; it could be that alot is not handling some detail correctly. I think that was one of the motivations for replacing them with notmuch2.

-- Anton Khirnov

lenormf commented 4 years ago

Fair enough.

lenormf commented 4 years ago

@elenril it doesn't look like the Xapian::DatabaseModifiedError abort was reported upstream; have you notified them about it?

elenril commented 4 years ago

Quoting Frank LENORMAND (2020-05-18 12:05:13)

@elenril it doesn't look like the Xapian::DatabaseModifiedError abort was reported upstream; have you notified them about it?

Yes, I have discussed it with developers on IRC and intend to send a patch fixing it.

-- Anton Khirnov

lenormf commented 4 years ago

Great to hear, thanks!

stlawrance commented 4 years ago

I've been running into the same problem, consistent crashes after a tag command causes a thread to be removed from a search buffer, requiring more threads to be loaded. (It doesn't happen if the complete search results are shown, so one workaround is to keep your inbox under control!)

Quoting @elenril:

That seems like a different problem though. The one I am seeing is triggered in xapian, whereas in your backtrace the abort() comes from libtalloc, which is used by libnotmuch itself. That actually could be caused by the bindings managing memory incorrectly.

For me it's also usually an abort() from libtalloc.

Quoting @lenormf:

notmuch_thread_destroy(thread) simply calls talloc_free(thread), which raises SIGABRT (corrupted memory? double free? race?).

It actually seems to be a use-after-free. If you set e.g. TALLOC_FREE_FILL=5, you'll get an out-of-bounds read a little earlier:

#0  strlen (str=0x505050505050505 <Address 0x505050505050505 out of bounds>)
    at /usr/src/lib/libc/string/strlen.c:100
#1  0x000000080871e9d8 in z_get (ptr=0x7ffffffb53a0, size=8) at cfield.c:1346
#2  0x0000000808715078 in GetResult (restype=0x80691b440, result=0x7ffffffb53a0, checker=0x0)
    at callproc.c:921
#3  0x0000000808714758 in _ctypes_callproc (pProc=0x809bcd3a0 <notmuch_thread_get_thread_id>, 
    argtuple=0x80da90680, flags=4353, argtypes=0x807e3db50, restype=0x80691b440, checker=0x0)
    at callproc.c:1223

This comes from the yield t.get_thread_id() in alot's DBManager.get_threads. I'm not sure where that thread object (or its parent threads or query) actually got freed -- it's not coming from the _destroy finalizer calls in the notmuch Python bindings. I suspect it happens after the db.close() in DBManager.flush. And Database.close does carry the warning:

This function closes the notmuch database. From that point on every method invoked on any object ever derived from the closed database may cease to function and raise a NotmuchError.
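
A minimal sketch of that hazard, using the legacy notmuch Python bindings alot relied on at the time (the maildir path and query string are placeholders; this is not alot's actual code). Driving the lazy generator further after the database has been written to and closed is exactly the situation the docstring warns about, and in practice it can abort inside libnotmuch/talloc instead of raising NotmuchError:

from notmuch import Database, Query

MAILDIR = "/path/to/maildir"   # placeholder

def thread_ids(querystring):
    # Same shape as DBManager.get_threads: a generator that keeps the
    # read-only query alive between yields.
    db = Database(path=MAILDIR, mode=Database.MODE.READ_ONLY)
    for t in Query(db, querystring).search_threads():
        yield t.get_thread_id()

gen = thread_ids("tag:inbox")
first = next(gen)              # fine: nothing has touched the database yet

# ...a flush now opens the database read-write, applies a tag and closes it...
writer = Database(path=MAILDIR, mode=Database.MODE.READ_WRITE)
for msg in Query(writer, "tag:inbox").search_messages():
    msg.add_tag("killed")
    break
writer.close()

next(gen)   # the generator's query is now stale; this may raise NotmuchError
            # or, as observed in this issue, abort inside libnotmuch/talloc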

Quoting @elenril:

Yes, I have discussed it with developers on IRC and intend to send a patch fixing it.

Well, even if notmuch is fixed so as not to abort, this still needs to be addressed on the alot side, whether by waiting for an error to come back from notmuch or by proactively cancelling any open queries after writing to the db. In any case it breaks the search buffer's lazy-loading mechanism...

jonassmedegaard commented 4 years ago

Quoting Steven Lawrance (2020-06-18 20:30:47)

I've been running into the same problem, consistent crashes after a tag command causes a thread to be removed from a search buffer, requiring more threads to be loaded. (It doesn't happen if the complete search results are shown, so one workaround is to keep your inbox under control!)

In my experience it also does not happen on the last page: so if you have a list longer than one screenful, an option is to work your way up from the bottom.

meskio commented 4 years ago

I'm also hitting this problem very frequently, running alot from master. @elenril did you manage to write a patch for notmuch? Do you need any help testing it or something?

stlawrance commented 3 years ago

Are people still hitting this? I just installed alot from FreeBSD ports and ran into it again, and remembered I had a small patch as a workaround:

diff --git a/alot/db/manager.py b/alot/db/manager.py
index d95ea332..713d8d73 100644
--- a/alot/db/manager.py
+++ b/alot/db/manager.py
@@ -277,8 +277,9 @@ class DBManager:
         if exclude_tags:
             for tag in exclude_tags:
                 q.exclude_tag(tag)
-        for t in q.search_threads():
-            yield t.get_thread_id()
+        thread_ids = [t.get_thread_id() for t in q.search_threads()]
+        for t in thread_ids:
+             yield t

     def query(self, querystring):
         """

This just pre-loads all the thread IDs from a search query, so the search itself can be closed. It does slow things down noticeably when there are a lot (10k+) of results but it's at least usable for me.

jonassmedegaard commented 3 years ago

Yes, I still suffer from this issue. Will try your patch - thanks!

jonassmedegaard commented 3 years ago

your patch works well for me, @stlawrance - Thanks!

pazz commented 3 years ago

This is magic, I don't see why this should work but hey :) Can you send a PR? Thanks! P

jonassmedegaard commented 3 years ago

I no longer experience this issue with release v0.10 (as packaged for Debian).

jonassmedegaard commented 3 years ago

No, I was mistaken (I forgot that the failure occurred when changing a listing - e.g. deleting an entry - 3 times, not simply scrolling 3 pages): the issue still exists for me with release v0.10 :-(

jonassmedegaard commented 3 years ago

This is an updated patch which (while slower) avoids crashing when deleting an item 3 times in a list longer than one screen:

--- manager.py.orig     2021-09-26 19:40:37.000000000 +0200
+++ manager.py  2021-09-28 10:30:43.505410082 +0200
@@ -335,10 +335,11 @@
         """
         assert sort in self._sort_orders
         db = Database(path=self.path, mode=Database.MODE.READ_ONLY)
-        for t in db.threads(querystring,
+        thread_ids = [t.threadid for t in db.threads(querystring,
                             sort=self._sort_orders[sort],
-                            exclude_tags=self.exclude_tags):
-            yield t.threadid
+                            exclude_tags=self.exclude_tags)]
+        for t in thread_ids:
+             yield t

     def add_message(self, path, tags=None, afterwards=None):
         """
pazz commented 3 years ago

Thank you @jonassmedegaard! I forgot this thread and actually wanted to include the fix in the new release. Anyhow, mind sending this as a PR? That way you also get some credit :)

jonassmedegaard commented 3 years ago

Sorry, but I find GitHub unethical and only ever use their issue trackers.

If it is more helpful for you that I provide a git fork with the patch applied, then I'd be happy to do so - but elsewhere, not on GitHub.

If you will only credit contributions provided as a GitHub PR, then that's your choice - otherwise I am Jonas Smedegaard dr@jones.dk; that and more is also provided at https://dr.jones.dk/info.

pazz commented 3 years ago

OK no worries. I "credit" people by posting the output of git shortlog in the announcement mails, so there'd have to be a commit by you. But it's no big issue if you prefer not to post commits here. Thanks anyway, P

Quoting Jonas Smedegaard (2021-09-28 11:03:35)

sorry, but I find Github unethical and only ever use their issue trackers.

If more helpful for you that I provide a git fork with the patch applied then I'd be happy to do so - but elsewhere, not on Github.

If you will only credit contributions provided as a Github PR then that's your choice - otherwise I am Jonas Smedegaard @.*** - that and more also provided at https://dr.jones.dk/info.


stlawrance commented 3 years ago

Sorry I didn't get around to adapting/testing the patch for current master; I'm glad it got there in the end though! Thanks.

jonassmedegaard commented 2 years ago

@pazz Thanks for applying the patch (and even mentioning me in the commit message - often that's not the case when I don't serve commits on a silver platter).

It seems, however, that the patch does not work as intended (I am really sorry - I did test it and it did improve things, but later it did not - weird!).

It seems that the later commit a3b601813 stops my alot from crashing, however.

So, long story short: please simply revert commit caf87d9 and (unless others disagree) this issue can be closed, with credit going to Kieran Bingham.

kbingham commented 2 years ago

I'm not sure I can see how my commit affects this specific issue, except that there may also have been an issue you would hit if you use "thread_subject = oldest" in your config, which would otherwise have prevented you from moving to the latest master.

That now uses the new notmuch2 bindings, which I suspect might have a lot more changes that could be relevant.

jonassmedegaard commented 2 years ago

Please disregard my previous post: I now again experienced the crash after tagging 3 times in a list more than a screenful in size, and re-applying patch caf87d9 made alot work again.

Perhaps the scenario that triggers the crash is not simply "a list of more than a screenful" but involves other factors as well.

elenril commented 2 years ago

As I said above in this thread, the problem is in libnotmuch; it cannot be fixed in alot. I intend to send a patch there to handle this.

Also, in my experience it happens a lot less frequently if you upgrade your notmuch database to glass.

pazz commented 2 years ago

Quoting elenril (2021-11-22 07:22:48)

Also, in my experience it happens a lot less frequently if you upgrade your notmuch database to glass.

I am not aware of this. How do I update notmuch's db? Just by running notmuch compact?

elenril commented 2 years ago

Quoting Patrick Totzke (2021-11-23 10:33:04)

I am not aware of this. How do I update notmuchs db? just by running notmuch compact?

See https://xapian.org/docs/admin_notes.html#converting-a-chert-database-to-a-glass-database

But I hear dump-recreate-restore may be faster.

-- Anton Khirnov

pazz commented 2 years ago

Quoting elenril (2021-11-23 09:42:51)

Quoting Patrick Totzke (2021-11-23 10:33:04)

I am not aware of this. How do I update notmuchs db? just by running notmuch compact?

See https://xapian.org/docs/admin_notes.html#converting-a-chert-database-to-a-glass-database

Thanks. Debian's xapian-examples actually only comes with the source, and I cannot be bothered to find out how to compile it.

But I hear dump-recreate-restore may be faster.

That is notmuch dump?

elenril commented 2 years ago

Quoting Patrick Totzke (2021-11-23 11:13:30)

Quoting elenril (2021-11-23 09:42:51)

Quoting Patrick Totzke (2021-11-23 10:33:04)

I am not aware of this. How do I update notmuchs db? just by running notmuch compact?

See https://xapian.org/docs/admin_notes.html#converting-a-chert-database-to-a-glass-database

Thanks. Debians xapian-examples actually only comes with the source and I cannot be bothered to find out how to compile this.

the binary is shipped in xapian-tools

But I hear dump-recreate-restore may be faster.

That is notmuch dump?

Yes, like https://notmuchmail.org/howto/#index6h2

-- Anton Khirnov

pazz commented 2 years ago

coolcool, many thanks! P

lucc commented 1 month ago

Does anybody still hit this bug, or are there any updates? We have three different indicators/ideas for how to fix this collected in this thread:

  1. update your database to glass (https://github.com/pazz/alot/issues/1460#issuecomment-975199240)
  2. switch the python bindings from notmuch to notmuch2 which alot did in #1547 (and #1673)
  3. preload threads into a list instead of using the generator, proposed by @jonassmedegaard and merged into alot in caf87d9

In #1647 it was suggested to revert caf87d9 as it slows down the buffer creation for search queries with many results. So the question is also: can anyone still reproduce this without caf87d9?

@elenril you proposed to send a patch upstream, how did that go?

elenril commented 1 month ago

Quoting Lucas Hoffmann (2024-09-21 08:42:58)

Does anybody still hit this bug or now any updates? We have three different indicators/ideas how to fix this collected in this thread:

  1. update your database to glass (https://github.com/pazz/alot/issues/1460#issuecomment-975199240)
  2. switch the python bindings from notmuch to notmuch2 which alot did in #1547 (and #1673)

These can only reduce the rate at which this happens, not eliminate it completely.

  3. preload threads into a list instead of using the generator, proposed by @jonassmedegaard and merged into alot in caf87d9

This would fix it, but I'd expect a high performance and responsiveness cost.

@elenril you proposed to send a patch upstream, how did that go?

It's been several years since my attempt, so I don't remember the details anymore, but it turned out to be harder than I expected. I do intend to return to this issue eventually, but I can't say when.

-- Anton Khirnov

lfos commented 3 days ago

@elenril - Out of curiosity, do you have anything you can share that helps us understand what the issue is, and what'd be required to fix it (e.g., a mailing list thread)?