mawww / kakoune

mawww's experiment for a better code editor
http://kakoune.org
The Unlicense

[BUG][CRASH] #5006

Open vbauerster opened 1 year ago

vbauerster commented 1 year ago

Version of Kakoune

Kakoune v2023.08.05-53-gd1c8622d

Reproducer

I use tmux with Kakoune. The crash happens when I invoke the new (new client) command. It looks like Kakoune waits for some command to finish and then crashes once all RAM and swap are exhausted. This does not happen with the release Kakoune v2023.08.05. Below is a screenshot of htop, filtered by kak, taken when that happens.

[Screenshot from 2023-10-25 10-40-06]

Outcome

❯ coredumpctl info
           PID: 1111907 (kak)
           UID: 1000 (vbauer)
           GID: 1000 (vbauer)
        Signal: 6 (ABRT)
     Timestamp: Wed 2023-10-25 10:16:04 +05 (21s ago)
  Command Line: kak -c 1110812
    Executable: /usr/local/bin/kak
 Control Group: /user.slice/user-1000.slice/user@1000.service/app.slice/foot-server.service
          Unit: user@1000.service
     User Unit: foot-server.service
         Slice: user-1000.slice
     Owner UID: 1000 (vbauer)
       Boot ID: f9ee5b9ddfd14cc5a5ae2570c0209608
    Machine ID: _
      Hostname: archmbxpro
       Storage: /var/lib/systemd/coredump/core.kak.1000.f9ee5b9ddfd14cc5a5ae2570c0209608.1111907.1698210964000000.zst (present)
  Size on Disk: 68.9K
       Message: Process 1111907 (kak) of user 1000 dumped core.

                Stack trace of thread 1111907:
                #0  0x00007ff4efbbf83c n/a (libc.so.6 + 0x8e83c)
                #1  0x00007ff4efb6f668 raise (libc.so.6 + 0x3e668)
                #2  0x00007ff4efb574b8 abort (libc.so.6 + 0x264b8)
                #3  0x00005630b9a3464a _ZN7Kakoune14signal_handlerEi (kak + 0xc264a)
                #4  0x00007ff4efb6f710 n/a (libc.so.6 + 0x3e710)
                #5  0x00007ff4efc3e4f0 pselect (libc.so.6 + 0x10d4f0)
                #6  0x00005630b9adb10f _ZN7Kakoune12EventManager18handle_next_eventsENS_9EventModeEP10__sigset_tb (kak + 0x16910f)
                #7  0x00005630b9b76a35 _ZN7Kakoune10run_clientENS_10StringViewES0_S0_NS_8OptionalINS_11BufferCoordEEENS_6UITypeEb (kak + 0x204a35)
                #8  0x00005630b9a4b1a1 main (kak + 0xd91a1)
                #9  0x00007ff4efb58cd0 n/a (libc.so.6 + 0x27cd0)
                #10 0x00007ff4efb58d8a __libc_start_main (libc.so.6 + 0x27d8a)
                #11 0x00005630b9a4b965 _start (kak + 0xd9965)
                ELF object binary architecture: AMD x86-64

Expectations

No response

Additional information

No response

krobelus commented 1 year ago

looks like you spawn an awk process that eats up all memory. I assume no other process uses a significant amount of memory. How do you start that awk process? I can't find "Convert a position a document to a position in our list of flags" anywhere. Can you give a small reproducer?

vbauerster commented 1 year ago

I'm not sure how to reproduce it exactly. Maybe it is specific to my config, but I disabled major plugins like kak-lsp before testing. Also, this does not happen with the release using the same config.

krobelus commented 1 year ago

To fix this we need to find out who is using all the 22GB of memory. The awk command and its stdin might be enough.

So you're saying the OOM does not happen on the release? Or only the coredump?

vbauerster commented 1 year ago

So you're saying the OOM does not happen on the release? Or only the coredump?

Exactly, with the release version of Kakoune I get neither.

vbauerster commented 1 year ago

OK, I found the culprit plugin.

krobelus commented 1 year ago

does it reproduce consistently? If yes, it shouldn't take long to run a git bisect. There are only 28 commits in

git log v2023.08.05..origin/master src

On Linux, you could limit the amount of memory using cgroups to prevent a general system freeze. Also disable swap while reproducing this.
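
A rough sketch of both suggestions, assuming a systemd-based Linux with cgroup v2 and a Kakoune checkout whose binary is built at src/kak. The function names and the 2G limit are illustrative assumptions, not something taken from this thread:

```shell
# Hypothetical helpers; names and the 2G cap are assumptions for illustration.

# Bisect only the commits that touch src/ between the release (good)
# and origin/master (bad).
bisect_src() {
    git bisect start origin/master v2023.08.05 -- src
}

# Run a command inside a transient cgroup scope with capped memory and no
# swap, so a runaway child process is killed instead of freezing the system.
run_limited() {
    systemd-run --user --scope -p MemoryMax=2G -p MemorySwapMax=0 "$@"
}
```

After bisect_src, rebuild at each step, start the session with run_limited src/kak, and mark the commit with git bisect good or git bisect bad. The limit only covers processes started inside the scope, so the Kakoune session that spawns the awk process has to be launched through it.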

vbauerster commented 1 year ago

The OOM and crash do not happen with the scrollbar.kak plugin disabled. But surprisingly, the plugin works in the release version.

krobelus commented 1 year ago

ok the easiest approach might be to add print statements to every loop in the awk process. It's only 4 loops.

I suspect that in

for (i=flags_start; i<=flags_end; i++) {
    flags_by_line[i] = 1
}

flags_start > flags_end so we loop for billions of times. This could be the case if %val{window_range} is reported differently for new clients. Maybe 978775661 (Use last display setup instead of recomputing for window_range, 2023-09-08).

vbauerster commented 1 year ago

I invoke scrollbar-enable at WinCreate hook, like:

hook global WinCreate .* %{
    scrollbar-enable
}

With this hook disabled, invoking new; scrollbar-enable at the prompt works as expected, i.e. it causes no OOM or crash.

vbauerster commented 1 year ago

ok the easiest approach might be to add print statements to every loop in the awk process. It's only 4 loops.

I suspect that in

for (i=flags_start; i<=flags_end; i++) {
    flags_by_line[i] = 1
}

flags_start > flags_end so we loop for billions of times. This could be the case if %val{window_range} is reported differently for new clients. Maybe 9787756 (Use last display setup instead of recomputing for window_range, 2023-09-08).

I managed to print the flags_start and flags_end values right before the loop, and flags_end ends up having enormously large values:

flags_start: 0
flags_end: 360284
flags_start: 0
flags_end: 348847
flags_start: 0
flags_end: 338113
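
For reference, a minimal sketch of how such debug output and a defensive bound could look. The variable names mirror the loop quoted above; the fill_flags wrapper and the line_count bound are assumptions for illustration and are not part of scrollbar.kak:

```shell
# Hypothetical wrapper: $1 = flags_start, $2 = flags_end, $3 = buffer line count.
fill_flags() {
    awk -v flags_start="$1" -v flags_end="$2" -v line_count="$3" 'BEGIN {
        # Debug output goes to stderr so it does not mix with the result.
        printf "flags_start: %d\nflags_end: %d\n", flags_start, flags_end > "/dev/stderr"

        # Assumed safeguard: never mark more lines than the buffer has, so a
        # bogus window range cannot make the array grow without bound.
        if (flags_end > line_count) flags_end = line_count

        for (i = flags_start; i <= flags_end; i++) {
            flags_by_line[i] = 1
        }

        # Report how many entries were actually marked.
        n = 0
        for (k in flags_by_line) n++
        print n
    }'
}

fill_flags 0 360284 50
```

The clamp only hides the symptom; it does not explain why flags_end is so large for newly created clients.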

krobelus commented 1 year ago

I can reproduce with

timeout -sKILL 5 src/kak -e 'source ../scrollbar.kak/scrollbar.kak; hook g WinCreate .* scrollbar-enable; new; new; new; new'; reset

the clients that don't manage to show the scratch buffer triggered the bug. Bisects to 978775661 (Use last display setup instead of recomputing for window_range, 2023-09-08). Haven't root-caused it yet. window_range appears to be zero even in the affected clients. Maybe it's some kind of UB causing kakoune to run into a loop or something

Screwtapello commented 1 year ago

Yeah, I reported this as https://github.com/mawww/kakoune/issues/4975 and then tried to update the documentation in https://github.com/mawww/kakoune/pull/4994. Unfortunately it seems I still don't quite understand how things should work, so I haven't been able to fix the docs properly or fix the scrollbar plugin.

krobelus commented 1 year ago

right I forgot to run with sanitizers