kovidgoyal / kitty

Cross-platform, fast, feature-rich, GPU based terminal
https://sw.kovidgoyal.net/kitty/
GNU General Public License v3.0

Process (kitty) dumped core (SIGSEGV) with custom session on Wayland with binary NVIDIA driver #7838

Closed Strykar closed 2 months ago

Strykar commented 2 months ago

Describe the bug

Kitty crashes intermittently with a segmentation fault (SIGSEGV) when started with a specific session configuration on Wayland using the binary NVIDIA driver. It runs for hours and then crashes, often when I am alt-tabbing to or away from kitty.

To Reproduce

Steps to reproduce the behavior:

  1. None - Unable to reproduce at will

Stack trace and GDB output from session:

$ coredumpctl gdb 6099
           PID: 6099 (kitty)
           UID: 1000 (strykar)
           GID: 1000 (strykar)
        Signal: 11 (SEGV)
     Timestamp: Mon 2024-09-09 08:38:43 IST (1h 39min ago)
  Command Line: /usr/bin/kitty --start-as fullscreen --instance-group=1 --config=/home/strykar/.config/kitty/kitty.conf --session=/home/strykar/.config/kitty/weechat.session --directory=/home/strykar
    Executable: /usr/bin/kitty
 Control Group: /user.slice/user-1000.slice/user@1000.service/app.slice/app-gnome-weechat-6099.scope
          Unit: user@1000.service
     User Unit: app-gnome-weechat-6099.scope
         Slice: user-1000.slice
     Owner UID: 1000 (strykar)
       Boot ID: 368b8e2a802b4bbe8f12777c98325a11
    Machine ID: e38784efc0054586884081173d63edb2
      Hostname: r912
       Storage: /var/lib/systemd/coredump/core.kitty.1000.368b8e2a802b4bbe8f12777c98325a11.6099.1725851323000000.zst (present)
  Size on Disk: 7.9M
       Message: Process 6099 (kitty) of user 1000 dumped core.

                Stack trace of thread 6099:
                #0  0x00007c806d7b4423 n/a (libnvidia-egl-wayland.so.1 + 0x5423)
                #1  0x00007c806d7b9ba4 n/a (libnvidia-egl-wayland.so.1 + 0xaba4)
                #2  0x00007c806d2ab62e n/a (libEGL_nvidia.so.0 + 0xab62e)
                #3  0x00007c806d24e640 n/a (libEGL_nvidia.so.0 + 0x4e640)
                #4  0x00007c806e32896f glfwSwapBuffers (glfw-wayland.so + 0xe96f)
                #5  0x00007c806f21387d n/a (fast_data_types.so + 0x1387d)
                #6  0x00007c806e324553 glfwRunMainLoop (glfw-wayland.so + 0xa553)
                #7  0x00007c806f20b9e4 n/a (fast_data_types.so + 0xb9e4)
                #8  0x00007c80703bb86b n/a (libpython3.12.so.1.0 + 0x1bb86b)
                #9  0x00007c80703a500d PyObject_Vectorcall (libpython3.12.so.1.0 + 0x1a500d)
                #10 0x00007c8070389d71 _PyEval_EvalFrameDefault (libpython3.12.so.1.0 + 0x189d71)
                #11 0x00007c8070383f86 _PyObject_FastCallDictTstate (libpython3.12.so.1.0 + 0x183f86)
                #12 0x00007c80703c11b2 _PyObject_Call_Prepend (libpython3.12.so.1.0 + 0x1c11b2)
                #13 0x00007c8070498776 n/a (libpython3.12.so.1.0 + 0x298776)
                #14 0x00007c80703811ab _PyObject_MakeTpCall (libpython3.12.so.1.0 + 0x1811ab)
                #15 0x00007c8070389d71 _PyEval_EvalFrameDefault (libpython3.12.so.1.0 + 0x189d71)
                #16 0x00007c807044e395 PyEval_EvalCode (libpython3.12.so.1.0 + 0x24e395)
                #17 0x00007c807046a836 n/a (libpython3.12.so.1.0 + 0x26a836)
                #18 0x00007c80703a515e n/a (libpython3.12.so.1.0 + 0x1a515e)
                #19 0x00007c80703a500d PyObject_Vectorcall (libpython3.12.so.1.0 + 0x1a500d)
                #20 0x00007c8070389d71 _PyEval_EvalFrameDefault (libpython3.12.so.1.0 + 0x189d71)
                #21 0x00007c807047f7ef n/a (libpython3.12.so.1.0 + 0x27f7ef)
                #22 0x00007c80702c4762 n/a (libpython3.12.so.1.0 + 0xc4762)
                #23 0x000056807faaa88f main (kitty + 0x288f)
                #24 0x00007c8070034e08 n/a (libc.so.6 + 0x25e08)
                #25 0x00007c8070034ecc __libc_start_main (libc.so.6 + 0x25ecc)
                #26 0x000056807faaad15 _start (kitty + 0x2d15)

                Stack trace of thread 6968:
                #0  0x00007c807011ac5a read (libc.so.6 + 0x10bc5a)
                #1  0x00007c806f21f5d5 n/a (fast_data_types.so + 0x1f5d5)
                #2  0x00007c80700a339d n/a (libc.so.6 + 0x9439d)
                #3  0x00007c807012849c n/a (libc.so.6 + 0x11949c)

                Stack trace of thread 6969:
                #0  0x00007c807011a63d __poll (libc.so.6 + 0x10b63d)
                #1  0x00007c805a8ce9b7 n/a (libpulse.so.0 + 0x339b7)
                #2  0x00007c805a8b845c pa_mainloop_poll (libpulse.so.0 + 0x1d45c)
                #3  0x00007c805a8c261c pa_mainloop_iterate (libpulse.so.0 + 0x2761c)
                #4  0x00007c805a8c26d1 pa_mainloop_run (libpulse.so.0 + 0x276d1)
                #5  0x00007c805a8d2bf2 n/a (libpulse.so.0 + 0x37bf2)
                #6  0x00007c805a8702b7 n/a (libpulsecommon-17.0.so + 0x5c2b7)
                #7  0x00007c80700a339d n/a (libc.so.6 + 0x9439d)
                #8  0x00007c807012849c n/a (libc.so.6 + 0x11949c)

                Stack trace of thread 1238718:
                #0  0x00007c807009fa19 n/a (libc.so.6 + 0x90a19)
                #1  0x00007c80700a2479 pthread_cond_wait (libc.so.6 + 0x93479)
                #2  0x00007c806d2bbf38 n/a (libEGL_nvidia.so.0 + 0xbbf38)
                #3  0x00007c806d28aef1 n/a (libEGL_nvidia.so.0 + 0x8aef1)
                #4  0x00007c806d2c1fce n/a (libEGL_nvidia.so.0 + 0xc1fce)
                #5  0x00007c80700a339d n/a (libc.so.6 + 0x9439d)
                #6  0x00007c807012849c n/a (libc.so.6 + 0x11949c)

                Stack trace of thread 6569:
                #0  0x00007c807011a63d __poll (libc.so.6 + 0x10b63d)
                #1  0x00007c806f20d7be n/a (fast_data_types.so + 0xd7be)
                #2  0x00007c80700a339d n/a (libc.so.6 + 0x9439d)
                #3  0x00007c807012849c n/a (libc.so.6 + 0x11949c)

                Stack trace of thread 6570:
                #0  0x00007c807011a63d __poll (libc.so.6 + 0x10b63d)
                #1  0x00007c806f20cc85 n/a (fast_data_types.so + 0xcc85)
                #2  0x00007c80700a339d n/a (libc.so.6 + 0x9439d)
                #3  0x00007c807012849c n/a (libc.so.6 + 0x11949c)
                ELF object binary architecture: AMD x86-64

GNU gdb (GDB) 15.1
Reading symbols from /usr/bin/kitty...
Downloading separate debug info for /usr/bin/kitty
Reading symbols from /home/strykar/.cache/debuginfod_client/4a81b41e3903b790d88c21367c16127acf32921b/debuginfo...

warning: Can't open file /memfd:/.glXXXXXX (deleted) during file-backed mapping note processing

warning: Can't open file /memfd:pulseaudio (deleted) during file-backed mapping note processing

warning: Can't open file /memfd:glfw-shared (deleted) during file-backed mapping note processing

warning: Can't open file /memfd:mutter-shared (deleted) during file-backed mapping note processing

warning: Can't open file /memfd:wayland-cursor (deleted) during file-backed mapping note processing
[New LWP 6099]
[New LWP 6968]
[New LWP 6969]
[New LWP 1238718]
[New LWP 6569]
[New LWP 6570]
Downloading separate debug info for /usr/lib/libEGL_nvidia.so.0
Downloading separate debug info for /usr/lib/libnvidia-glsi.so.560.35.03
Downloading separate debug info for /usr/lib/libnvidia-eglcore.so.560.35.03
Downloading separate debug info for /usr/lib/libnvidia-gpucomp.so.560.35.03
Downloading separate debug info for /usr/lib/libnvidia-egl-gbm.so.1
Downloading separate debug info for /usr/lib/libgbm.so.1
Downloading separate debug info for /usr/lib/libgallium-24.2.2-arch1.1.so
Downloading separate debug info for /usr/lib/libglapi.so.0
Downloading separate debug info for /usr/lib/libnvidia-egl-xcb.so.1
Downloading separate debug info for /usr/lib/libnvidia-egl-xlib.so.1
Downloading separate debug info for /usr/lib/libEGL_mesa.so.0
Downloading separate debug info for /usr/lib/libnvidia-allocator.so.1
Downloading separate debug info for /usr/lib/libvorbisfile.so.3
Downloading separate debug info for /usr/lib/libvorbis.so.0
Downloading separate debug info for /usr/lib/libvorbisenc.so.2
Downloading separate debug info for system-supplied DSO at 0x7c8070907000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Core was generated by `/usr/bin/kitty --start-as fullscreen --instance-group=1 --config=/home/strykar/'.
Program terminated with signal SIGSEGV, Segmentation fault.
--Type <RET> for more, q to quit, c to continue without paging--c
#0  0x00007c806d7b4423 in send_explicit_sync_points (display=0x56809e560100, surface=0x56809e60c920, image=0x0) at ../egl-wayland/src/wayland-eglsurface.c:205
205     syncFd = data->egl.dupNativeFenceFD(dpy, image->acquireSync);
[Current thread is 1 (Thread 0x7c80708bdb80 (LWP 6099))]

(gdb) bt
#0  0x00007c806d7b4423 in send_explicit_sync_points (display=0x56809e560100, surface=0x56809e60c920, image=0x0) at ../egl-wayland/src/wayland-eglsurface.c:205
#1  wlEglSendDamageEvent (surface=surface@entry=0x56809e60c920, queue=0x56809e470730) at ../egl-wayland/src/wayland-eglsurface.c:279
#2  0x00007c806d7b9ba4 in wlEglSwapBuffersWithDamageHook (eglDisplay=<optimized out>, eglSurface=<optimized out>, rects=<optimized out>, n_rects=<optimized out>) at ../egl-wayland/src/wayland-eglswap.c:150
#3  0x00007c806d2ab62e in ?? () from /usr/lib/libEGL_nvidia.so.0
#4  0x00007c806d24e640 in ?? () from /usr/lib/libEGL_nvidia.so.0
#5  0x00007c806e32896f in glfwSwapBuffers (handle=0x56809e3f6c10) at glfw/context.c:479
#6  0x00007c806f21387d in swap_window_buffers (os_window=<optimized out>) at kitty/glfw.c:1821
#7  swap_window_buffers (os_window=0x56809e1690b0) at kitty/glfw.c:1821
#8  render_prepared_os_window (os_window=0x56809e1690b0, active_window_id=18, active_window_bg=<optimized out>, num_visible_windows=<optimized out>, all_windows_have_same_bg=<optimized out>) at kitty/child-monitor.c:812
#9  render_os_window (ignore_render_frames=false, w=<optimized out>, now=<optimized out>, scan_for_animated_images=<optimized out>) at kitty/child-monitor.c:867
#10 render (now=<optimized out>, input_read=<optimized out>) at kitty/child-monitor.c:891
#11 process_global_state (data=data@entry=0x7c806e4a57f0) at kitty/child-monitor.c:1272
#12 0x00007c806e324553 in _glfwPlatformRunMainLoop (tick_callback=0x7c806f212320 <process_global_state>, data=0x7c806e4a57f0) at glfw/main_loop.h:34
#13 glfwRunMainLoop (callback=0x7c806f212320 <process_global_state>, data=0x7c806e4a57f0) at glfw/init.c:360
#14 0x00007c806f20b9e4 in run_main_loop (cb=0x7c806f212320 <process_global_state>, cb_data=0x7c806e4a57f0) at kitty/glfw.c:2152
#15 main_loop (self=0x7c806e4a57f0, a=<optimized out>) at kitty/child-monitor.c:1297
#16 0x00007c80703bb86b in method_vectorcall_NOARGS (func=0x7c806f99b880, args=0x7c80708985a0, nargsf=<optimized out>, kwnames=0x0) at Objects/descrobject.c:454
#17 0x00007c80703a500d in _PyObject_VectorcallTstate (tstate=0x7c807082b298 <_PyRuntime+459704>, callable=0x7c806f99b880, args=0x7c80708985a0, nargsf=9223372036854775809, kwnames=0x0) at ./Include/internal/pycore_call.h:92
#18 PyObject_Vectorcall (callable=0x7c806f99b880, args=0x7c80708985a0, nargsf=9223372036854775809, kwnames=0x0) at Objects/call.c:325
#19 0x00007c8070389d71 in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at Python/bytecodes.c:2714
#20 0x00007c8070383f86 in _PyEval_EvalFrame (tstate=0x7c807082b298 <_PyRuntime+459704>, frame=0x7c8070898438, throwflag=0) at ./Include/internal/pycore_ceval.h:89
#21 _PyEval_Vector (tstate=0x7c807082b298 <_PyRuntime+459704>, func=0x7c806e580360, locals=0x0, args=0x7ffcb690e320, argcount=<optimized out>, kwnames=0x0) at Python/ceval.c:1683
#22 _PyFunction_Vectorcall (func=0x7c806e580360, stack=0x7ffcb690e320, nargsf=<optimized out>, kwnames=0x0) at Objects/call.c:419
#23 _PyObject_FastCallDictTstate (tstate=<optimized out>, callable=0x7c806e580360, args=0x7ffcb690e320, nargsf=<optimized out>, kwargs=<optimized out>) at Objects/call.c:133
#24 0x00007c80703c11b2 in _PyObject_Call_Prepend (tstate=0x7c807082b298 <_PyRuntime+459704>, callable=0x7c806e580360, obj=<optimized out>, args=0x7c806e3a28e0, kwargs=0x0) at Objects/call.c:508
#25 0x00007c8070498776 in slot_tp_call (self=0x7c806e8b6090, args=0x7c806e3a28e0, kwds=0x0) at Objects/typeobject.c:8779
#26 0x00007c80703811ab in _PyObject_MakeTpCall (tstate=0x7c807082b298 <_PyRuntime+459704>, callable=0x7c806e8b6090, args=<optimized out>, nargs=<optimized out>, keywords=<optimized out>) at Objects/call.c:240
#27 0x00007c8070389d71 in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at Python/bytecodes.c:2714
#28 0x00007c807044e395 in PyEval_EvalCode (co=0x7c806fb44f30, globals=<optimized out>, locals=0x7c806fbf60c0) at Python/ceval.c:578
#29 0x00007c807046a836 in builtin_exec_impl (module=<optimized out>, source=0x7c806fb44f30, globals=0x7c806fbf60c0, locals=0x7c806fbf60c0, closure=<optimized out>) at Python/bltinmodule.c:1096
#30 builtin_exec (module=<optimized out>, args=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>) at Python/clinic/bltinmodule.c.h:586
#31 0x00007c80703a515e in cfunction_vectorcall_FASTCALL_KEYWORDS (func=<optimized out>, args=0x7c8070898180, nargsf=<optimized out>, kwnames=0x0) at Objects/methodobject.c:438
#32 0x00007c80703a500d in _PyObject_VectorcallTstate (tstate=0x7c807082b298 <_PyRuntime+459704>, callable=0x7c806fb9c770, args=0x7c8070898180, nargsf=9223372036854775810, kwnames=0x0) at ./Include/internal/pycore_call.h:92
#33 PyObject_Vectorcall (callable=0x7c806fb9c770, args=0x7c8070898180, nargsf=9223372036854775810, kwnames=0x0) at Objects/call.c:325
#34 0x00007c8070389d71 in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at Python/bytecodes.c:2714
#35 0x00007c807047f7ef in pymain_run_module (modname=modname@entry=0x7c80705270f8 L"__main__", set_argv0=set_argv0@entry=0) at Modules/main.c:300
#36 0x00007c80702c4762 in pymain_run_python (exitcode=0x7ffcb690e9f4) at Modules/main.c:630
#37 Py_RunMain () at Modules/main.c:713
#38 0x000056807faaa88f in run_embedded (run_data=<synthetic pointer>) at kitty/launcher/main.c:216
#39 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at kitty/launcher/main.c:464

(gdb) info registers
rax            0x56809e576470      95110412330096
rbx            0x56809e60c9f8      95110412945912
rcx            0x7c806d2b7a70      136891029224048
rdx            0x56809e3f9270      95110410769008
rsi            0x0                 0
rdi            0x56809e5620b0      95110412247216
rbp            0x7ffcb690dca0      0x7ffcb690dca0
rsp            0x7ffcb690db90      0x7ffcb690db90
r8             0x7ffcb690d9e8      140723371432424
r9             0x7ffcb690da48      140723371432520
r10            0x0                 0
r11            0x56809e418570      95110410896752
r12            0x56809e560100      95110412239104
r13            0x0                 0
r14            0x56809e26a070      95110409134192
r15            0x56809e60c920      95110412945696
rip            0x7c806d7b4423      0x7c806d7b4423 <wlEglSendDamageEvent+1779>
eflags         0x10202             [ IF RF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
k0             0x2                 2
k1             0xffffffff          4294967295
k2             0xefbffbff          4022336511
k3             0x0                 0
k4             0xffff7fff          4294934527
k5             0x0                 0
k6             0x0                 0
k7             0x0                 0
fs_base        0x7c80708bdb80      136891085872000
gs_base        0x0                 0

(gdb) info locals
data = 0x56809e3f9270
dpy = 0x56809e5613f0
err = <optimized out>
syncFd = <optimized out>
acquireSyncPoint = <optimized out>
data = <optimized out>
dpy = <optimized out>
syncFd = <optimized out>
err = <optimized out>
acquireSyncPoint = <optimized out>

(gdb)

Environment details

kitty 0.36.2 created by Kovid Goyal
Linux r912 6.10.8-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 04 Sep 2024 15:16:37 +0000 x86_64
Arch Linux 6.10.8-arch1-1 (/dev/tty)

DISTRIB_ID="Arch"
DISTRIB_RELEASE="rolling"
DISTRIB_DESCRIPTION="Arch Linux"
Running under: Wayland (GNOME Shell 46.4) missing: layer_shell
OpenGL: '3.1.0 NVIDIA 560.35.03' Detected version: 3.1
Frozen: False
Fonts:
  medium: CascadiaMonoRoman-Regular: /usr/share/fonts/TTF/CascadiaMono.ttf:262144
          Features: ()
    bold: CascadiaMonoRoman-SemiBold: /usr/share/fonts/TTF/CascadiaMono.ttf:327680
          Features: ()
  italic: CascadiaMono-Italic: /usr/share/fonts/TTF/CascadiaMonoItalic.ttf:262144
          Features: ()
      bi: CascadiaMono-SemiBoldItalic: /usr/share/fonts/TTF/CascadiaMonoItalic.ttf:327680
          Features: ()
Paths:
  kitty: /usr/bin/kitty
  base dir: /usr/lib/kitty
  extensions dir: /usr/lib/kitty/kitty
  system shell: /usr/bin/bash
Loaded config files:
  /home/strykar/.config/kitty/kitty.conf

Config options different from defaults:
allow_remote_control    socket-only
background_opacity      0.5
confirm_os_window_close 0
copy_on_select          clipboard
disable_ligatures       1
env:
{'PATH': '/usr/local/bin:/usr/bin:/usr/local/sbin:/opt/cuda/bin:/opt/cuda/nsight_compute:/opt/cuda/nsight_systems/bin:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/usr/local/conky/bin:/home/strykar/.bin:/home/strykar>
focus_follows_mouse     True
font_family             CascadiaMono
font_features:
{'+calt': ('ss01', 'ss02', 'ss03', 'ss19', 'ss20')}
font_size               12.5
linux_display_server    wayland
listen_on               unix:@mykitty
scrollback_lines        10000
shell                   /bin/bash --login
tab_title_template      {index}: {fmt.fg.red}{bell_symbol}{activity_symbol}{fmt.fg.tab}{title}
wheel_scroll_multiplier 8.0
window_alert_on_bell    False
Added shortcuts:
        ctrl+down →  neighboring_window down
        ctrl+f1 →  kitten hints --customize-processing weechat_hints.py
        ctrl+f5 →  load_config_file
        ctrl+left →  neighboring_window left
        ctrl+right →  neighboring_window right
        ctrl+up →  neighboring_window up
        kitty_mod+f9 →  clear_terminal reset active
        shift+down →  move_window down
        shift+left →  move_window left
        shift+right →  move_window right
        shift+up →  move_window up
Changed shortcuts:
        kitty_mod+f10 →  clear_terminal clear active
        kitty_mod+f11 →  clear_terminal scrollback active
        kitty_mod+o →  pass_selection_to_program ~/.config/kitty/selectiontobrowser.sh

Important environment variables seen by the kitty process:
        PATH                                /usr/local/bin:/usr/bin:/usr/local/sbin:/opt/cuda/bin:/opt/cuda/nsight_compute:/opt/cuda/nsight_systems/bin:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/usr/local/conky/bi>
        LANG                                en_US.UTF-8
        SHELL                               /usr/bin/bash
        DISPLAY                             :0
        WAYLAND_DISPLAY                     wayland-0
        USER                                strykar
        XDG_MENU_PREFIX                     gnome-
        XDG_SESSION_DESKTOP                 gnome
        XDG_SESSION_TYPE                    wayland
        XDG_CURRENT_DESKTOP                 GNOME
        XDG_SESSION_CLASS                   user
        XDG_RUNTIME_DIR                     /run/user/1000
        XDG_DATA_DIRS                       /home/strykar/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share:/usr/local/share/:/usr/share/

Additional context

I cannot reproduce the problem at will or with kitty --config NONE, which sucks, I know. My kitty config and session file: https://gist.github.com/Strykar/2c5f511c842ffe77983e81f94a1ad61f

Of the many things I tried, the only lead I have so far is that the crash only seems to happen when multiple tabs and windows within tabs are open. Are backtraces enough? I am open to ideas on how to nail this down.

kovidgoyal commented 2 months ago

It's almost certainly a bug in the NVIDIA drivers related to explicit sync. The crash is triggered by eglSwapBuffers() and is happening in the NVIDIA EGL libraries (libEGL_nvidia and libnvidia-egl-wayland in the backtrace). Downgrade your NVIDIA drivers and you will be fine. Report the issue to the NVIDIA driver maintainers.
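
The diagnosis above rests on which module owns the innermost frames of the crashing thread. As a rough illustration, here is a minimal Python sketch that pulls the module name out of each frame line. It assumes the `coredumpctl` frame format shown in this report. The names `FRAME_RE` and `frame_modules` are hypothetical helpers, not part of kitty or systemd, and the sample is copied from thread 6099 in the report.

```python
import re

# Frame lines in the coredumpctl output in this report look like:
#   #0  0x00007c806d7b4423 n/a (libnvidia-egl-wayland.so.1 + 0x5423)
# Assumption: every frame follows "#N  ADDRESS SYMBOL (MODULE + OFFSET)".
FRAME_RE = re.compile(
    r"#(\d+)\s+0x[0-9a-f]+\s+(\S+)\s+\((\S+) \+ 0x[0-9a-f]+\)"
)


def frame_modules(trace: str) -> list[tuple[int, str, str]]:
    """Return (frame number, symbol, module) for each frame line in trace."""
    return [(int(n), sym, mod) for n, sym, mod in FRAME_RE.findall(trace)]


# Innermost frames of the crashing thread (6099), copied from the report.
SAMPLE = """
#0  0x00007c806d7b4423 n/a (libnvidia-egl-wayland.so.1 + 0x5423)
#1  0x00007c806d7b9ba4 n/a (libnvidia-egl-wayland.so.1 + 0xaba4)
#2  0x00007c806d2ab62e n/a (libEGL_nvidia.so.0 + 0xab62e)
#3  0x00007c806d24e640 n/a (libEGL_nvidia.so.0 + 0x4e640)
#4  0x00007c806e32896f glfwSwapBuffers (glfw-wayland.so + 0xe96f)
#5  0x00007c806f21387d n/a (fast_data_types.so + 0x1387d)
"""

frames = frame_modules(SAMPLE)
for number, symbol, module in frames:
    # Print one line per frame: number, owning module, symbol (if resolved).
    print(f"#{number:<2} {module:<28} {symbol}")
```

In this sample, frames #0 through #3 resolve to NVIDIA libraries, and kitty's own code (its bundled GLFW and `fast_data_types.so`) only appears from frame #4 outward. That is consistent with the fault occurring inside the driver stack after kitty calls `glfwSwapBuffers`.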