zellij-org / zellij

A terminal workspace with batteries included
https://zellij.dev
MIT License

Crash when resuming or creating a new session #3657

Open kahilah opened 1 month ago

kahilah commented 1 month ago
  1. Issues with the Zellij UI / behavior / crash

Basic information

OS: CentOS 8
terminal: gnome-terminal
zellij --version: 0.40.1

Issue description

I have been using Zellij for a week now (with default settings, no plugins etc.) and started experiencing these crashes quite quickly. Every crash has happened when either 1) resuming a session or 2) creating a new session.

Zellij has been running for several hours whenever this crash happens. No other common factors have been identified.

The crash results in the following error message:

  × Thread 'wasm' panicked.
  ├─▶ Originating Thread(s)
  │     1. main_thread: SwitchSession
  │     2. ipc_server: NewClient
  │     3. screen_thread: NewTab
  │     4. plugin_thread: NewTab
  │   
  ├─▶ At .cargo/registry/src/index.crates.io-6f17d22bba15001f/async-global-executor-2.3.1/src/init.rs:39:18
  ╰─▶ cannot spawn executor threads: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }
  help: If you are seeing this message, it means that something went wrong.

        -> To get additional information, check the log at: /tmp/zellij-43927/zellij-log/zellij.log
        -> To see a backtrace next time, reproduce the error with: RUST_BACKTRACE=1 zellij [...]
        -> To help us fix this, please open an issue: https://github.com/zellij-org/zellij/issues

The log file in /tmp is filled with messages equivalent to this one:

  ERROR |zellij_server::background| 2024-10-09 21:55:09.611 [async-std/runti] [.cargo/registry/src/index.crates.io-6f17d22bba15001f/zellij-server-0.40.1/src/background_jobs.rs:443]: Failed to read created stamp of resurrection file: Error { kind: Unsupported, message: "creation time is not available for the filesystem" }
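
For readers unfamiliar with the "creation time is not available" message: some platform and filesystem combinations do not report a file's creation (birth) time at all. The Python sketch below only illustrates that situation and one common workaround, falling back to the modification time. The helper name `created_or_modified` is hypothetical, and nothing here is taken from Zellij's code or implies this is how Zellij should handle it.

```python
import os


def created_or_modified(path: str) -> float:
    """Return the file's birth time if the platform reports it, else its mtime.

    Hypothetical helper for illustration only.
    """
    st = os.stat(path)
    # os.stat() exposes st_birthtime on macOS/BSD (and on Windows in recent
    # Python versions); on Linux the attribute is absent, so we fall back.
    birth = getattr(st, "st_birthtime", None)
    return birth if birth is not None else st.st_mtime


if __name__ == "__main__":
    print(created_or_modified(__file__))
```

On a Linux machine like the one in this report, the sketch returns the modification time, because Python's `os.stat` does not expose a birth time there.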

Minimal reproduction

I haven't been able to reproduce this deterministically.

Other relevant information

The error message has been similar whether the crash happens when opening an old session or creating a new one.

Restarting Zellij results in the same error message; I need to kill the existing zellij processes before it will start again.

kahilah commented 1 month ago

I'd like to add that during the past 2 weeks I have been able to mitigate the crashes by keeping the number of sessions small (2-5) and actively deleting sessions that I haven't touched for days.

imsnif commented 4 weeks ago

Hey @kahilah - I looked a little bit into this and can't see an immediate cause from these details. Seems like for some reason the async executor can't spawn more threads.

Combined with the logs you provided regarding reading the creation time, a wild guess on my part is that this involves a problem with the generic musl binary. Did you install Zellij in this way (eg. with the Try Zellij before Installing method)? No harm in it, of course - it should work as expected.

If so, would you be willing to try compiling it for your own system (eg. with cargo install --locked zellij)? It might help identify the issue.

tgulacsi commented 3 weeks ago

What's your "uname -a"? Maybe something limits the number of threads/file descriptors?

kahilah commented 3 weeks ago

Hi, thanks for the suggestions. My installation has always been via compilation with cargo, so that shouldn't be the problem. Kernel info on this particular machine via uname shows:

  Linux XXXX 4.18.0-348.23.1.el8_5.x86_64 #1 SMP Wed Apr 27 15:32:52 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

tgulacsi commented 3 weeks ago

Sorry, I meant "ulimit -a" ...

kahilah commented 3 weeks ago

Ah, I see. ulimit -a shows the following:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 8204480
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 100000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
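
One way to test the limit theory raised above: on Linux, creating a thread can fail with error code 11 (EAGAIN, "Resource temporarily unavailable") when the per-user process/thread limit (`ulimit -u`, i.e. RLIMIT_NPROC) is reached. Whether that is what happened here is not confirmed. The Python sketch below compares the number of threads owned by the current user with that limit. The function names are hypothetical, and the count is approximate because it matches on the owner of each `/proc/<pid>` directory.

```python
import os
import resource
from pathlib import Path


def count_user_threads(uid: int, proc_root: str = "/proc") -> int:
    """Count threads (tasks) owned by uid by walking <proc_root>/<pid>/task."""
    try:
        entries = list(Path(proc_root).iterdir())
    except OSError:
        return 0  # no procfs available (not Linux)
    total = 0
    for entry in entries:
        if not entry.name.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            if entry.stat().st_uid != uid:
                continue
            total += sum(1 for _ in (entry / "task").iterdir())
        except OSError:
            continue  # the process exited, or we may not inspect it
    return total


def nproc_headroom(used: int, soft_limit: int):
    """Return how many more threads fit under the soft limit (None if unlimited)."""
    if soft_limit == resource.RLIM_INFINITY:
        return None
    return soft_limit - used


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    used = count_user_threads(os.getuid())
    print(f"threads in use: {used}, soft limit: {soft}, "
          f"headroom: {nproc_headroom(used, soft)}")
```

Running this right before resuming a session, and again after a crash, would show whether the thread count is close to the 4096 limit listed above.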

kahilah commented 3 days ago

Some additional information:

1) I have been using a single session continuously for over 3 weeks without crashes (i.e. a single session is very robust).
2) I closed this long session and noticed that its latest state was not saved to the attachable sessions.
3) I had a few other very old (3 weeks or so) sessions there, so I tried to switch into them: a few worked, but one caused Zellij to crash with an error message equivalent to the one in the first message above.

After this crash (3) I cannot open Zellij, as it keeps giving the same error. I need to kill the existing zellij processes and then it works.

kahilah commented 2 days ago

Another note: I just realised that if I try to remove everything from .cache/zellij, the 0.40.1/session_info directory remains. In fact, individual old session files there are regenerated immediately after I remove them. I have killed all zellij processes manually, but this still happens. It sounds to me like some rogue process is doing this?

Note to myself: after a system logout/login, ps -fC found a few more zellij processes which required killing. After that, removing the cache was successful.

I'll update to the latest release version and keep experimenting.
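
The leftover-process check described above (done with `ps -fC`) can also be scripted. The Python sketch below scans `/proc` for processes whose command name is `zellij`. It is Linux-only, the function name `find_pids_by_name` is hypothetical, and it only lists PIDs; deciding whether to kill them is left to the user.

```python
from pathlib import Path


def find_pids_by_name(name: str, proc_root: str = "/proc") -> list[int]:
    """Return sorted PIDs whose <proc_root>/<pid>/comm equals name."""
    pids: list[int] = []
    try:
        entries = list(Path(proc_root).iterdir())
    except OSError:
        return pids  # no procfs available (not Linux)
    for entry in entries:
        if not entry.name.isdigit():
            continue  # skip non-process entries
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # the process exited or is not readable
        if comm == name:
            pids.append(int(entry.name))
    return sorted(pids)


if __name__ == "__main__":
    for pid in find_pids_by_name("zellij"):
        print(pid)
```

An empty result before deleting .cache/zellij would suggest that no zellij process visible to this user is left to regenerate the session files.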

imsnif commented 2 days ago

Hey @kahilah - it's a little hard for me to keep track at this stage. Any chance for a summary of your findings?

kahilah commented 23 hours ago

Sorry for the convoluted issue. As a summary:

With version 0.40.1 I experienced crashes, with the error message shown in the first message, when switching between sessions or when resuming a session that had been created several hours or days earlier. The crash happens at the moment of switching.

I upgraded to the latest version two days ago and have had no crashes yet, but my session usage has been limited.