microsoft / WSL

Issues found on WSL
https://docs.microsoft.com/windows/wsl
MIT License

Distro shut down even with running processes #8854

Open bradwilson opened 2 years ago

bradwilson commented 2 years ago

Version

Microsoft Windows [Version 10.0.22621.521]

WSL Version

Kernel Version

5.15.62.1

Distro Version

Ubuntu 20.04

Other Software

Docker Engine version 20.10.18

Repro Steps

  1. Create fresh Ubuntu 20.04 image
  2. Enable systemd
  3. Install Docker into Ubuntu (Ansible example)
  4. Run nginx in Docker (docker run -d -p 80:80 nginx)
  5. Verify nginx is running in your browser (http://localhost/)
  6. Exit the shell

Expected Behavior

Since nginx is running, the distro should not shut down.

Prior to upgrading to 0.67.6 w/ systemd support, this was the (presumably correct, but definitely desired) behavior.

Actual Behavior

After a short period of time, the distro is forcefully shut down by the system.

Logging back in, you can see it was forcefully shut down:

 sh$ docker ps -a
CONTAINER ID   IMAGE     COMMAND                  CREATED          STATUS                       PORTS                               NAMES
ad20314283c5   nginx     "/docker-entrypoint.…"   14 minutes ago   Exited (255) 4 minutes ago   0.0.0.0:80->80/tcp, :::80->80/tcp   admiring_bardeen

Diagnostic Logs

No response

aki-k commented 7 months ago

The issue turned out to be having systemd enabled. With systemd enabled, a few GUI applications could not launch

This is false.

Unsubscribing from this as I don't think I'm going to get a real answer until I accidentally discover it as a feature pushed upstream.

This is a temper tantrum.

codeart1st commented 6 months ago

@firejox thanks! Do I put it in /etc/wsl.conf ?

It is a systemd service, so you need to save it as a [service name].service file and put that file under a systemd unit directory, e.g. /etc/systemd/system/. Then you can run these commands.

  • Start the systemd service
sudo systemctl start [service name].service
  • Enable it so it starts automatically on the next boot
sudo systemctl enable [service name].service

Autostart (enable) didn't work for me.

keep-distro-alive.service: Failed to execute /mnt/c/Windows/system32/waitfor.exe: Exec format error
keep-distro-alive.service: Failed at step EXEC spawning /mnt/c/Windows/system32/waitfor.exe: Exec format error

But the service itself is fine. Seems like the interop layer is not ready during wsl startup.

firejox commented 6 months ago

@codeart1st It looks like keep-distro-alive.service is loaded before wsl-binfmt.service or systemd-binfmt.service. You can check the startup order with the systemd-analyze command:

systemd-analyze plot > startup_order.svg

startup_order.svg will show the order in which systemd started all services.

If keep-distro-alive.service is loaded before those binfmt services, you can add this line to the [Unit] section of your service file.

After=wsl-binfmt.service systemd-binfmt.service

This makes keep-distro-alive.service start after the services that set up binfmt.

resources: https://stackoverflow.com/questions/29309717/is-there-any-way-to-list-systemd-services-in-linux-in-the-order-of-they-were-l https://stackoverflow.com/questions/21830670/start-systemd-service-after-specific-service
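
For readers who want to see the ordering fix in context, here is a minimal sketch of what such a unit file might look like. The thread never shows the actual keep-distro-alive.service, so the Description, Restart and WantedBy lines are assumptions, and render_keepalive_unit is a hypothetical helper name; only the waitfor.exe path, the MakeDistroAlive signal name and the After= line come from the comments in this thread.

```shell
#!/bin/sh
# Hypothetical helper: prints a keep-alive unit file to stdout.
# Only the waitfor.exe path and the After= line are taken from the thread;
# everything else in the unit is an illustrative assumption.
render_keepalive_unit() {
  signal_name="${1:-MakeDistroAlive}"
  cat <<EOF
[Unit]
Description=Keep the WSL distro alive (sketch)
# Start only after binfmt is registered, so Windows .exe files can be executed.
After=wsl-binfmt.service systemd-binfmt.service

[Service]
# waitfor.exe blocks until the named signal is sent.
ExecStart=/mnt/c/Windows/system32/waitfor.exe ${signal_name}
Restart=always

[Install]
WantedBy=multi-user.target
EOF
}
```

It could be written out with something like render_keepalive_unit | sudo tee /etc/systemd/system/keep-distro-alive.service, followed by sudo systemctl daemon-reload.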

codeart1st commented 6 months ago

Thank you, you're right.

everson-plantae commented 6 months ago

@firejox Your solution works well when there is only one active distribution, but I am working with two, when starting the second one, the process in the first one is interrupted.

firejox commented 6 months ago

@everson-plantae That is because the [signal name] in waitfor.exe [signal name] is shared globally. You need to use a different signal name for each distribution. For example, if you have Fedora and Ubuntu in WSL, you could use waitfor.exe FedoraAlive for Fedora and waitfor.exe UbuntuAlive for Ubuntu. Alternatively, you can use choice.exe with a named pipe to keep the distro alive. With choice.exe there is no need to set up a separate [signal name] for each distribution.
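
As a rough sketch of the per-distribution naming idea, the signal name could be derived from the distro name rather than hard-coded. signal_for_distro is a hypothetical helper, and stripping the name down to letters and digits is a conservative assumption about what waitfor.exe accepts. WSL normally sets WSL_DISTRO_NAME in shells started through wsl.exe; the thread does not confirm whether it is visible inside a systemd unit.

```shell
#!/bin/sh
# Hypothetical helper: build a distro-specific signal name such as "FedoraAlive".
signal_for_distro() {
  # Drop everything except letters and digits (assumed safe for waitfor.exe).
  clean=$(printf '%s' "$1" | tr -cd 'A-Za-z0-9')
  printf '%sAlive\n' "$clean"
}

# Example use (not executed here):
#   /mnt/c/Windows/system32/waitfor.exe "$(signal_for_distro "$WSL_DISTRO_NAME")"
```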

astroboylrx commented 4 months ago

@firejox It seems WSL2 has become more aggressive about shutting itself down.

I added the systemd service suggested above (with the After=wsl-binfmt.service systemd-binfmt.service line). However, WSL2 still shut down overnight. When I checked Task Manager, the waitfor.exe process was still there... somehow it survived but didn't keep WSL2 alive...

Any suggestions would be greatly appreciated.

cerebrate commented 4 months ago

Like I said way back in the thread, the only processes that will keep the WSL session from being auto-terminated are children of the Microsoft init (not the pid 1 init, the WSL-supplied init). No systemd service can be this, so you won't get anywhere with those.

To be clear, here's a pstree:

systemd─┬─Relay(1061)───wait-forever.sh───sleep
        ├─2*[agetty]
        ├─automount───5*[{automount}]
        ├─containerd───12*[{containerd}]
        ├─dbus-daemon
        ├─dockerd───14*[{dockerd}]
        ├─init-systemd(De─┬─SessionLeader───Relay(1936)───machinectl
        │                 ├─SessionLeader───Relay(332257)───sleep
        │                 ├─init───{init}
        │                 ├─sh
        │                 └─{init-systemd(De}
        ├─polkitd───3*[{polkitd}]
        ├─rpc.gssd───{rpc.gssd}
        ├─rpcbind
...etc....

What you are looking for is the Relay(number) parent. A process which has that above it in the tree (here, that would be wait-forever.sh, machinectl, and sleep) will keep the WSL instance running; one that doesn't, won't.
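
The rule above can be checked mechanically. The following is only a sketch, assuming the relay process name always starts with "Relay" as it does in the pstree output; has_relay_ancestor is a hypothetical helper name, and the optional second argument exists only so the function can be pointed at a fake /proc tree.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given PID, or any of its ancestors,
# is a process whose name starts with "Relay".
# Usage: has_relay_ancestor PID [PROC_ROOT]
has_relay_ancestor() {
  pid="$1"
  proc_root="${2:-/proc}"
  # Walk up the parent chain until we reach PID 1 (or PID 0).
  while [ -n "$pid" ] && [ "$pid" -gt 1 ] 2>/dev/null; do
    status="$proc_root/$pid/status"
    [ -r "$status" ] || return 1
    # The Name: field holds the process name, e.g. "Relay(1061)".
    name=$(awk -F'\t' '/^Name:/ {print $2; exit}' "$status")
    case "$name" in
      Relay*) return 0 ;;
    esac
    # The PPid: field holds the parent PID.
    pid=$(awk '/^PPid:/ {print $2; exit}' "$status")
  done
  return 1
}
```

For example, has_relay_ancestor "$(pgrep -o dockerd)" would be expected to fail for the dockerd shown above, since it hangs directly off systemd.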

nohup-ing or daemonize-ing a waitforever script is still the best option, I believe.
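
A minimal sketch of the nohup approach, assuming it is run from a shell started through wsl.exe so that the process ends up under a Relay parent. start_keepalive is a hypothetical name, and tail -f /dev/null stands in for the wait-forever script, whose contents are not shown in this thread. Later comments report that this may not be enough on recent WSL versions.

```shell
#!/bin/sh
# Hypothetical helper: start a do-nothing process that ignores hangups
# and survives the shell exiting; prints its PID.
start_keepalive() {
  # tail -f /dev/null blocks forever without using CPU.
  nohup tail -f /dev/null </dev/null >/dev/null 2>&1 &
  echo "$!"
}
```

Calling start_keepalive from a login shell config or from a wsl.exe command line is one way to use it; the printed PID can later be passed to kill to let the distro idle out again.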

Parsifa1 commented 4 months ago

I used to use dbus-launch true in my shell config to solve this problem, but since a recent update WSL closes automatically after one night. Is there any plan to add "wsl never automatically closes" to wslconfig?

aki-k commented 4 months ago

is there any plan to add "wsl never automatically closes" to wslconfig?

After following this saga since the beginning: "No."

cerebrate commented 4 months ago

is there any plan to add "wsl never automatically closes" to wslconfig?

https://github.com/microsoft/WSL/issues/8854#issuecomment-1255501711 , but since it's got such a simple workaround, I can't imagine it's anywhere near the top of the WSL team's list of possible new features.

astroboylrx commented 4 months ago

nohup-ing or daemonize-ing a waitforever script is still the best option, I believe.

@cerebrate I tried nohup-ing and WSL2 (Ubuntu 22.04) still stopped after a night. Previously dbus-launch true was sufficient to keep the distro alive. It feels like the shutdown became more aggressive with a very recent update (I think I may be seeing what @Parsifa1 sees).

WSL version: 2.3.11.0
Kernel version: 6.6.36.3-1
WSLg version: 1.0.63
MSRDC version: 1.2.5326
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.26100.1-240331-1435.ge-release
Windows version: 10.0.22635.3930

darrenchang commented 4 months ago

Fair points - I think an option to keep distros running (and not idle terminate them) is a good idea.

Are there still plans to implement this?

codeart1st commented 1 month ago

Small update: on Windows 11 24H2 the waitfor.exe solution no longer works.

Job for keep-distro-alive.service failed because the control process exited with error code.
See "systemctl status keep-distro-alive.service" and "journalctl -xeu keep-distro-alive.service" for details.

I think the problem is in the PreStart step:

/mnt/c/Windows/system32/waitfor.exe /si MakeDistroAlive
ERROR: Cannot send the specified signal.

I disabled the PreStart part for now.

hasancemcerit commented 1 month ago

This workaround, inspired by other posts such as this one, is working reliably for me.

I am running a dockerized application stack using compose on kali-linux.

▶ Prerequisite: systemd is enabled

  1. Create bash script keepalive.sh as below.

    #!/bin/bash
    # This script keeps the kali-linux distro running all the time.

    # check systemd and wait until it's up & running
    while :; do [[ ! $(systemctl is-system-running --wait 2> /dev/null) ]] && sleep 1 || break; done

    # run your docker compose, or whatever you want to run on wsl
    docker compose -f docker-compose.yml --env-file docker.env -p your-project up --no-deps --detach

    # check if the tmux keepalive session exists, if not start a new session
    tmux has-session -t keepalive 2> /dev/null || tmux new -d -s keepalive > /dev/null 2> /dev/null

    # check if infinite tail is running, if not start
    [[ ! $(ps aux | grep tail | grep -v grep) ]] && nohup sh -c "tail -f /dev/null &" < /dev/null > /dev/null 2> /dev/null

  2. Create a scheduled task that runs on windows startup.

    wsl.exe -d kali-linux --exec bash ./keepalive.sh


Enjoy that your app lives 💓 and will keep on living.

Test it and see for yourself from windows 👀 
$❯ `wsl -l -v`

NAME STATE VERSION