skiffos / SkiffOS

Any Linux distribution, anywhere.
https://skiffos.com
MIT License

WSL: timed out waiting for init process to start #213

Closed clayauld closed 2 years ago

clayauld commented 2 years ago

After building SkiffOS for WSL and importing the tar.gz archive according to the guide, the system will not start. The errors are shown in the screenshot.

I have confirmed that I imported the archive as WSL version 2.

image

clayauld commented 2 years ago

I should add that this is built from the latest release version.

paralin commented 2 years ago

WSL has evolved enough now that the skiff config will need to be changed a bit. Probably there's a better way now to start systemd than the workaround used here.

I'm going to have to take a deeper look at this and see the best way to address this issue.

clayauld commented 2 years ago

Any updates here? Wondering if there's anything I can help with here. I'd like to get a version of this in WSL for testing purposes.

paralin commented 2 years ago

Hey @clayauld, I'll boot up my Windows machine and try to find a fix this evening.

clayauld commented 2 years ago

Let me know what I can do to help!

paralin commented 2 years ago

I'm not sure about today, but in the past it was not possible to set a process as the "init process" in WSL2, so the workaround in skiff is to start systemd as a side effect of the first command line invocation (wsl.exe).

I'm still looking into a solution to this, but I managed to get the Windows machine all updated. I will continue looking tomorrow and prioritize it this week.

clayauld commented 2 years ago

Interesting digging into this.... technically WSL doesn't officially support systemd and most WSL distros don't use it. Here's a screenshot from my favorite, Pengwin.

image

However, there is this: https://github.com/arkane-systems/genie

This makes me think that full compatibility with WSL would require reworking SkiffOS for WSL to remove systemd as a dependency and start the services individually. But......that seems like a LOT of work and likely to break many things.

paralin commented 2 years ago

No, it doesn't need to be reworked. The current setup worked in the past to start systemd the first time WSL starts. I just need to adjust the script slightly for recent changes in WSL.

paralin commented 2 years ago

According to https://docs.microsoft.com/en-us/windows/wsl/wsl-config#boot-settings, in /etc/wsl.conf you can specify a [boot] command with an initial process to run on startup.

In this case we use /boot/skiff-init/skiff-init-squashfs
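For illustration, a boot entry along these lines is what the linked docs describe. This is a minimal sketch: the `[boot]` section and `command` key come from the Microsoft documentation, the path is the one named above, and whether the SkiffOS image ships exactly this file is an assumption.

```ini
# /etc/wsl.conf (sketch; the file actually shipped in the image may differ)
[boot]
command = /boot/skiff-init/skiff-init-squashfs
```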

The startup process is:

  1. wsl.exe
  2. WSL starts /bin/bash which is actually wsl-shell (C code in this repo)
  3. wsl-shell waits for a pid file to exist
  4. skiff-init-squashfs mounts the squashfs & overlayfs & chroots into it
  5. skiff-init-squashfs starts /wsl-init.sh inside the chroot
  6. wsl-init.sh sets up some mounts & starts systemd
  7. skiff-init-squashfs writes the systemd pid to the file
  8. wsl-shell sees the pid file & enters the namespaces of systemd
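
The wait-then-enter part of the steps above (3 and 8) can be sketched in shell. This is only an illustration, not the real wsl-shell, which is C code in the repo. The function names, the timeout, and the set of namespaces entered are all assumptions; `nsenter` is the util-linux tool and needs root.

```shell
#!/bin/sh
# Hypothetical sketch of steps 3 and 8; not the actual wsl-shell code.

# wait_for_pid_file PATH [TIMEOUT_SECONDS]
# Polls once per second until PATH exists and is non-empty, then prints
# its contents (the pid). Returns 1 if the timeout elapses first.
wait_for_pid_file() {
    pid_file=$1
    timeout=${2:-30}
    waited=0
    while [ ! -s "$pid_file" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for $pid_file" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    cat "$pid_file"
}

# enter_init_namespaces PID
# Replaces this shell with a login shell inside the namespaces of PID.
# Which namespaces the real wsl-shell enters is an assumption here.
enter_init_namespaces() {
    exec nsenter --target "$1" --mount --uts --ipc --net --pid -- /bin/sh -l
}
```

A caller would chain the two, for example `pid=$(wait_for_pid_file /path/to/pidfile) && enter_init_namespaces "$pid"`, where the pid file path is whatever skiff-init-squashfs writes to. A timeout in the first function is the kind of place where a "timed out waiting for init" style error would surface if the pid file never appears.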

So it's a bit complicated but it worked reliably previously. I suspect that some changes made to skiff-init-squashfs broke it somewhere along the way.

To debug this, I'll undo the symlink to wsl-shell to have it instead run stock bash, then run /boot/skiff-init/skiff-init-squashfs manually and look at the logs to see why it is failing.

Will try to do this today.

paralin commented 2 years ago

So I see the docs say that the boot setting is only available on Windows 11.

One possible workaround is to check whether the boot command ran, and if not, start init from wsl-shell on first run instead.
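
A rough sketch of that check, assuming "boot ran" can be detected from a pid file that names a live process. The function names and the pid-file convention are hypothetical, and the actual workaround may detect this differently.

```shell
#!/bin/sh
# Hypothetical sketch of the fallback check; not the code from the fix.

# boot_already_ran PID_FILE
# Succeeds only if PID_FILE is non-empty and names a process that is alive.
boot_already_ran() {
    [ -s "$1" ] || return 1
    pid=$(cat "$1")
    kill -0 "$pid" 2>/dev/null
}

# start_init_if_needed PID_FILE INIT_COMMAND [ARGS...]
# Runs INIT_COMMAND only when the boot command has not already started init.
start_init_if_needed() {
    pid_file=$1
    shift
    if boot_already_ran "$pid_file"; then
        return 0
    fi
    "$@"
}
```

On Windows 11 the boot command would have written the pid file already, so the shell does nothing extra. On older builds the file is missing or stale, and the first shell invocation runs the init command itself.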

paralin commented 2 years ago

#224 has that workaround, but it needs more testing.

clayauld commented 2 years ago

I've been swamped with other things but I'll test #224 when I can.