vdsm / virtual-dsm

Virtual DSM in a Docker container.
MIT License

How to solve error "Failed to cpio /storage/dsm.rd, reason 2" #323

Closed mr-sab closed 11 months ago

mr-sab commented 1 year ago

Tested on WSL2 (Ubuntu on Windows) and on Fedora 38, both with Podman:

podman run -it --rm -p 5000:5000 --device=/dev/kvm --cap-add NET_ADMIN --stop-timeout 60 kroese/virtual-dsm:latest

❯ Starting Virtual DSM for Docker v4.12...
❯ Install: Downloading installer...
❯ ERROR: Failed to cpio /storage/dsm.rd, reason 2

thank you!

kroese commented 1 year ago

Really strange, I have never seen this error before. I think it is related to Podman somehow, and this container is only tested for usage with Docker.

Maybe Podman needs some extra permissions/capabilities to be set for the container? Can you try running the container with the --privileged flag set? If that works maybe we can figure out what permission it is missing.

mr-sab commented 1 year ago

I just tried that, and it gives me the same error.

kroese commented 1 year ago

Can you check whether it works with Docker (by installing Docker Desktop for WSL2)? That way we know whether the problem is caused by Podman or related to something else.

mr-sab commented 1 year ago

It looks like it is indeed an issue with Podman: I installed Docker on Fedora and now the installation continues.

kroese commented 1 year ago

I have no explanation for why it does not work with Podman, as the cpio that fails is just a simple program for extracting files. It should behave exactly the same regardless of whether it's running under Podman or Docker. So maybe this is a bug in Podman, or some kind of permission is missing.

In any case, this project is called "VirtualDSM for Docker" so unless someone can tell us how to fix this so that it works in Podman too, I am just accepting that it doesn't.

kroese commented 1 year ago

Since this issue occurs during installation, and the installation is only done once, a quick workaround could be to keep the /storage folder after you have done the installation in Docker. You can then remove Docker again and use Podman by pointing it to the same /storage folder.

This way Podman will skip the installation (since the files are already available) and you should be able to use the container in Podman.
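The workaround above could be sketched roughly as follows. This is only an illustration: the host directory in STORAGE_DIR and the helper vdsm_run_cmd are hypothetical names, and the `-v host:/storage` bind mount is an assumption about how the /storage folder is kept between runs. The helper only prints the commands so they can be reviewed before being run by hand.

```shell
# Hypothetical host directory that keeps the DSM installation between runs.
STORAGE_DIR="${STORAGE_DIR:-$HOME/vdsm-storage}"

# Print the run command for a given engine ("docker" or "podman").
# The flags mirror the command from the first post, plus the bind mount.
vdsm_run_cmd() {
  printf '%s run -it --rm -p 5000:5000 --device=/dev/kvm --cap-add NET_ADMIN --stop-timeout 60 -v %s:/storage kroese/virtual-dsm:latest\n' "$1" "$STORAGE_DIR"
}

# Step 1: install once with Docker, writing into STORAGE_DIR.
vdsm_run_cmd docker
# Step 2: afterwards, start the same storage folder with Podman.
vdsm_run_cmd podman
```

Both printed commands point at the same host directory, which is what should let the second run skip the installation.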

mr-sab commented 1 year ago

Yes, it is indeed a bit weird. Is a debug option available to give me a bit more information? I want to look into it, but with the current information that is very hard.

kroese commented 1 year ago

You can set the environment variable DEBUG=Y and the script will output some more info, but during installation that's mainly a dump of every line of code it's executing, so I'm afraid it will not help much in diagnosing the problem.

mr-sab commented 1 year ago

Yes, that didn't change a lot. So I cloned your repo and removed the 2>/dev/null from the cpio command in the install.sh script. It looks like a permission issue indeed. Let me see if I can fix this; I will create a PR if I have a fix 👍

cpio: dev/console: Cannot mknod: Operation not permitted
cpio: dev/net/tun: Cannot mknod: Operation not permitted
36797 blocks
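The difference between the bare "reason 2" message and the output above comes down to where stderr goes. A minimal sketch, in which the function noisy is a hypothetical stand-in for the failing cpio call (it is not the actual command from install.sh):

```shell
# Hypothetical stand-in for the failing extraction step:
# it writes a diagnostic to stderr and exits with status 2.
noisy() {
  echo "cpio: dev/console: Cannot mknod: Operation not permitted" >&2
  return 2
}

# With stderr discarded, only the exit status survives.
noisy 2>/dev/null || echo "extraction failed, reason $?"

# Keeping stderr in a log file preserves the real cause.
err_log="$(mktemp)"
noisy 2>"$err_log" || {
  echo "extraction failed, reason $?"
  cat "$err_log"
}
```

Writing stderr to a file rather than dropping the redirect entirely keeps normal runs quiet while still making the cause available when something fails.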

kroese commented 1 year ago

If it cannot extract those two /dev entries, that doesn't seem important, because they are just device nodes. So maybe if you disable error checking (by calling set +e before the call to cpio and set -e after the call) it will continue and DSM will work just fine without them?
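The set +e / set -e idea can be illustrated with a small sketch. The function extract is a hypothetical stand-in for the cpio call; the real command in install.sh is not reproduced here.

```shell
set -e  # abort the script on any failing command

# Hypothetical stand-in for the cpio call that exits with status 2.
extract() {
  return 2
}

set +e           # temporarily tolerate failures
extract
status=$?        # remember the exit status for reporting
set -e           # restore strict error checking

echo "extract exited with status $status, continuing anyway"
```

Capturing the status keeps the failure visible in the log even though the script carries on.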

mr-sab commented 1 year ago

Tried that, but it didn't work... I am now playing around with fakeroot, but this is needed in a lot of places, so I'm not sure if this will work.

0n1cOn3 commented 1 year ago

Same issue here with Debian and Docker-Compose

Fresh Setup

Volume points to the attached external hard drive. (I smell a permission issue, but what needs to be adjusted? UID:GID?)

This is the compose file:

version: "3"
services:
  vDSM:
    container_name: vDSM
    image: kroese/virtual-dsm:latest
    environment:
      DISK_SIZE: "16G"
    devices:

0n1cOn3 commented 1 year ago

I was able to fix it.

The culprit was the filesystem on the external storage. exFAT obviously does not work. So I reformatted the drive and boom, it's booting.

@mr-sab Can you verify whether the location where you store the /storage folder uses ext4 or similar?
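One way to check which filesystem backs a directory is sketched below. It assumes GNU coreutils (for `df --output`); the helper name fs_type and the STORAGE_DIR variable are hypothetical.

```shell
# Print the filesystem type (e.g. ext4, btrfs, exfat) that backs a path.
# Assumes GNU df, which supports the --output option.
fs_type() {
  df --output=fstype "$1" | tail -n 1
}

# STORAGE_DIR is a hypothetical variable pointing at the host folder
# that is mounted as /storage; it defaults to / for demonstration.
echo "Filesystem: $(fs_type "${STORAGE_DIR:-/}")"
```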

mr-sab commented 1 year ago

Nice! Good find.

I will try this out tonight and report back

kroese commented 1 year ago

I created a new release (v4.13) which extracts all files to the container's internal filesystem during installation, instead of to the /storage folder.

This should fix the issue of not being able to install when /storage is on a filesystem that does not support Unix permissions (like exFAT).
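Whether a given folder sits on a filesystem that honours Unix permissions can be probed roughly like this. It is only a sketch, not something the container does: the helper supports_unix_perms is a hypothetical name, and `stat -c` assumes GNU coreutils.

```shell
# Return 0 when chmod on a probe file in the directory actually sticks.
supports_unix_perms() {
  probe="$(mktemp "$1/.perm-probe.XXXXXX" 2>/dev/null)" || return 1
  ok=1
  for mode in 600 755; do
    chmod "$mode" "$probe" 2>/dev/null || ok=0
    # On filesystems without Unix permissions the mode is not stored.
    [ "$(stat -c %a "$probe")" = "$mode" ] || ok=0
  done
  rm -f "$probe"
  [ "$ok" -eq 1 ]
}

# Demonstrate on a fresh temporary directory.
demo_dir="$(mktemp -d)"
if supports_unix_perms "$demo_dir"; then
  echo "$demo_dir supports Unix permissions"
else
  echo "$demo_dir does not support Unix permissions"
fi
```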

mr-sab commented 1 year ago

My Fedora laptop uses btrfs; with both 4.13 and latest I get the same error...

kroese commented 1 year ago

Okay at least it should fix the problem @0n1cOn3 had with exFAT, but maybe yours is caused by something else..

kroese commented 10 months ago

@mr-sab The latest version contains a flag to instruct cpio to skip creating device nodes (DEV=N). This was done to support unprivileged LXC containers, but it should solve your problem with Podman too.

mr-sab commented 10 months ago

Thank you for the update. It looks like it is not working for me at this point. Let me dive into this when I have some spare time.

podman run -it --rm -p 5000:5000 --device=/dev/kvm -e DEV=N --privileged  --cap-add NET_ADMIN --stop-timeout 60 vdsm/virtual-dsm:latest
Device "eth0" does not exist.
❯ ERROR: Status 1 while: cut -f1 -d/ (line 150/253)
❯ ERROR: Status 1 while: IP=$(ip address show dev "${VM_NET_DEV}" | grep inet | awk '/inet / { print $2 }' | cut -f1 -d/) (line 150/253)

podman run -it --rm -p 5000:5000 --device=/dev/kvm -e DEV=N --privileged --network=host  --cap-add NET_ADMIN --stop-timeout 60 vdsm/virtual-dsm:latest
RTNETLINK answers: Operation not permitted
❯ ERROR: Capability NET_ADMIN has not been set most likely. Please add the 
❯ ERROR: following docker setting to your container: --cap-add NET_ADMIN

podman run -it --rm -p 5000:5000 --device=/dev/kvm -e DEV=N   --cap-add NET_ADMIN --stop-timeout 60 vdsm/virtual-dsm:latest
mknod: /dev/net/tun: Operation not permitted
❯ ERROR: Status 1 while: mknod /dev/net/tun c 10 200 (line 209/12)

podman run -it --rm -p 5000:5000 --device=/dev/kvm -e DEV=N --privileged --network=host  --stop-timeout 60 vdsm/virtual-dsm:latest
RTNETLINK answers: Operation not permitted
❯ ERROR: Capability NET_ADMIN has not been set most likely. Please add the 
❯ ERROR: following docker setting to your container: --cap-add NET_ADMIN

kroese commented 10 months ago

Before, it already went wrong during installation. And now it goes wrong during the network setup, so we are already one step further :)

The container only supports bridge networking (the default) or macvlan networking, so it is to be expected that your tests with --network=host fail: I never implemented any support for that mode.

I see that with bridge networking it fails on:

Device "eth0" does not exist.

From what I can find about Podman, it seems that it could be solved by adding --cap-add NET_RAW in addition to NET_ADMIN. So maybe you can give it a try with both of these capabilities enabled?

mr-sab commented 10 months ago

yes that is true :)

I tried it with the extra --cap-add option, but this didn't fix it. I got it running with an extra "-e VM_NET_DEV=tap0", but was not able to get to the web UI. It looks like eth0 is really needed; let me see if I can add it some other way.

Thanks again :)

Device "eth0" does not exist.
❯ ERROR: Status 1 while: cut -f1 -d/ (line 150/253)
❯ ERROR: Status 1 while: IP=$(ip address show dev "${VM_NET_DEV}" | grep inet | awk '/inet / { print $2 }' | cut -f1 -d/) (line 150/253)

kroese commented 9 months ago

@mr-sab In the latest version (v5.11) I added code to automatically detect the default interface instead of hardcoding eth0. So maybe this helps in your case.
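For illustration, detecting the default interface could look something like the sketch below. This is not the project's actual code: the helper default_dev is a hypothetical name, and the approach (taking the device of the first default route reported by `ip route`) is an assumption about one reasonable way to do it.

```shell
# Read `ip route` output on stdin and print the device of the first
# default route, or nothing when there is no default route.
default_dev() {
  awk '$1 == "default" {
         for (i = 2; i < NF; i++)
           if ($i == "dev") { print $(i + 1); exit }
       }'
}

# Use the detected device, falling back to eth0 when none is found.
if command -v ip >/dev/null 2>&1; then
  VM_NET_DEV="$(ip route show default | default_dev)"
fi
VM_NET_DEV="${VM_NET_DEV:-eth0}"
echo "Using network device: $VM_NET_DEV"
```

Keeping the parsing in a function that reads stdin makes it easy to try against sample route tables without touching the host network.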

mr-sab commented 9 months ago

Thank you for the update, but it is not working. It boots up but I am not able to connect to the webui.


❯ -----------------------------------------------------------
❯  You can now login to DSM at http://10.0.2.100:5000
❯ -----------------------------------------------------------

kroese commented 9 months ago

Okay! Most likely it has something to do with tap0 already being a tap-interface. So it will create another tap on top of that tap and maybe you cannot stack them like that.