rancher-sandbox / rancher-desktop

Container Management and Kubernetes on the Desktop
https://rancherdesktop.io
Apache License 2.0

rancher-desktop fails to start on Linux for specific user #1760

Open zbabac opened 2 years ago

zbabac commented 2 years ago

Actual Behavior

This issue is related to #1298, which was closed after updating rancher-desktop to stable 1.0.0. However, it appeared again after the update to v1.1.1 (rancher-desktop was completely removed for testing purposes and then reinstalled via apt install rancher-desktop). It starts normally, I select the Kubernetes version, it downloads k3s, and then it displays this error:

time="2022-03-08T11:19:50+01:00" level=info msg="Terminal is not available, proceeding without opening an editor"
time="2022-03-08T11:19:50+01:00" level=info msg="Attempting to download the image from \"/opt/rancher-desktop/resources/resources/linux/alpine-lima-v0.2.8-rd-3.14.3.iso\"" digest=
time="2022-03-08T11:19:50+01:00" level=fatal msg="failed to download the image, attempted 1 candidates, errors=[failed to download \"/opt/rancher-desktop/resources/resources/linux/alpine-lima-v0.2.8-rd-3.14.3.iso\": copy file range failed: invalid argument]"
2022-03-08T10:19:50.058Z: + limactl start --tty=false /home/jokke/.local/share/rancher-desktop/lima/_config/0.yaml
2022-03-08T10:19:50.059Z: Error: /opt/rancher-desktop/resources/resources/linux/lima/bin/limactl exited with code 1
2022-03-08T10:19:50.059Z: Error starting lima: Error: /opt/rancher-desktop/resources/resources/linux/lima/bin/limactl exited with code 1
    at ChildProcess.<anonymous> (/opt/rancher-desktop/resources/app.asar/dist/app/background.js:1:8692)
    at ChildProcess.emit (node:events:394:28)
    at Process.ChildProcess._handle.onexit (node:internal/child_process:290:12)

I close the window and start it again, and it then spends 10 minutes showing the message "wait to start virtual machine". After the timeout it displays this error message:

time="2022-03-08T11:00:40+01:00" level=debug msg="[hostagent] executing ssh for script \"ssh\": /usr/bin/ssh [ssh -F /dev/null -o IdentityFile=\"/home/jokke/.local/share/rancher-desktop/lima/_config/user\" -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o NoHostAuthenticationForLocalhost=yes -o GSSAPIAuthentication=no -o PreferredAuthentications=publickey -o Compression=no -o BatchMode=yes -o IdentitiesOnly=yes -o Ciphers=\"^aes128-gcm@openssh.com,aes256-gcm@openssh.com\" -o User=jokke -o ControlMaster=auto -o ControlPath=\"/home/jokke/.local/share/rancher-desktop/lima/0/ssh.sock\" -o ControlPersist=5m -p 35405 127.0.0.1 -- /bin/bash]"
time="2022-03-08T11:00:40+01:00" level=debug msg="[hostagent] stdout=\"\", stderr=\"jokke@127.0.0.1: Permission denied (publickey,password,keyboard-interactive).\\r\\n\", err=failed to execute script \"ssh\": stdout=\"\", stderr=\"jokke@127.0.0.1: Permission denied (publickey,password,keyboard-interactive).\\r\\n\": exit status 255"
time="2022-03-08T11:00:44+01:00" level=fatal msg="did not receive an event with the \"running\" status"
2022-03-08T10:00:44.471Z: + limactl --debug start --tty=false 0
2022-03-08T10:00:44.471Z: Error: /opt/rancher-desktop/resources/resources/linux/lima/bin/limactl exited with code 1
2022-03-08T10:00:44.472Z: Error starting lima: Error: /opt/rancher-desktop/resources/resources/linux/lima/bin/limactl exited with code 1
    at ChildProcess.<anonymous> (/opt/rancher-desktop/resources/app.asar/dist/app/background.js:1:8692)
    at ChildProcess.emit (node:events:394:28)
    at Process.ChildProcess._handle.onexit (node:internal/child_process:290:12)

When I create a new user, log out, and then log in as the new user, rancher-desktop starts normally.

It is obviously related to the settings for my user, but which one? I deleted everything related to rancher-desktop from my home directory: .cache, .config, .local, .kube. It then starts again from downloading k3s, and the same failure repeats.
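For anyone retracing this cleanup, here is a minimal sketch that lists (but does not delete) the per-user locations mentioned above. Only the ~/.local/share/rancher-desktop path is confirmed by the logs in this report; the .cache and .config subdirectory names are assumptions, and the function names are made up for illustration.

```shell
# List per-user locations where Rancher Desktop state may survive a reinstall.
# Only the .local/share path appears in the logs; the .cache and .config
# subdirectory names are assumptions.
rd_state_paths() {
    home="${1:-$HOME}"
    printf '%s\n' \
        "$home/.cache/rancher-desktop" \
        "$home/.config/rancher-desktop" \
        "$home/.local/share/rancher-desktop" \
        "$home/.kube"
}

# Print only the paths that currently exist; nothing is removed.
rd_existing_state() {
    rd_state_paths "$1" | while IFS= read -r p; do
        if [ -e "$p" ]; then
            printf '%s\n' "$p"
        fi
    done
    return 0
}

rd_existing_state "$HOME"
```

Anything the script prints after a "clean" uninstall is state that the reinstalled application would pick up again.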

I tried the AppImage, but the result is the same. The problem is not reproducible per se; I will try to copy my entire home directory to the new user and report back with an update.

The question is: is there anything else related to rancher-desktop that can interfere with user settings, i.e. what components does rancher-desktop use? Could it be some user rights ("Permission denied" implies that, but which permission)?

Logs attached: lima.log, lima.ha.stderr.log
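One permission that can be checked directly is the mode of the SSH key that the hostagent passes to ssh. The sketch below reports the owner and mode of that key; the path comes from the log above, while the idea that the key mode matters here is only an assumption (OpenSSH generally ignores private keys that are readable by group or others). The function name is hypothetical.

```shell
# Report owner and mode of a private key file (GNU stat, Linux).
# Returns 0 if group/other have no access, 2 if the mode looks too open,
# 1 if the file is missing.
check_key_perms() {
    key="$1"
    if [ ! -f "$key" ]; then
        echo "missing: $key"
        return 1
    fi
    mode=$(stat -c '%a' "$key")
    owner=$(stat -c '%U' "$key")
    echo "$key mode=$mode owner=$owner"
    case "$mode" in
        *00) return 0 ;;
        *) echo "warning: ssh may ignore a private key with mode $mode"
           return 2 ;;
    esac
}

# Key path taken from the hostagent log in this report.
check_key_perms "$HOME/.local/share/rancher-desktop/lima/_config/user" || true
```

Comparing the output for the failing user and the working user would show whether the key file differs between the two accounts.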

Steps to Reproduce

Not reproducible for other users.

Result

time="2022-03-08T11:00:40+01:00" level=debug msg="[hostagent] executing ssh for script \"ssh\": /usr/bin/ssh [ssh -F /dev/null -o IdentityFile=\"/home/jokke/.local/share/rancher-desktop/lima/_config/user\" -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o NoHostAuthenticationForLocalhost=yes -o GSSAPIAuthentication=no -o PreferredAuthentications=publickey -o Compression=no -o BatchMode=yes -o IdentitiesOnly=yes -o Ciphers=\"^aes128-gcm@openssh.com,aes256-gcm@openssh.com\" -o User=jokke -o ControlMaster=auto -o ControlPath=\"/home/jokke/.local/share/rancher-desktop/lima/0/ssh.sock\" -o ControlPersist=5m -p 35405 127.0.0.1 -- /bin/bash]"
time="2022-03-08T11:00:40+01:00" level=debug msg="[hostagent] stdout=\"\", stderr=\"jokke@127.0.0.1: Permission denied (publickey,password,keyboard-interactive).\\r\\n\", err=failed to execute script \"ssh\": stdout=\"\", stderr=\"jokke@127.0.0.1: Permission denied (publickey,password,keyboard-interactive).\\r\\n\": exit status 255"
time="2022-03-08T11:00:44+01:00" level=fatal msg="did not receive an event with the \"running\" status"
2022-03-08T10:00:44.471Z: + limactl --debug start --tty=false 0
2022-03-08T10:00:44.471Z: Error: /opt/rancher-desktop/resources/resources/linux/lima/bin/limactl exited with code 1
2022-03-08T10:00:44.472Z: Error starting lima: Error: /opt/rancher-desktop/resources/resources/linux/lima/bin/limactl exited with code 1
    at ChildProcess.<anonymous> (/opt/rancher-desktop/resources/app.asar/dist/app/background.js:1:8692)
    at ChildProcess.emit (node:events:394:28)
    at Process.ChildProcess._handle.onexit (node:internal/child_process:290:12)

Expected Behavior

Expected to work for any user.
When uninstalled and reinstalled, it is expected to start normally, perform the initial setup, and then start the VM and k3s inside the VM.

Additional Information

No response

Rancher Desktop Version

1.1.1

Rancher Desktop K8s Version

1.22.5

Which container runtime are you using?

containerd (nerdctl)

What operating system are you using?

Other Linux

Operating System / Build Version

KDE neon 20.04 5.24

What CPU architecture are you using?

x64

Linux only: what package format did you use to install Rancher Desktop?

deb

Windows User Only

No response

adamkpickering commented 2 years ago

To me, the following line is the most interesting:

time="2022-03-08T11:19:50+01:00" level=fatal msg="failed to download the image, attempted 1 candidates, errors=[failed to download \"/opt/rancher-desktop/resources/resources/linux/alpine-lima-v0.2.8-rd-3.14.3.iso\": copy file range failed: invalid argument]"

This seems to point to this being an issue with lima. It seems to pertain to a copy_file_range(2) syscall. I took a brief look through some of lima's code and didn't find anything obvious. I don't have a ton of time right now to look at this, but I'll continue looking in the future.

What distro and filesystem are you running?
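A quick way to collect that information is sketched below. The ISO directory is taken from the log line; checking $HOME/.cache as the likely copy destination is an assumption, and fs_type is just a helper name used here.

```shell
# Print the filesystem type backing a path (GNU stat, Linux).
fs_type() {
    stat -f -c '%T' "$1"
}

# Source directory is from the log; the destination candidates are assumptions.
for p in /opt/rancher-desktop/resources/resources/linux "$HOME/.cache" "$HOME"; do
    if [ -e "$p" ]; then
        echo "$p: $(fs_type "$p")"
    fi
done

# Kernel version, since copy_file_range(2) behavior differs between kernels.
uname -r
```

If the source and destination report different filesystem types, or an unusual one such as an encrypted or overlay filesystem, that would be worth mentioning in the reply.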

zbabac commented 2 years ago

KDE neon 20.04 5.24 (based on Ubuntu 20.04) and ext4.
So far the workaround is to start Rancher Desktop as another user and then switch to my normal user (the caveat is that you have to copy the user key to $HOME/.kube/).

zbabac commented 2 years ago

For the sake of helping anyone else who might have the same issue, here is the detailed workaround procedure.
The problem appeared again a few days ago after a KDE update (to 5.24.4), so it might be related to KDE. Either way, something gets broken for the user, and the only remedy is to create a new, vanilla user that is used to start rancher-desktop. Your normal user can then use the running VM services from their normal desktop session.

  1. create another user, let's call it rancheruser, and switch your KDE desktop session to the new rancheruser;
  2. start rancher-desktop (first time only: select the Kubernetes version, download images, set CPUs and memory, etc.);
  3. make sure the rancher-desktop icon is blue, keep it running, and switch back to your normal user (CTRL+ALT+F1 is for fast switching; switching back to rancheruser is CTRL+ALT+F2, or F3, depending on how many sessions you have open);
  4. one time only: copy the kube config from the rancheruser home to your user home
    sudo cp /home/rancheruser/.kube/config /home/<user>/.kube/config
  5. (optional) if you need access to the lima VM, you need to know the port where it is running:
    ps axf | grep ssh # note the port, e.g. 39309
    ssh -i /home/rancheruser/.local/share/rancher-desktop/lima/_config/user rancheruser@127.0.0.1 -p 39309 -L 443:192.168.5.15:443

Only the first 3 steps are necessary every time you start rancher-desktop; step 4 is one-time only, and step 5 is needed only if you want access to the lima VM.
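Step 4 can be wrapped in a small helper, sketched here under a few assumptions: a file copied with sudo ends up owned by root, so the helper also sets the mode and, optionally, the owner so that the normal user can read it. The function name and its arguments are made up for illustration.

```shell
# Copy a kubeconfig into a target .kube directory and restrict its mode.
#   $1: source file, e.g. /home/rancheruser/.kube/config
#   $2: destination directory, e.g. /home/<user>/.kube
#   $3: owner to assign (optional; needs root), e.g. <user>
copy_kubeconfig() {
    src="$1"
    dest_dir="$2"
    owner="$3"
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir/config" || return 1
    chmod 600 "$dest_dir/config" || return 1
    if [ -n "$owner" ]; then
        chown "$owner" "$dest_dir/config" || return 1
    fi
    return 0
}
```

Run as root, a call such as copy_kubeconfig /home/rancheruser/.kube/config /home/<user>/.kube <user> mirrors the cp command from step 4 and additionally hands the file to the normal user.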

If you want to stop rancher-desktop, switch back to rancheruser and close it (or log that user out). You will need to repeat steps 1-3 in order to start it again (and optionally step 5).

If you can't start rancher-desktop even with the new user, for example after some update, just create another new user and use that one until this issue is resolved.