rancher-sandbox / rancher-desktop

Container Management and Kubernetes on the Desktop
https://rancherdesktop.io
Apache License 2.0

Updating certificates fails #2778

Open sascha-andres opened 2 years ago

sascha-andres commented 2 years ago

Actual Behavior

Rancher not starting

Steps to Reproduce

Start Rancher Desktop

Result

2022-08-19T07:27:00.315Z: Registered distributions: Ubuntu-20.04,rancher-desktop-data,rancher-desktop
2022-08-19T07:27:00.570Z: Registered distributions: Ubuntu-20.04,rancher-desktop-data,rancher-desktop
2022-08-19T07:27:01.006Z: Registered distributions: Ubuntu-20.04,rancher-desktop-data,rancher-desktop
2022-08-19T07:27:01.199Z: Registered distributions: Ubuntu-20.04,rancher-desktop-data,rancher-desktop
2022-08-19T07:27:01.199Z: data distro already registered
2022-08-19T07:27:16.466Z: Installing C:\Users\sascha.andres\AppData\Local\Programs\Rancher Desktop\resources\resources\linux\internal\trivy as /mnt/c/Users/sascha.andres/AppData/Local/Programs/Rancher Desktop/resources/resources/linux/internal/trivy into /usr/local/bin/trivy ...
2022-08-19T07:27:16.486Z: Installing C:\Users\sascha.andres\AppData\Local\Programs\Rancher Desktop\resources\resources\linux\internal\rancher-desktop-guestagent as /mnt/c/Users/sascha.andres/AppData/Local/Programs/Rancher Desktop/resources/resources/linux/internal/rancher-desktop-guestagent into /usr/local/bin//rancher-desktop-guestagent ...
2022-08-19T07:27:17.555Z: WSL: executing: /usr/sbin/update-ca-certificates: Error: wsl.exe exited with code 1

Expected Behavior

Rancher Desktop starting and usable

Additional Information

No response

Rancher Desktop Version

1.5.1

Rancher Desktop K8s Version

unknown

Which container engine are you using?

moby (docker cli)

What operating system are you using?

Windows

Operating System / Build Version

Windows 11

What CPU architecture are you using?

x64

Linux only: what package format did you use to install Rancher Desktop?

No response

Windows User Only

No response

sascha-andres commented 2 years ago

I found kind of a workaround: a complete factory reset. Not happy with that though

adamkpickering commented 2 years ago

Thanks for filing an issue! Do you have any more details about the failure? It seems that updating the CA certificates inside a WSL distro failed, but it is tough to say what the problem is without more info. If you can get the error again, could you please do the following:

Thanks!

sascha-andres commented 2 years ago

Will do so tomorrow (I'm back in the office then)

sascha-andres commented 2 years ago

Actually the steps resulted in a completely unusable state:

image

After pressing OK the app is closed. After I stopped wsl completely I could remove the file and start it again.

Logs attached

logs.zip

adamkpickering commented 2 years ago

That's fishy. Were you running RD as an administrator by any chance? It must be run as a regular user otherwise weird stuff starts to happen. We have #1560 in progress for this.

sascha-andres commented 2 years ago

@adamkpickering sorry for the late reply, was sick. No, I was not using RD as an administrator. We have no administrative rights here

adamkpickering commented 2 years ago

So WSL is installed for you? That EPERM exception makes me wonder if IT has your system super locked down. Though I'm not very knowledgeable about Windows... @mook-as what is your take on this?

mook-as commented 2 years ago

The EPERM can occur if you somehow managed to start Rancher Desktop twice (because the previous instance has the file open, the new instance can't delete it).

It's unclear why running update-ca-certificates is failing, though; wsl-exec.log shows:

run-parts: /etc/ca-certificates/update.d/certhash: exit status 132

But it's unclear why that's happening. Would you be able to (once the failure has occurred) manually run update-ca-certificates in the rancher-desktop WSL distribution, and dig into the errors there?

cyron7 commented 1 year ago

I wanted to add that I am having the same issue as well:

Actual Behavior

Rancher not starting after reboot

Steps to Reproduce

Start Rancher Desktop. Reboot the host machine while rancher is on. Try to start it again.

Result

2022-10-06T15:10:26.652Z: Registered distributions: rancher-desktop,rancher-desktop-data
2022-10-06T15:10:27.441Z: Registered distributions: rancher-desktop,rancher-desktop-data
2022-10-06T15:10:34.509Z: Registered distributions: rancher-desktop,rancher-desktop-data
2022-10-06T15:10:35.128Z: Registered distributions: rancher-desktop,rancher-desktop-data
2022-10-06T15:10:35.129Z: data distro already registered
2022-10-06T15:10:40.745Z: Did not find a valid mount, mounting /mnt/wsl/rancher-desktop/run/data
2022-10-06T15:11:39.074Z: Installing C:\Users\kyle.andrews\AppData\Local\Programs\Rancher Desktop\resources\resources\linux\internal\rancher-desktop-guestagent as /mnt/c/Users/kyle.andrews/AppData/Local/Programs/Rancher Desktop/resources/resources/linux/internal/rancher-desktop-guestagent into /usr/local/bin//rancher-desktop-guestagent ...
2022-10-06T15:11:39.418Z: Installing C:\Users\kyle.andrews\AppData\Local\Programs\Rancher Desktop\resources\resources\linux\internal\trivy as /mnt/c/Users/kyle.andrews/AppData/Local/Programs/Rancher Desktop/resources/resources/linux/internal/trivy into /usr/local/bin/trivy ...
2022-10-06T15:11:48.747Z: WSL: executing: /usr/sbin/update-ca-certificates: Error: wsl.exe exited with code 1

Expected Behavior

Rancher Desktop starting and usable

Additional Information

No response

Rancher Desktop Version

1.5.1

Rancher Desktop K8s Version

unknown

Which container engine are you using?

moby (docker cli) or containerd

What operating system are you using?

Windows

Operating System / Build Version

Windows 10 Enterprise

What CPU architecture are you using?

x64

Linux only: what package format did you use to install Rancher Desktop?

No response

Windows User Only

No response

cyron7 commented 1 year ago

This is the error I get as rancher is starting up: image

cyron7 commented 1 year ago

Recent log file lines:

2022-10-06T15:39:47.467Z: Running: wsl.exe --distribution rancher-desktop --exec busybox chmod 755 /etc/init.d/dnsmasq-generate
2022-10-06T15:39:47.649Z: Running: wsl.exe --distribution rancher-desktop --exec busybox chmod 644 /etc/conf.d/cri-dockerd
2022-10-06T15:39:47.913Z: Running: wsl.exe --distribution rancher-desktop --exec busybox chmod 644 /etc/conf.d/containerd
2022-10-06T15:39:48.007Z: Running: wsl.exe --distribution rancher-desktop --exec busybox chmod 644 /etc/logrotate.d/k3s
2022-10-06T15:39:48.624Z: Running: wsl.exe --distribution rancher-desktop --exec /sbin/rc-update add host-resolver default
2022-10-06T15:39:49.001Z: WSL: executing: /usr/sbin/update-ca-certificates: Error: wsl.exe exited with code 1
2022-10-06T15:39:49.479Z: Running: wsl.exe --distribution rancher-desktop --exec mkdir -p /etc/cni/net.d
2022-10-06T15:39:49.548Z: Capturing output: wsl.exe --distribution rancher-desktop --exec wslpath -a -u C:\Users\kyle.andrews\AppData\Local\Temp\rd-docker-7MqAJp\docker
2022-10-06T15:39:49.811Z: Running: wsl.exe --distribution rancher-desktop --exec /sbin/rc-update add dnsmasq default

cyron7 commented 1 year ago

wsl-exec.log

mook-as commented 1 year ago

/etc/ca-certificates/update.d/certhash: exit status 132

# cat /etc/ca-certificates/update.d/certhash
#!/bin/sh
exec /usr/bin/c_rehash /etc/ssl/certs

Alpine has its own c_rehash; it's not clear how that can exit with 132, since it looks like it returns either 2 or 0. It's also possible it's actually (128 + 4), in which case it's SIGILL… which also doesn't make much sense.

Please try running /usr/bin/c_rehash -v /etc/ssl/certs manually and see if it produces more details?

Out of curiosity, what CPU do you have? Not that I really expect a missing instruction there… The other option I see is corruption (either disk image or memory).
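Editor's note: the (128 + 4) reading above follows the common shell convention that an exit status greater than 128 means the process was terminated by signal (status - 128). The sketch below shows that arithmetic only; `describe_exit_status` is a hypothetical helper name, not part of Rancher Desktop or Alpine, and it assumes the status was reported by a POSIX-style shell.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): interpret a shell exit status.
# Assumption: statuses above 128 follow the "128 + signal number" convention.
describe_exit_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # `kill -l N` prints the name of signal N without the SIG prefix.
        name=$(kill -l "$sig" 2>/dev/null) || name=unknown
        echo "terminated by signal $sig (SIG$name)"
    else
        echo "exited normally with status $status"
    fi
}

describe_exit_status 132
```

On Linux, signal 4 is SIGILL (illegal instruction), which is why 132 points in that direction here.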

cyron7 commented 1 year ago

See log for results of command '/usr/bin/c_rehash -v /etc/ssl/certs' run from within 'rancher-desktop'

c_rehash.log

cyron7 commented 1 year ago

CPU: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz, 2112 MHz, 4 Core(s), 8 Logical Processor(s)

mook-as commented 1 year ago

Illegal instruction

Well, that's interesting! c_rehash (for 1.5.1) has a sha256 hash of 3AD730F1AE440CAE63D0C4E5EECFB3A69318E5FABE2E10B16BBCDA81735A8E7C for me. Is yours any different?

That CPU shouldn't be missing any actual instructions, as far as I know…

jandubois commented 1 year ago

@cyron7 Could you run /usr/bin/openssl rehash -v /etc/ssl/certs to see if it fails the same way?

Also, could you attach the output of cat /proc/cpuinfo? Your CPU should have all required features, so I don't really understand how this can happen.

cyron7 commented 1 year ago

@jandubois ; I am away from that machine. I should be able to get that output to you in the next 24 hours.

cyron7 commented 1 year ago

@mook-as ; I'm not sure how to check that. I'll try to figure that out and get back to you.

jandubois commented 1 year ago

I'm not sure how to check that. I'll try to figure that out and get back to you.

Some options:

lima-rancher-desktop:~$ sha256sum /usr/bin/c_rehash
3ad730f1ae440cae63d0c4e5eecfb3a69318e5fabe2e10b16bbcda81735a8e7c  /usr/bin/c_rehash
lima-rancher-desktop:~$ openssl dgst -sha256 /usr/bin/c_rehash
SHA256(/usr/bin/c_rehash)= 3ad730f1ae440cae63d0c4e5eecfb3a69318e5fabe2e10b16bbcda81735a8e7c

@mook-as shows an uppercase hash, so not sure which command he ran. 😄
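Editor's note: since the two digests quoted in this thread differ only in letter case, a comparison that normalises case avoids a false mismatch. The following is a minimal sketch, assuming `sha256sum` is available (as in the commands above); `check_sha256` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): compare a file's SHA-256 digest
# with an expected value, ignoring the case of the hex digits.
check_sha256() {
    file=$1
    # Lower-case the expected digest so upper- and lower-case input both work.
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    # sha256sum prints "<digest>  <file>"; keep only the digest.
    actual=$(sha256sum "$file" | cut -d ' ' -f 1)
    if [ "$actual" = "$expected" ]; then
        echo "match"
    else
        echo "MISMATCH: got $actual"
        return 1
    fi
}

# Possible usage inside the rancher-desktop distro, with the 1.5.1 digest quoted in this thread:
# check_sha256 /usr/bin/c_rehash 3AD730F1AE440CAE63D0C4E5EECFB3A69318E5FABE2E10B16BBCDA81735A8E7C
```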

cyron7 commented 1 year ago

@jandubois ; It looks like that is missing: image

cyron7 commented 1 year ago

@jandubois ; Here is the information from cpuinfo: cpuinfo.log

cyron7 commented 1 year ago

@jandubois and @mook-as ; This was the output from running the rehash: image

Hash: 3ad730f1ae440cae63d0c4e5eecfb3a69318e5fabe2e10b16bbcda81735a8e7c /usr/bin/c_rehash

jandubois commented 1 year ago

Here is the information from cpuinfo:

Thanks! That all looks as expected (VM settings match your host CPU), and all the features like sse4_2 and avx are there, so not giving any clue...

cyron7 commented 1 year ago

Are there any network requirements for Rancher to be able to start up or for update-ca-certificates to work?

jandubois commented 1 year ago

Are there any network requirements for Rancher to be able to start up or for update-ca-certificates to work?

No, it should work fine without network. Of course you will need a network connection to download images, and to fetch the Kubernetes version you want to run, but if they already exist locally, then you can run offline.

jandubois commented 1 year ago

Ok, maybe test just an empty Alpine distro and see if that already fails, or if the issue is with stuff that gets installed later.

Can you run these commands and show if they succeed or fail:

PS C:\Users\Jan> wsl --import testing . '.\AppData\Local\Programs\Rancher Desktop\resources\resources\win32\distro-0.27.tar'
PS C:\Users\Jan> wsl -d testing update-ca-certificates
WARNING: ca-certificates.crt does not contain exactly one certificate or CRL: skipping
PS C:\Users\Jan> wsl --unregister testing
Unregistering...

The distro-0.27 filename assumes that you are trying Rancher Desktop 1.6.0 now. If you are still on 1.5.1 then the version should be 0.26.

If you get a failure from this test too (and are still on 1.5.1), please uninstall it and install 1.6.0 and try again. I kind of doubt that there are any differences, but the time has come for desperate actions...

cyron7 commented 1 year ago

@jandubois ; Sorry I took so long getting back to you. I got the same error. I am using 1.5.1. I will update to 1.6.0 and see if that solves the problem: image

cyron7 commented 1 year ago

@jandubois ; I uninstalled 1.5.1 and installed 1.6.0. The same symptoms occurred: I can initially get Rancher to start, but after I shut down and start up my host machine I get the update-ca-certificates error. I tried the command you suggested on 1.6.0 and got the same error with distro 27 as I did with 26: image

jandubois commented 1 year ago

This shows that the error is not triggered with just the builtin certs from the distro, so I think this means it is related to one of the certs on your host that is being copied into the distro as Rancher Desktop starts up.

@mook-as Do you have any idea how to isolate the cert that may be triggering this?

cyron7 commented 1 year ago

As a slightly separate topic, I noticed that after the error, Rancher doesn't notice that this blocked the daemon from running: image

jandubois commented 1 year ago

As a slightly separate topic, I noticed that after the error, Rancher doesn't notice that this blocked the daemon from running:

Yes, we just implemented the framework for providing diagnostics; how to alert the user, and how to manage them. The 1.6.0 release only contains a handful (quite literally) of diagnostics, mostly related to PATH configuration. We will expand the set of diagnostics over time.

cyron7 commented 1 year ago

@mook-as and @jandubois: one thing I did last week was delete all of the certificates from '/usr/local/share/ca-certificates', and it worked until I restarted my computer again. After that, deleting those same certificates no longer got it to work.

mook-as commented 1 year ago

Hmm, other than the output from /usr/bin/c_rehash -v /etc/ssl/certs (which you've already done) I don't have a better idea to isolate the failure except to binary search (delete half the certs in /usr/local/share/ca-certificates, see if that fails, and repeat until you narrow it down to one file).
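Editor's note: the binary search described above can be sketched as a small script. This is an illustration under stated assumptions, not a tested procedure for Rancher Desktop: it assumes exactly one file triggers the failure, that the failure is reproducible, and that file names contain no newlines. `bisect_files` and `rehash_fails` are hypothetical names, and running c_rehash on a scratch copy of the certificates is an assumed way to reproduce the error.

```shell
#!/bin/sh
# Sketch only: narrow a newline-separated list of files (read from stdin)
# down to the single file that makes a check fail.
# $1 names a command that reads a file list on stdin and returns 0
# when that subset reproduces the failure.
bisect_files() {
    check=$1
    suspects=$(cat)
    count=$(( $(printf '%s\n' "$suspects" | wc -l) ))
    while [ "$count" -gt 1 ]; do
        half=$((count / 2))
        first=$(printf '%s\n' "$suspects" | head -n "$half")
        if printf '%s\n' "$first" | "$check"; then
            # The failure reproduces with the first half: keep it.
            suspects=$first
        else
            # Otherwise the culprit is assumed to be in the second half.
            suspects=$(printf '%s\n' "$suspects" | tail -n +"$((half + 1))")
        fi
        count=$(( $(printf '%s\n' "$suspects" | wc -l) ))
    done
    printf '%s\n' "$suspects"
}

# Hypothetical check: copy the listed certs to a scratch directory and run
# c_rehash there; returns 0 when c_rehash fails on that subset.
rehash_fails() {
    scratch=$(mktemp -d)
    while IFS= read -r f; do
        cp "$f" "$scratch/"
    done
    if /usr/bin/c_rehash "$scratch" >/dev/null 2>&1; then
        rm -rf "$scratch"
        return 1
    fi
    rm -rf "$scratch"
    return 0
}

# Possible usage inside the rancher-desktop distro:
# find /usr/local/share/ca-certificates -type f | bisect_files rehash_fails
```

Working on a scratch copy leaves /usr/local/share/ca-certificates untouched. If the failure depends on a combination of files, or is not deterministic, the result of such a search is not meaningful.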

cyron7 commented 1 year ago

@mook-as I did that initially and thought the problem was in certs 300 - 399. Then I noticed that when I removed all of them except cert 0, it would work. Then it would rebuild all of the certificates and Rancher Desktop would crash. That's where I got the idea of just removing them all, thinking it could be a potential workaround or a clue to what is happening. Now it doesn't matter which certificates I remove. I know @adamkpickering suggested that something on my machine might be so locked down that it breaks Rancher. If there is something either of you wants me to try, to see whether something is indeed blocking Rancher, let me know.

cyron7 commented 1 year ago

I wanted to mention that I updated to version 1.6.1 and the issue is still there: image

craigforr commented 1 year ago

I have the exact same issue on Windows 10...

Windows Specifications

Edition:      Windows 10 Enterprise
Version:      22H2
Installed on: 12/22/2021
OS build:     19045.2251
Experience:   Windows Feature Experience Pack 120.2212.4180.0
Architecture: x64

Rancher Desktop Specifications

Version:            1.6.2
Container Engine:   dockerd (moby)
Kubernetes Version: N/A (Kubernetes disabled)

Error

2022-12-14T15:38:36.167Z: Registered distributions: rancher-desktop
2022-12-14T15:38:41.867Z: /sbin/init exited gracefully.
2022-12-14T15:38:42.542Z: Registered distributions: Ubuntu,rancher-desktop,Ubuntu-20.04,Ubuntu-18.04
2022-12-14T15:38:43.863Z: Registered distributions: Ubuntu,rancher-desktop-data,rancher-desktop,Ubuntu-20.04,Ubuntu-18.04
2022-12-14T15:38:44.781Z: Registered distributions: Ubuntu,rancher-desktop-data,rancher-desktop,Ubuntu-20.04,Ubuntu-18.04
2022-12-14T15:38:45.760Z: Registered distributions: Ubuntu,rancher-desktop-data,rancher-desktop,Ubuntu-20.04,Ubuntu-18.04
2022-12-14T15:38:45.760Z: data distro already registered
2022-12-14T15:39:15.168Z: Installing C:\Users\user1\AppData\Local\Programs\Rancher Desktop\resources\resources\linux\internal\trivy as /mnt/c/Users/user1/AppData/Local/Programs/Rancher Desktop/resources/resources/linux/internal/trivy into /usr/local/bin/trivy ...
2022-12-14T15:39:18.400Z: WSL: executing: /usr/sbin/update-ca-certificates: Error: wsl.exe exited with code 1

Last Command

The last command run is almost invariably listed as:

wsl.exe --distribution rancher-desktop --exec /usr/sbin/update-ca-certificates

Background

I have been getting this error consistently beginning this week. The only work-around that I have found which will get RD to start successfully is a Factory Reset.

The odd thing is that the error is listed as "Error Starting Kubernetes" and I get the error whether Kubernetes is enabled or not.

I have stopped Rancher Desktop and saved all of the logs from this point in time if you would like additional details from the logs, @jandubois.

Craig