k1n6b0b opened 4 years ago
Looks like I'm having the same or a similar issue. I have a 3-node k3OS cluster and installed Rancher with Helm 3; however, I'm using Let's Encrypt for my certificate. Rancher stayed up long enough for cert-manager to pull a certificate, and I was able to log in to the UI. Very shortly afterwards, everything crashed. After rebooting all of my nodes I can access the cluster, but all of the Rancher pods fail to start. Some of the errors I'm seeing look database-related.
K3s host log:
E0526 01:28:09.324679 2493 pod_workers.go:191] Error syncing pod 5c449ad0-eff1-4733-a9f8-eab5a4c47e14 ("rancher-66b5cfc7f5-mkq4p_cattle-system(5c449ad0-eff1-4733-a9f8-eab5a4c47e14)"), skipping: failed to "StartContainer" for "rancher" with CrashLoopBackOff: "back-off 5m0s restarting failed container=rancher pod=rancher-66b5cfc7f5-mkq4p_cattle-system(5c449ad0-eff1-4733-a9f8-eab5a4c47e14)"
time="2020-05-26T01:28:14.688340754Z" level=error msg="failed to record compact revision: database is locked"
E0526 01:30:39.121645 5199 pod_workers.go:191] Error syncing pod 66d84ddb-d7df-434c-befa-8aad17c32b2c ("cattle-cluster-agent-8497bbc7cc-dpw4j_cattle-system(66d84ddb-d7df-434c-befa-8aad17c32b2c)"), skipping: failed to "StartContainer" for "cluster-register" with CrashLoopBackOff: "back-off 5m0s restarting failed container=cluster-register pod=cattle-cluster-agent-8497bbc7cc-dpw4j_cattle-system(66d84ddb-d7df-434c-befa-8aad17c32b2c)"
time="2020-05-26T01:30:46.905084419Z" level=error msg="error in txn: database is locked"
E0526 01:30:46.905381 5199 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"database is locked", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
E0526 01:30:46.905965 5199 autoregister_controller.go:194] v1.monitoring.coreos.com failed with : rpc error: code = Unknown desc = database is locked
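For context on the repeated "database is locked" lines: dqlite is a distributed engine built on SQLite, and "database is locked" is SQLite's standard message for a busy (SQLITE_BUSY) condition. The sketch below uses plain local SQLite through Python's standard `sqlite3` module, not dqlite or K3s, to show one way SQLite produces that message: one connection holds a write transaction while a second connection tries to write without waiting. It only illustrates the error text and is not a diagnosis of this cluster; the function name `demo_database_is_locked` and the table `kv` are hypothetical.

```python
import os
import sqlite3
import tempfile


def demo_database_is_locked() -> str:
    """Reproduce SQLite's "database is locked" (SQLITE_BUSY) error locally.

    Hypothetical demo with plain SQLite, not dqlite. Returns the error text
    raised for the second writer, or "no error" if none was raised.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None: autocommit mode, transactions are issued explicitly.
        # timeout=0: do not wait for a lock held by another connection.
        writer = sqlite3.connect(path, timeout=0, isolation_level=None)
        other = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            writer.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
            # BEGIN IMMEDIATE takes the write (RESERVED) lock right away.
            writer.execute("BEGIN IMMEDIATE")
            writer.execute("INSERT INTO kv VALUES ('a', '1')")
            try:
                # The second connection also needs the write lock, so SQLite
                # reports SQLITE_BUSY, which Python raises as OperationalError.
                other.execute("INSERT INTO kv VALUES ('b', '2')")
            except sqlite3.OperationalError as exc:
                return str(exc)
            finally:
                # Release the first connection's write lock either way.
                writer.execute("ROLLBACK")
            return "no error"
        finally:
            other.close()
            writer.close()


print(demo_database_is_locked())  # prints: database is locked
```

A real busy timeout (a non-zero `timeout`) would make the second writer wait for the lock instead of failing immediately; the zero timeout is chosen here only to make the contention visible.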
Rancher pod log:
E0526 01:38:22.729275 7 reflector.go:153] github.com/rancher/norman/controller/generic_controller.go:229: Failed to list *v3.ClusterTemplate: the server could not find the requested resource (get clustertemplates.meta.k8s.io)
E0526 01:38:22.731363 7 reflector.go:153] github.com/rancher/norman/controller/generic_controller.go:229: Failed to list *v3.CisBenchmarkVersion: the server could not find the requested resource (get cisbenchmarkversions.meta.k8s.io)
E0526 01:38:22.731887 7 reflector.go:153] github.com/rancher/norman/controller/generic_controller.go:229: Failed to list *v3.CatalogTemplate: the server could not find the requested resource (get catalogtemplates.meta.k8s.io)
@dweomer How does this get assigned or noticed? I'd love to use this platform, but I need it to work. I'm happy to help debug and provide info; my skill set isn't in programming, but I can build and test systems.
I'm getting similar instability with a dqlite setup, both on k3OS and on K3s on Ubuntu 18.04, installed manually as well as with k3sup. Results vary, and I can't find a stable workaround.
On k3OS, I can set up a cluster and install Rancher, but the deployment never reaches 3/3 ready.
On Ubuntu, I get "database is locked" messages from cert-manager and Rancher and cannot get a working install.
At one point I got Rancher working, but it prompted me for a password even though I hadn't set one yet.
If I install Rancher on a single K3s or k3OS node, it works fine, but then I can't add more nodes.
Version (k3OS / kernel)
Architecture
3x hosts:
General cloud-config:
Subsequent hosts are joined with:
"--server=https://k3os-1.k3s.[REDACTED]:6443"
Describe the bug
I've installed k3OS multiple times using the embedded HA datastore (dqlite), and each time I have ended up with a corrupted install after deploying Rancher.
Error from server: rpc error: code = Unknown desc = failed to create dqlite connection: no available dqlite leader server found
To Reproduce
Expected behavior
Actual behavior
I'm also receiving a lot of TLS errors (not sure whether they are related).
Additional context
See log files: putty-k3os-1.k3s.REDACTED.log putty-k3os-2.k3s.REDACTED.log putty-k3os-3.k3s.REDACTED.log
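With three large node logs attached, a quick tally of the recurring symptoms can make them easier to compare across nodes. The following is a hypothetical helper, not part of K3s, k3OS, or Rancher: it counts lines matching the error strings quoted in this thread and lists the Rancher `v3` kinds that failed to list. The symptom labels are arbitrary, and the patterns only cover the messages shown here.

```python
import re
from collections import Counter

# Symptom patterns copied from error strings quoted in this thread.
# The dictionary keys are arbitrary labels, not official error names.
SYMPTOMS = {
    "database is locked": re.compile(r"database is locked"),
    "CrashLoopBackOff": re.compile(r"CrashLoopBackOff"),
    "no dqlite leader": re.compile(r"no available dqlite leader server found"),
    "Rancher list failure": re.compile(r"Failed to list \*v3\.\w+"),
}

# Captures the kind from lines like "Failed to list *v3.ClusterTemplate: ...".
FAILED_KIND = re.compile(r"Failed to list \*v3\.(\w+)")


def tally_symptoms(log_text: str) -> Counter:
    """Count the log lines matching each symptom; one line may match several."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        for label, pattern in SYMPTOMS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


def failed_kinds(log_text: str) -> list[str]:
    """Return the distinct v3 kinds Rancher failed to list, in first-seen order."""
    seen: dict[str, None] = {}
    for kind in FAILED_KIND.findall(log_text):
        seen.setdefault(kind, None)
    return list(seen)
```

For example, passing the contents of `putty-k3os-1.k3s.REDACTED.log` to `tally_symptoms(...).most_common()` would list that node's symptoms, most frequent first, so the three nodes can be compared side by side.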