loft-sh / vcluster

vCluster - Create fully functional virtual Kubernetes clusters - Each vcluster runs inside a namespace of the underlying k8s cluster. It's cheaper than creating separate full-blown clusters and it offers better multi-tenancy and isolation than regular namespaces.
https://www.vcluster.com
Apache License 2.0

vcluster remains in pending after creation, then enters CrashLoopBackOff #591

Closed vivian-hafener-lanl closed 2 years ago

vivian-hafener-lanl commented 2 years ago

What happened?

After creating a vcluster using vcluster create csh-vcluster-01 --debug, the setup process hangs here until the command times out:

root@k8s-ctrl01-nrh:~# vcluster create csh-vcluster-01 --debug
debug  Will use namespace vcluster-csh-vcluster-01 to create the vcluster...
info   Creating namespace vcluster-csh-vcluster-01
info   Create vcluster csh-vcluster-01...
debug  execute command: helm upgrade csh-vcluster-01 https://charts.loft.sh/charts/vcluster-0.10.2.tgz --kubeconfig /tmp/3510279876 --namespace vcluster-csh-vcluster-01 --install --repository-config='' --values /tmp/2170583696
done √ Successfully created virtual cluster csh-vcluster-01 in namespace vcluster-csh-vcluster-01
info   Waiting for vcluster to come up...

Error messages are not verbose enough for me to figure out what exactly causes this to hang. Once this process is interrupted with a keyboard interrupt or allowed to time out, the following is visible when vcluster list is run:

root@k8s-ctrl01-nrh:~# vcluster list

 NAME              NAMESPACE                  STATUS    CONNECTED   CREATED                         AGE
 csh-vcluster-01   vcluster-csh-vcluster-01   Pending               2022-07-10 21:51:02 -0400 EDT   5m45s

Attempts to connect to the vcluster demonstrate that the vcluster is similarly unresponsive:

root@k8s-ctrl01-nrh:~# vcluster connect csh-vcluster-01 --debug
info   Waiting for vcluster to come up...
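
For reference, while the CLI waits, the underlying pods can be inspected directly on the host cluster; a minimal sketch, assuming the namespace created above:

```console
# List the vcluster workload pods created by the Helm chart
kubectl get pods -n vcluster-csh-vcluster-01

# Show scheduling/probe events for anything that is not Running
kubectl get events -n vcluster-csh-vcluster-01 --sort-by=.lastTimestamp
```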

What did you expect to happen?

The vcluster comes up and becomes active, as described in the documentation's getting started guide.

How can we reproduce it (as minimally and precisely as possible)?

Install vcluster as outlined in the documentation, then run vcluster create.

Anything else we need to know?

This cluster is virtualized within Proxmox; however, all other cluster functions are working as expected.

Host cluster Kubernetes version

```console
root@k8s-ctrl01-nrh:~# kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.2", GitCommit:"f66044f4361b9f1f96f0053dd46cb7dce5e990a8", GitTreeState:"clean", BuildDate:"2022-06-15T14:22:29Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.2", GitCommit:"f66044f4361b9f1f96f0053dd46cb7dce5e990a8", GitTreeState:"clean", BuildDate:"2022-06-15T14:15:38Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}
```

Host cluster Kubernetes distribution

```
k8s 1.24.2
```

vcluster version

```console
root@k8s-ctrl01-nrh:~# vcluster --version
vcluster version 0.10.2
```

Vcluster Kubernetes distribution (k3s (default), k8s, k0s)

```
k8s
```

OS and Arch

```
OS: Debian GNU/Linux 11 (bullseye)
Arch: x86_64
```
vivian-hafener-lanl commented 2 years ago

Additional information: eventually the pods in the vcluster all go from Running, with only the api pod in Error, to the following statuses:

vcluster-csh-vcluster-01   csh-vcluster-01-5889f86d59-spc2j             0/1     CrashLoopBackOff   7 (2m17s ago)    15m
vcluster-csh-vcluster-01   csh-vcluster-01-api-7bf54d8477-jzx2c         0/1     CrashLoopBackOff   7 (117s ago)     15m
vcluster-csh-vcluster-01   csh-vcluster-01-controller-bbf6d7b98-9c99d   1/1     Running            6 (3m23s ago)    15m
vcluster-csh-vcluster-01   csh-vcluster-01-etcd-0                       0/1     Pending            0                15m

At this point, the vcluster status also becomes CrashLoopBackOff. Inspecting the logs of the offending api pod reveals the following:

root@k8s-ctrl01-nrh:~# kubectl logs csh-vcluster-01-api-7bf54d8477-jzx2c -n vcluster-csh-vcluster-01
I0711 02:11:35.211595       1 server.go:558] external host was not specified, using 10.244.110.138
I0711 02:11:35.212234       1 server.go:158] Version: v1.24.1
I0711 02:11:35.212311       1 server.go:160] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0711 02:11:35.594587       1 shared_informer.go:255] Waiting for caches to sync for node_authorizer
I0711 02:11:35.595655       1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
I0711 02:11:35.595731       1 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
I0711 02:11:35.596909       1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
I0711 02:11:35.596967       1 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
W0711 02:11:55.600288       1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {csh-vcluster-01-etcd:2379 csh-vcluster-01-etcd <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp: lookup csh-vcluster-01-etcd: i/o timeout". Reconnecting...
E0711 02:11:55.600384       1 run.go:74] "command failed" err="context deadline exceeded"
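
Since the api container fails on the etcd dial, the next useful data point is why csh-vcluster-01-etcd-0 stays Pending; a sketch of the checks, using the names from the output above:

```console
# The Events section of describe usually states why a pod cannot be scheduled
kubectl describe pod csh-vcluster-01-etcd-0 -n vcluster-csh-vcluster-01

# Confirm the etcd Service exists and whether it has any ready endpoints
kubectl get svc,endpoints -n vcluster-csh-vcluster-01
```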
vivian-hafener-lanl commented 2 years ago

Additional information: Upon rebooting k8s-ctrl01-nrh, the following occurs:

root@k8s-ctrl01-nrh:~# kubectl get nodes
NAME             STATUS   ROLES           AGE    VERSION
k8s-ctrl01-nrh   Ready    control-plane   111m   v1.24.2
k8s-ctrl02-nrh   Ready    control-plane   82m    v1.24.2
k8s-ctrl03-nrh   Ready    control-plane   81m    v1.24.2
k8s-wrkr01-nrh   Ready    <none>          81m    v1.24.2
k8s-wrkr02-nrh   Ready    <none>          81m    v1.24.2
root@k8s-ctrl01-nrh:~# vcluster list

 NAME              NAMESPACE                  STATUS    CONNECTED   CREATED                         AGE
 csh-vcluster-01   vcluster-csh-vcluster-01   Running               2022-07-10 22:03:33 -0400 EDT   20m45s

root@k8s-ctrl01-nrh:~# vcluster connect csh-vcluster-01
info   Waiting for vcluster to come up...
warn   Pod csh-vcluster-01-5889f86d59-spc2j: Back-off restarting failed container (BackOff)
warn   Pod csh-vcluster-01-5889f86d59-spc2j: Readiness probe failed: Get "https://10.244.205.75:8443/readyz": dial tcp 10.244.205.75:8443: connect: connection refused (Unhealthy)
warn   Pod csh-vcluster-01-5889f86d59-spc2j has critical status: CrashLoopBackOff. vcluster will continue waiting, but this operation might timeout
warn   Pod csh-vcluster-01-5889f86d59-spc2j has critical status: CrashLoopBackOff. vcluster will continue waiting, but this operation might timeout
warn   Pod csh-vcluster-01-5889f86d59-spc2j has critical status: CrashLoopBackOff. vcluster will continue waiting, but this operation might timeout
warn   Pod csh-vcluster-01-5889f86d59-spc2j has critical status: CrashLoopBackOff. vcluster will continue waiting, but this operation might timeout
warn   Pod csh-vcluster-01-5889f86d59-spc2j has critical status: CrashLoopBackOff. vcluster will continue waiting, but this operation might timeout
warn   Pod csh-vcluster-01-5889f86d59-spc2j has critical status: CrashLoopBackOff. vcluster will continue waiting, but this operation might timeout
warn   Pod csh-vcluster-01-5889f86d59-spc2j has critical status: CrashLoopBackOff. vcluster will continue waiting, but this operation might timeout
warn   Pod csh-vcluster-01-5889f86d59-spc2j has critical status: CrashLoopBackOff. vcluster will continue waiting, but this operation might timeout
warn   Pod csh-vcluster-01-5889f86d59-spc2j has critical status: CrashLoopBackOff. vcluster will continue waiting, but this operation might timeout
warn   Pod csh-vcluster-01-5889f86d59-spc2j has critical status: CrashLoopBackOff. vcluster will continue waiting, but this operation might timeout
warn   Pod csh-vcluster-01-5889f86d59-spc2j has critical status: CrashLoopBackOff. vcluster will continue waiting, but this operation might timeout
warn   Pod csh-vcluster-01-5889f86d59-spc2j has critical status: CrashLoopBackOff. vcluster will continue waiting, but this operation might timeout
warn   Pod csh-vcluster-01-5889f86d59-spc2j has critical status: CrashLoopBackOff. vcluster will continue waiting, but this operation might timeout
^C
root@k8s-ctrl01-nrh:~# kubectl logs csh-vcluster-01-5889f86d59-spc2j -n vcluster-csh-vcluster-01
I0711 02:23:49.339946       1 start.go:230] couldn't retrieve virtual cluster version (Get "https://csh-vcluster-01-api:443/version": dial tcp: lookup csh-vcluster-01-api on 10.96.0.10:53: read udp 10.244.205.75:60997->10.96.0.10:53: i/o timeout), will retry in 1 seconds
I0711 02:24:05.345176       1 start.go:230] couldn't retrieve virtual cluster version (Get "https://csh-vcluster-01-api:443/version": dial tcp: lookup csh-vcluster-01-api on 10.96.0.10:53: read udp 10.244.205.75:54758->10.96.0.10:53: i/o timeout), will retry in 1 seconds
I0711 02:24:21.345526       1 start.go:230] couldn't retrieve virtual cluster version (Get "https://csh-vcluster-01-api:443/version": dial tcp: lookup csh-vcluster-01-api on 10.96.0.10:53: read udp 10.244.205.75:45444->10.96.0.10:53: i/o timeout), will retry in 1 seconds
I0711 02:24:22.342641       1 start.go:230] couldn't retrieve virtual cluster version (Get "https://csh-vcluster-01-api:443/version": dial tcp 10.101.210.148:443: connect: connection refused), will retry in 1 seconds
I0711 02:24:23.342600       1 start.go:230] couldn't retrieve virtual cluster version (Get "https://csh-vcluster-01-api:443/version": dial tcp 10.101.210.148:443: connect: connection refused), will retry in 1 seconds
I0711 02:24:24.343437       1 start.go:230] couldn't retrieve virtual cluster version (Get "https://csh-vcluster-01-api:443/version": dial tcp 10.101.210.148:443: connect: connection refused), will retry in 1 seconds
I0711 02:24:25.342360       1 start.go:230] couldn't retrieve virtual cluster version (Get "https://csh-vcluster-01-api:443/version": dial tcp 10.101.210.148:443: connect: connection refused), will retry in 1 seconds
I0711 02:24:26.342367       1 start.go:230] couldn't retrieve virtual cluster version (Get "https://csh-vcluster-01-api:443/version": dial tcp 10.101.210.148:443: connect: connection refused), will retry in 1 seconds
I0711 02:24:27.342381       1 start.go:230] couldn't retrieve virtual cluster version (Get "https://csh-vcluster-01-api:443/version": dial tcp 10.101.210.148:443: connect: connection refused), will retry in 1 seconds
I0711 02:24:28.342298       1 start.go:230] couldn't retrieve virtual cluster version (Get "https://csh-vcluster-01-api:443/version": dial tcp 10.101.210.148:443: connect: connection refused), will retry in 1 seconds
I0711 02:24:29.343023       1 start.go:230] couldn't retrieve virtual cluster version (Get "https://csh-vcluster-01-api:443/version": dial tcp 10.101.210.148:443: connect: connection refused), will retry in 1 seconds
I0711 02:24:30.342233       1 start.go:230] couldn't retrieve virtual cluster version (Get "https://csh-vcluster-01-api:443/version": dial tcp 10.101.210.148:443: connect: connection refused), will retry in 1 seconds
I0711 02:24:31.343191       1 start.go:230] couldn't retrieve virtual cluster version (Get "https://csh-vcluster-01-api:443/version": dial tcp 10.101.210.148:443: connect: connection refused), will retry in 1 seconds
I0711 02:24:32.342010       1 start.go:230] couldn't retrieve virtual cluster version (Get "https://csh-vcluster-01-api:443/version": dial tcp 10.101.210.148:443: connect: connection refused), will retry in 1 seconds
I0711 02:24:33.342624       1 start.go:230] couldn't retrieve virtual cluster version (Get "https://csh-vcluster-01-api:443/version": dial tcp 10.101.210.148:443: connect: connection refused), will retry in 1 seconds
I0711 02:24:34.343065       1 start.go:230] couldn't retrieve virtual cluster version (Get "https://csh-vcluster-01-api:443/version": dial tcp 10.101.210.148:443: connect: connection refused), will retry in 1 seconds
I0711 02:24:35.343207       1 start.go:230] couldn't retrieve virtual cluster version (Get "https://csh-vcluster-01-api:443/version": dial tcp 10.101.210.148:443: connect: connection refused), will retry in 1 seconds
I0711 02:24:36.342206       1 start.go:230] couldn't retrieve virtual cluster version (Get "https://csh-vcluster-01-api:443/version": dial tcp 10.101.210.148:443: connect: connection refused), will retry in 1 seconds
I0711 02:24:37.341927       1 start.go:230] couldn't retrieve virtual cluster version (Get "https://csh-vcluster-01-api:443/version": dial tcp 10.101.210.148:443: connect: connection refused), will retry in 1 seconds
I0711 02:24:38.343007       1 start.go:230] couldn't retrieve virtual cluster version (Get "https://csh-vcluster-01-api:443/version": dial tcp 10.101.210.148:443: connect: connection refused), will retry in 1 seconds
I0711 02:24:39.343066       1 start.go:230] couldn't retrieve virtual cluster version (Get "https://csh-vcluster-01-api:443/version": dial tcp 10.101.210.148:443: connect: connection refused), will retry in 1 seconds
I0711 02:24:40.342948       1 start.go:230] couldn't retrieve virtual cluster version (Get "https://csh-vcluster-01-api:443/version": dial tcp 10.101.210.148:443: connect: connection refused), will retry in 1 seconds
I0711 02:24:41.342167       1 start.go:230] couldn't retrieve virtual cluster version (Get "https://csh-vcluster-01-api:443/version": dial tcp 10.101.210.148:443: connect: connection refused), will retry in 1 seconds
I0711 02:24:42.342463       1 start.go:230] couldn't retrieve virtual cluster version (Get "https://csh-vcluster-01-api:443/version": dial tcp 10.101.210.148:443: connect: connection refused), will retry in 1 seconds
I0711 02:24:43.342436       1 start.go:230] couldn't retrieve virtual cluster version (Get "https://csh-vcluster-01-api:443/version": dial tcp 10.101.210.148:443: connect: connection refused), will retry in 1 seconds
root@k8s-ctrl01-nrh:~# kubectl logs csh-vcluster-01-api-7bf54d8477-jzx2c -n vcluster-csh-vcluster-01
I0711 02:27:46.030643       1 server.go:558] external host was not specified, using 10.244.110.138
I0711 02:27:46.031218       1 server.go:158] Version: v1.24.1
I0711 02:27:46.031264       1 server.go:160] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0711 02:27:46.266071       1 shared_informer.go:255] Waiting for caches to sync for node_authorizer
I0711 02:27:46.267170       1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
I0711 02:27:46.267283       1 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
I0711 02:27:46.268353       1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
I0711 02:27:46.268406       1 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
W0711 02:27:46.272634       1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {csh-vcluster-01-etcd:2379 csh-vcluster-01-etcd <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 10.96.19.252:2379: connect: connection refused". Reconnecting...
W0711 02:27:47.267571       1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {csh-vcluster-01-etcd:2379 csh-vcluster-01-etcd <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 10.96.19.252:2379: connect: connection refused". Reconnecting...
W0711 02:27:47.274214       1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {csh-vcluster-01-etcd:2379 csh-vcluster-01-etcd <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 10.96.19.252:2379: connect: connection refused". Reconnecting...
W0711 02:27:48.270313       1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {csh-vcluster-01-etcd:2379 csh-vcluster-01-etcd <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 10.96.19.252:2379: connect: connection refused". Reconnecting...
W0711 02:27:49.161573       1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {csh-vcluster-01-etcd:2379 csh-vcluster-01-etcd <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 10.96.19.252:2379: connect: connection refused". Reconnecting...
W0711 02:27:50.089290       1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {csh-vcluster-01-etcd:2379 csh-vcluster-01-etcd <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 10.96.19.252:2379: connect: connection refused". Reconnecting...
W0711 02:27:51.460051       1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {csh-vcluster-01-etcd:2379 csh-vcluster-01-etcd <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 10.96.19.252:2379: connect: connection refused". Reconnecting...
W0711 02:27:52.413823       1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {csh-vcluster-01-etcd:2379 csh-vcluster-01-etcd <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 10.96.19.252:2379: connect: connection refused". Reconnecting...
W0711 02:27:55.777532       1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {csh-vcluster-01-etcd:2379 csh-vcluster-01-etcd <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 10.96.19.252:2379: connect: connection refused". Reconnecting...
W0711 02:27:56.971876       1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {csh-vcluster-01-etcd:2379 csh-vcluster-01-etcd <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 10.96.19.252:2379: connect: connection refused". Reconnecting...
W0711 02:28:02.305567       1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {csh-vcluster-01-etcd:2379 csh-vcluster-01-etcd <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 10.96.19.252:2379: connect: connection refused". Reconnecting...
W0711 02:28:03.154747       1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {csh-vcluster-01-etcd:2379 csh-vcluster-01-etcd <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 10.96.19.252:2379: connect: connection refused". Reconnecting...
E0711 02:28:06.271160       1 run.go:74] "command failed" err="context deadline exceeded"
root@k8s-ctrl01-nrh:~#
matskiv commented 2 years ago

Hello @viv-codes, thank you for reporting the issue. Etcd seems to be the problem here, and all the other pods fail because they depend on it. Could you please provide logs from the csh-vcluster-01-etcd-0 pod and the related events?

Also, are you using any additional flags or custom values with the vcluster create csh-vcluster-01 --debug command? If so, could you please share those as well? Thank you.
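
For reference, a sketch of commands that would capture the requested information, using the names from the output above:

```console
kubectl logs csh-vcluster-01-etcd-0 -n vcluster-csh-vcluster-01
kubectl describe pod csh-vcluster-01-etcd-0 -n vcluster-csh-vcluster-01
kubectl get events -n vcluster-csh-vcluster-01 --sort-by=.lastTimestamp
```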

vivian-hafener-lanl commented 2 years ago

Thanks for the help! I do not run vcluster create with additional args. Last night, while debugging DNS (and determining that it wasn't an issue), I restarted my vclusters, and now they seem to have trouble with their initial deployment and getting the minimum number of replicas:

 NAME              NAMESPACE                  STATUS    CONNECTED   CREATED                         AGE
 csh-vcluster-01   vcluster-csh-vcluster-01   Pending               2022-07-13 08:13:21 -0400 EDT   14m54s

root@k8s-ctrl01-nrh:~# kubectl get pods -n vcluster-csh-vcluster-01
NAME                READY   STATUS    RESTARTS   AGE
csh-vcluster-01-0   0/2     Pending   0          15m
root@k8s-ctrl01-nrh:~#

I ran it again, and I got this log:

debug  Will use namespace vcluster-csh-vcluster-01 to create the vcluster...
info   Waiting until namespace is terminated...
info   Creating namespace vcluster-csh-vcluster-01
info   Create vcluster csh-vcluster-01...
debug  execute command: helm upgrade csh-vcluster-01 https://charts.loft.sh/charts/vcluster-0.10.2.tgz --kubeconfig /tmp/820841665 --namespace vcluster-csh-vcluster-01 --install --repository-config='' --values /tmp/1386798662
done √ Successfully created virtual cluster csh-vcluster-01 in namespace vcluster-csh-vcluster-01
info   Waiting for vcluster to come up...
fatal  timed out waiting for the condition
wait for vcluster
github.com/loft-sh/vcluster/cmd/vclusterctl/cmd.GetKubeConfig
        /Users/runner/work/vcluster/vcluster/cmd/vclusterctl/cmd/util.go:94
github.com/loft-sh/vcluster/cmd/vclusterctl/cmd.(*ConnectCmd).getVClusterKubeConfig
        /Users/runner/work/vcluster/vcluster/cmd/vclusterctl/cmd/connect.go:335
github.com/loft-sh/vcluster/cmd/vclusterctl/cmd.(*ConnectCmd).Connect
        /Users/runner/work/vcluster/vcluster/cmd/vclusterctl/cmd/connect.go:140
github.com/loft-sh/vcluster/cmd/vclusterctl/cmd.(*CreateCmd).Run
        /Users/runner/work/vcluster/vcluster/cmd/vclusterctl/cmd/create.go:186
github.com/loft-sh/vcluster/cmd/vclusterctl/cmd.NewCreateCmd.func1
        /Users/runner/work/vcluster/vcluster/cmd/vclusterctl/cmd/create.go:86
github.com/spf13/cobra.(*Command).execute
        /Users/runner/work/vcluster/vcluster/vendor/github.com/spf13/cobra/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
        /Users/runner/work/vcluster/vcluster/vendor/github.com/spf13/cobra/command.go:974
github.com/spf13/cobra.(*Command).Execute
        /Users/runner/work/vcluster/vcluster/vendor/github.com/spf13/cobra/command.go:902
github.com/loft-sh/vcluster/cmd/vclusterctl/cmd.Execute
        /Users/runner/work/vcluster/vcluster/cmd/vclusterctl/cmd/root.go:41
main.main
        /Users/runner/work/vcluster/vcluster/cmd/vclusterctl/main.go:16
runtime.main
        /Users/runner/hostedtoolcache/go/1.18.3/x64/src/runtime/proc.go:250
runtime.goexit
        /Users/runner/hostedtoolcache/go/1.18.3/x64/src/runtime/asm_amd64.s:1571
failed to parse kube config
github.com/loft-sh/vcluster/cmd/vclusterctl/cmd.(*ConnectCmd).getVClusterKubeConfig
        /Users/runner/work/vcluster/vcluster/cmd/vclusterctl/cmd/connect.go:337
github.com/loft-sh/vcluster/cmd/vclusterctl/cmd.(*ConnectCmd).Connect
        /Users/runner/work/vcluster/vcluster/cmd/vclusterctl/cmd/connect.go:140
github.com/loft-sh/vcluster/cmd/vclusterctl/cmd.(*CreateCmd).Run
        /Users/runner/work/vcluster/vcluster/cmd/vclusterctl/cmd/create.go:186
github.com/loft-sh/vcluster/cmd/vclusterctl/cmd.NewCreateCmd.func1
        /Users/runner/work/vcluster/vcluster/cmd/vclusterctl/cmd/create.go:86
github.com/spf13/cobra.(*Command).execute
        /Users/runner/work/vcluster/vcluster/vendor/github.com/spf13/cobra/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
        /Users/runner/work/vcluster/vcluster/vendor/github.com/spf13/cobra/command.go:974
github.com/spf13/cobra.(*Command).Execute
        /Users/runner/work/vcluster/vcluster/vendor/github.com/spf13/cobra/command.go:902
github.com/loft-sh/vcluster/cmd/vclusterctl/cmd.Execute
        /Users/runner/work/vcluster/vcluster/cmd/vclusterctl/cmd/root.go:41
main.main
        /Users/runner/work/vcluster/vcluster/cmd/vclusterctl/main.go:16
runtime.main
        /Users/runner/hostedtoolcache/go/1.18.3/x64/src/runtime/proc.go:250
runtime.goexit
        /Users/runner/hostedtoolcache/go/1.18.3/x64/src/runtime/asm_amd64.s:1571
matskiv commented 2 years ago

@viv-codes Can you please post events from the namespace (best to capture those like a minute after creation, otherwise they get deleted after some time I think)? and also please attach logs from both containers of your csh-vcluster-01-0 pod. This would be very helpful. Thank you.
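
A sketch of commands that would capture this, assuming the default container names of the k3s chart (vcluster and syncer):

```console
kubectl get events -n vcluster-csh-vcluster-01 --sort-by=.lastTimestamp
kubectl logs csh-vcluster-01-0 -n vcluster-csh-vcluster-01 -c vcluster
kubectl logs csh-vcluster-01-0 -n vcluster-csh-vcluster-01 -c syncer
```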

vivian-hafener-lanl commented 2 years ago

will do!

It appears that the pod does not exist:

vivi@k8s-ctrl01-nrh:~$ kubectl get pods -n vcluster-csh-vcluster-01
NAME                READY   STATUS    RESTARTS   AGE
csh-vcluster-01-0   0/2     Pending   0          86m
vivi@k8s-ctrl01-nrh:~$ vclust^C
vivi@k8s-ctrl01-nrh:~$ kubectl logs csh-vcluster-01-0
Error from server (NotFound): pods "csh-vcluster-01-0" not found
vivi@k8s-ctrl01-nrh:~$

I'll start a new vcluster and will record the events of the namespace

vivian-hafener-lanl commented 2 years ago

I didn't need to create a new one. I pulled the events for csh-vcluster-01, and this is what I get. I'll look into why this is happening:

root@k8s-ctrl01-nrh:~# kubectl get events -n vcluster-csh-vcluster-01
LAST SEEN   TYPE     REASON          OBJECT                                         MESSAGE
3m39s       Normal   FailedBinding   persistentvolumeclaim/data-csh-vcluster-01-0   no persistent volumes available for this claim and no storage class is set
root@k8s-ctrl01-nrh:~#
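
For context, whether the host cluster has any storage class or pre-provisioned volumes at all can be checked with:

```console
kubectl get storageclass
kubectl get pv
```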
matskiv commented 2 years ago

OK, so the issue is with the PersistentVolumeClaim. You need to manually create a PersistentVolume that fulfills the requirements of the PVC, or better yet, have a volume provisioner installed that will create volumes on demand. Further advice on this depends on where/how you are running your host cluster, and that is out of vcluster's scope anyway.

Additional info on the PVC created by vcluster: by default, vcluster creates the PVC using this template: https://github.com/loft-sh/vcluster/blob/95ca8519d8e713d7c53762473c56137243cdf3f3/charts/k3s/templates/statefulset.yaml#L25-L40, which is populated by Helm values that you can override. You can see the default values here: https://github.com/loft-sh/vcluster/blob/95ca8519d8e713d7c53762473c56137243cdf3f3/charts/k3s/values.yaml#L148-L157
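
As an illustration only (not part of the advice above): a minimal hostPath PersistentVolume that could satisfy a default 5Gi/ReadWriteOnce claim on a lab cluster; the name and path are hypothetical, and hostPath ties the data to a single node, so a real provisioner is preferable:

```console
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: csh-vcluster-01-data          # hypothetical name
spec:
  capacity:
    storage: 5Gi                      # assumed to match the chart's default request
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /var/lib/vcluster-data      # assumed path on the node
EOF
```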

You can also start a vcluster with no permanent storage if you set this helm value:

```yaml
storage:
  # If this is disabled, vcluster will use an emptyDir instead of a PersistentVolumeClaim
  persistence: false
```

But of course, that would mean data loss whenever the vcluster pod is deleted or recreated.
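
If going that route, the values can be passed at creation time; a sketch, assuming the snippet above is saved as values.yaml and that the installed CLI supports the -f/--extra-values flag:

```console
vcluster create csh-vcluster-01 -f values.yaml
```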

vivian-hafener-lanl commented 2 years ago

OK, awesome, thanks. I'm talking with my group about possibly using Longhorn for our storage, but since this is a teaching cluster, I might just use the folder option.

vivian-hafener-lanl commented 2 years ago

I'm going to close this issue. Thanks for the help, I really appreciate it.