loft-sh / vcluster

vCluster - Create fully functional virtual Kubernetes clusters - Each vcluster runs inside a namespace of the underlying k8s cluster. It's cheaper than creating separate full-blown clusters and it offers better multi-tenancy and isolation than regular namespaces.
https://www.vcluster.com
Apache License 2.0

vcluster not working - chmod kine.sock: no such file or directory #14

Open kbc8894 opened 3 years ago

kbc8894 commented 3 years ago

I tried to create a vcluster with the vcluster CLI, but it fails with the message below and I can't figure out how to solve it.

time="2021-05-15T13:48:57.625861730Z" level=fatal msg="starting kubernetes: preparing server: creating storage endpoint: chmod kine.sock: no such file or directory"

gabeduke commented 3 years ago

I believe kine is the SQL shim for etcd. K3s uses it to preserve the cluster state. Are you using k3s?
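
For context on what that means in practice: kine listens on a unix domain socket (kine.sock) inside the k3s data directory, so this chmod failure usually means the socket could not be created on whatever filesystem backs that directory; NFS and similar network filesystems generally cannot host unix sockets. Below is a hedged sketch of how to check what backs the data directory, assuming the default chart layout (container named vcluster, data dir mounted at /data) and that the pod stays up long enough to exec into:

# Hedged sketch: check which filesystem backs the k3s data dir from inside
# the vcluster container. <namespace> and <pod> are placeholders to fill in.
$ kubectl exec -n <namespace> <pod> -c vcluster -- sh -c 'grep " /data " /proc/mounts'
# An nfs/nfs4 entry here would explain the "chmod kine.sock" fatal error,
# since kine cannot create its unix socket on such a mount.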

kbc8894 commented 3 years ago

K3s uses it to preserve the cluster state.

@gabeduke thanks for the reply. No, the host cluster is plain Kubernetes (1.19.3) set up with kubeadm. Which information do you need?

FabianKramm commented 3 years ago

@kbc8894 thanks for creating this issue! I assume the log message you posted is from the virtual cluster container in the vcluster statefulset, correct? What type of filesystem are you using for your persistent volumes and your nodes? This issue (https://github.com/k3s-io/k3s/issues/3137) describes the same error for k3s, and it seems to be filesystem related.
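
In case it helps answer that, here is a hedged sketch of commands to see what backs the vcluster's data PVC (the app=vcluster label and the data-<name>-0 PVC name are chart defaults, so adjust as needed):

# Hedged sketch: identify the PV and StorageClass behind the vcluster data PVC.
$ kubectl get pvc -n <vcluster-namespace> -l app=vcluster
$ kubectl get pv $(kubectl get pvc data-<vcluster-name>-0 -n <vcluster-namespace> -o jsonpath='{.spec.volumeName}') -o yaml
$ kubectl get storageclass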

kbc8894 commented 3 years ago

@FabianKramm

I assume the log message you posted is from the virtual cluster container in the vcluster statefulset correct?

yes, right!

What type of filesystem are you using for persistent volumes and your nodes?

NFS is used, via the nfs-client provisioner.

FabianKramm commented 3 years ago

@kbc8894 thanks for the info! Would be interesting to know if this error also occurs if you use an emptyDir volume instead of a persistent volume with NFS.
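
For anyone who wants to run that test, here is a hedged values sketch; storage.persistence is the key I believe the k3s chart exposed around this version, so verify it against helm show values vcluster --repo https://charts.loft.sh before relying on it:

# Hedged sketch: disable the PVC so the k3s data dir falls back to an emptyDir.
# "storage.persistence" may be named differently in other chart versions.
$ cat > values-emptydir.yaml <<'EOF'
storage:
  persistence: false
EOF
$ vcluster create test-emptydir -f values-emptydir.yaml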

jnbhavya commented 2 years ago

@FabianKramm I tried creating it with an emptyDir and it works fine, but when I try to create a PV with NFS it shows no resource found.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs
spec:
  capacity:
    storage: 1000Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: SERVER_IP
    path: /src/nfs
  mountOptions:
    - vers=4

We are using vcluster version 0.3.3. The same YAML works on the host k8s cluster.

FabianKramm commented 2 years ago

@jnbhavya sorry for the late response, are you trying to create this PV inside the vcluster? Currently vcluster does not support that, but we are working on it (see #102).
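
For readers finding this later: newer vcluster chart versions add an opt-in syncer for PersistentVolume objects, which is what #102 tracks. A hedged sketch, assuming a chart version that exposes sync.persistentvolumes.enabled (it does not exist in 0.3.x, so check helm show values for your version first):

# Hedged sketch for newer chart versions only: sync PersistentVolume objects
# created inside the vcluster down to the host cluster.
$ cat > values-pv-sync.yaml <<'EOF'
sync:
  persistentvolumes:
    enabled: true
EOF
$ vcluster create my-vcluster -f values-pv-sync.yaml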

jacksgt commented 2 years ago

Hello,

I'm seeing the exact same issue (chmod kine.sock: no such file or directory) on an OpenShift cluster with vcluster version 0.8.1. The PVC is using CephFS-based storage (not NFS like in the previous comments).

$ cat values.yaml
# https://www.vcluster.com/docs/operator/restricted-hosts
openshift:
  enabled: true
# https://www.vcluster.com/docs/operator/external-access#ingress
syncer:
  extraArgs:
  - --tls-san=example.com

$ vcluster create --debug test -f values.yaml 
[info]   execute command: helm upgrade test vcluster --repo https://charts.loft.sh --version 0.8.1 --kubeconfig /tmp/4255751651 --namespace goproxy --install --repository-config='' --values /tmp/1729210185 --values values.yaml
[done] √ Successfully created virtual cluster test in namespace goproxy. 
- Use 'vcluster connect test --namespace goproxy' to access the virtual cluster
- Use `vcluster connect test --namespace goproxy -- kubectl get ns` to run a command directly within the vcluster

$ oc get pvc -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: cephfs.manila.csi.openstack.org
  creationTimestamp: "2022-06-03T06:37:50Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app: vcluster
    release: test
  name: data-test-0
  namespace: goproxy
  resourceVersion: "697495484"
  uid: c06e3ca2-4701-4ab0-bd11-d666cbe2b571
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: cephfs
  volumeMode: Filesystem
  volumeName: pvc-c06e3ca2-4701-4ab0-bd11-d666cbe2b571
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 5Gi
  phase: Bound

$ oc get pods -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
              "10.76.82.211"
          ],
          "default": true,
          "dns": {}
      }]
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
              "10.76.82.211"
          ],
          "default": true,
          "dns": {}
      }]
    kubernetes.io/limit-ranger: 'LimitRanger plugin set: cpu limit for container vcluster;
      cpu limit for container syncer'
    openshift.io/scc: restricted
  creationTimestamp: "2022-06-03T06:37:50Z"
  generateName: test-
  labels:
    app: vcluster
    controller-revision-hash: test-77bf5587f8
    release: test
    statefulset.kubernetes.io/pod-name: test-0
  name: test-0
  namespace: goproxy
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: test
    uid: ee730ca9-249d-4e9a-8f43-00f1b0f5b16a
  resourceVersion: "697499216"
  uid: 75ac6b7b-d07f-4481-8e9e-30e308cd331f
spec:
  affinity: {}
  containers:
  - args:
    - -c
    - /bin/k3s server --write-kubeconfig=/data/k3s-config/kube-config.yaml --data-dir=/data
      --disable=traefik,servicelb,metrics-server,local-storage,coredns --disable-network-policy
      --disable-agent --disable-cloud-controller --flannel-backend=none --disable-scheduler
      --kube-controller-manager-arg=controllers=*,-nodeipam,-nodelifecycle,-persistentvolume-binder,-attachdetach,-persistentvolume-expander,-cloud-node-lifecycle
      --kube-apiserver-arg=endpoint-reconciler-type=none --service-cidr=172.30.0.0/16
      && true
    command:
    - /bin/sh
    image: rancher/k3s:v1.22.8-k3s1
    imagePullPolicy: IfNotPresent
    name: vcluster
    resources:
      limits:
        cpu: "1"
        memory: 2Gi
      requests:
        cpu: 200m
        memory: 256Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
      runAsUser: 1010310000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /data
      name: data
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-g9ccs
      readOnly: true
  - args:
    - --name=test
    - --service-account=vc-workload-test
    - --tls-san=vcluster-cubieserver.app.cern.ch
    image: loftsh/vcluster:0.8.1
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 60
      httpGet:
        path: /healthz
        port: 8443
        scheme: HTTPS
      initialDelaySeconds: 60
      periodSeconds: 2
      successThreshold: 1
    name: syncer
    readinessProbe:
      failureThreshold: 60
      httpGet:
        path: /readyz
        port: 8443
        scheme: HTTPS
      periodSeconds: 2
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: 100m
        memory: 128Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
      runAsUser: 1010310000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /manifests/coredns
      name: coredns
      readOnly: true
    - mountPath: /data
      name: data
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-g9ccs
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: test-0
  imagePullSecrets:
  - name: vc-test-dockercfg-9z5jc
  nodeName: standard-node-xxx
  nodeSelector:
    node-role.kubernetes.io/standard: ""
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1010310000
    seLinuxOptions:
      level: s0:c102,c4
  serviceAccount: vc-test
  serviceAccountName: vc-test
  subdomain: test-headless
  terminationGracePeriodSeconds: 10
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-test-0
  - configMap:
      defaultMode: 420
      name: test-coredns
    name: coredns
  - name: kube-api-access-g9ccs
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
      - configMap:
          items:
          - key: service-ca.crt
            path: service-ca.crt
          name: openshift-service-ca.crt
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-06-03T06:38:03Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-06-03T06:38:03Z"
    message: 'containers with unready status: [vcluster syncer]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-06-03T06:38:03Z"
    message: 'containers with unready status: [vcluster syncer]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-06-03T06:38:03Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://f76a00fc9ec549845b2016b52c6c030f3af351144dc867bcab027e2017451221
    image: docker.io/loftsh/vcluster:0.8.1
    imageID: docker.io/loftsh/vcluster@sha256:495fc75b50ec1f71a12ed201c15e9621a4073e2db729bc76aa163dab5d70b80e
    lastState: {}
    name: syncer
    ready: false
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2022-06-03T06:38:28Z"
  - containerID: cri-o://37b507238e774b396f770cc231ef5a12c4b31a7cef3ed205a8b06eeadba81b0a
    image: docker.io/rancher/k3s:v1.22.8-k3s1
    imageID: docker.io/rancher/k3s@sha256:5a03ee2ab56f7bb051f7aefe979c432ab42624af0d4d3a9ba739c151c684d4a7
    lastState:
      terminated:
        containerID: cri-o://ce7e099aec8d70362a3fa72d97548c27200c4367dc616ab272739c802d99af06
        exitCode: 1
        finishedAt: "2022-06-03T06:39:16Z"
        reason: Error
        startedAt: "2022-06-03T06:39:16Z"
    name: vcluster
    ready: false
    restartCount: 4
    started: false
    state:
      terminated:
        containerID: cri-o://37b507238e774b396f770cc231ef5a12c4b31a7cef3ed205a8b06eeadba81b0a
        exitCode: 1
        finishedAt: "2022-06-03T06:40:10Z"
        reason: Error
        startedAt: "2022-06-03T06:40:09Z"
  hostIP: 1.1.1.1
  phase: Running
  podIP: 10.76.82.211
  podIPs:
  - ip: 10.76.82.211
  qosClass: Burstable
  startTime: "2022-06-03T06:38:03Z"
$ oc logs test-0 -c vcluster
time="2022-06-03T06:41:31Z" level=info msg="Starting k3s v1.22.8+k3s1 (21fed356)"
time="2022-06-03T06:41:31Z" level=info msg="Configuring sqlite3 database connection pooling: maxIdleConns=2, maxOpenConns=0, connMaxLifetime=0s"
time="2022-06-03T06:41:31Z" level=info msg="Configuring database table schema and indexes, this may take a moment..."
time="2022-06-03T06:41:31Z" level=info msg="Database tables and indexes are up to date"
time="2022-06-03T06:41:31Z" level=fatal msg="starting kubernetes: preparing server: creating storage endpoint: creating listener: chmod kine.sock: no such file or directory"

$ oc logs test-0 -c syncer
I0603 06:44:29.219844       1 start.go:166] couldn't find virtual cluster kube-config, will retry in 1 seconds
I0603 06:44:30.219876       1 start.go:166] couldn't find virtual cluster kube-config, will retry in 1 seconds
I0603 06:44:31.219801       1 start.go:166] couldn't find virtual cluster kube-config, will retry in 1 seconds
I0603 06:44:32.219963       1 start.go:166] couldn't find virtual cluster kube-config, will retry in 1 seconds
...

Please let me know any other commands which I can run to debug this issue.
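
A couple of hedged checks that could narrow this down (the k3s container restarts quickly, so they may need a retry right after it comes up):

# Hedged sketch: previous crash logs plus the actual mount type of /data.
$ oc logs test-0 -c vcluster --previous
$ oc exec test-0 -c vcluster -- sh -c 'grep " /data " /proc/mounts'
# If /data shows up as a ceph (or nfs) mount, this looks like the same class of
# problem as the NFS reports above: kine fails while creating its unix socket.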

matskiv commented 1 year ago

@jacksgt thank you for reporting that you are also experiencing this problem and providing detailed information. And apologies for not noticing your comment sooner. We will triage this issue soon. If you resolved the problem in the meantime - please let us know :)

matskiv commented 1 year ago

@carlmontanari Have you had any issues with vcluster in your lab setup with NFS?

carlmontanari commented 1 year ago

Yeah, with k3s things will not start, or will start and time out. I've been just using k0s or k8s with no issues though. It's been a while since I looked at that but I think I put all the info I know in #646 so hopefully that is helpful! Otherwise, testing with k0s/k8s distros should be able to confirm that it is the same/similar issue (w/ nfs).

matskiv commented 1 year ago

@carlmontanari Thanks!

So since the problem is still present, we will keep this issue open in the backlog. For now, the workaround is to try a different distro.
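
For reference, a hedged sketch of what that workaround looks like with a recent CLI; the --distro flag exists in newer vcluster CLI versions (older releases select the distro through the helm chart instead), and the k8s distro in particular uses etcd rather than a sqlite file, so it avoids kine entirely:

# Hedged sketch: create the vcluster with a different distro, per the
# workaround above. Flag availability depends on the CLI version.
$ vcluster create my-vcluster -n my-namespace --distro k0s
# or
$ vcluster create my-vcluster -n my-namespace --distro k8s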

felipecrs commented 9 months ago

joaocc commented 4 months ago

@felipecrs we have some issues running etcd on EFS (https://github.com/loft-sh/vcluster/issues/1342), but with eks-d. We went with this approach because the base distros (like k3s or k0s) use SQLite storage, which will not work on NFS (I still need to dig up the reference).

In our case we are able to use vcluster on EFS, but we have a huge bill in terms of EFS writes (~$100/month per etcd), as it seems there is constant writing (details in the ticket).