kubernetes-sigs / apiserver-builder-alpha

apiserver-builder-alpha implements libraries and tools to quickly and easily build Kubernetes apiservers/controllers to support custom resource types based on APIServer Aggregation
Apache License 2.0

Deploying the default api-server fails to start due to etcd pod crashing #404

Closed · raushan2016 closed this issue 4 years ago

raushan2016 commented 5 years ago

The etcd container is failing with the error "pod has unbound immediate PersistentVolumeClaims (repeated 3 times)".

Using the latest version v1.12.alpha.4

Here is the relevant part of the generated YAML file:

apiVersion: v1
kind: Service
metadata:
  name: raushankapiserver
  namespace: default
  labels:
    api: raushankapiserver
    apiserver: "true"
spec:
  ports:
  - port: 443
    protocol: TCP
    targetPort: 443
  selector:
    api: raushankapiserver
    apiserver: "true"
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: raushankapiserver
  namespace: default
  labels:
    api: raushankapiserver
    apiserver: "true"
spec:
  replicas: 1
  template:
    metadata:
      labels:
        api: raushankapiserver
        apiserver: "true"
    spec:
      containers:
      - name: apiserver
        image: raushan2016/apiserver:latest
        volumeMounts:
        - name: apiserver-certs
          mountPath: /apiserver.local.config/certificates
          readOnly: true
        command:
        - "./apiserver"
        args:
        - "--etcd-servers=http://etcd-svc:2379"
        - "--tls-cert-file=/apiserver.local.config/certificates/tls.crt"
        - "--tls-private-key-file=/apiserver.local.config/certificates/tls.key"
        - "--audit-log-path=-"
        - "--audit-log-maxage=0"
        - "--audit-log-maxbackup=0"
        resources:
          requests:
            cpu: 100m
            memory: 20Mi
          limits:
            cpu: 100m
            memory: 30Mi
      - name: controller
        image: raushan2016/apiserver:latest
        command:
        - "./controller-manager"
        args:
        resources:
          requests:
            cpu: 100m
            memory: 20Mi
          limits:
            cpu: 100m
            memory: 30Mi
      volumes:
      - name: apiserver-certs
        secret:
          secretName: raushankapiserver
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: etcd
  namespace: default
spec:
  serviceName: "etcd"
  replicas: 1
  template:
    metadata:
      labels:
        app: etcd
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: etcd
        image: quay.io/coreos/etcd:latest
        imagePullPolicy: Always
        resources:
          requests:
            cpu: 100m
            memory: 20Mi
          limits:
            cpu: 100m
            memory: 30Mi
        env:
        - name: ETCD_DATA_DIR
          value: /etcd-data-dir
        command:
        - /usr/local/bin/etcd
        - --listen-client-urls
        - http://0.0.0.0:2379
        - --advertise-client-urls
        - http://localhost:2379
        ports:
        - containerPort: 2379
        volumeMounts:
        - name: etcd-data-dir
          mountPath: /etcd-data-dir
        readinessProbe:
          httpGet:
            port: 2379
            path: /health
          failureThreshold: 1
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 2
        livenessProbe:
          httpGet:
            port: 2379
            path: /health
          failureThreshold: 3
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 2
  volumeClaimTemplates:
  - metadata:
      name: etcd-data-dir
      annotations:
        volume.beta.kubernetes.io/storage-class: standard
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
---
apiVersion: v1
kind: Service
metadata:
  name: etcd-svc
  namespace: default
  labels:
    app: etcd
spec:
  ports:
  - port: 2379
    name: etcd
    targetPort: 2379
  selector:
    app: etcd
raushan2016 commented 5 years ago

kubectl describe pvc

Name:          etcd-data-dir-etcd-0
Namespace:     default
StorageClass:  standard
Status:        Pending
Volume:
Labels:        app=etcd
Annotations:   volume.beta.kubernetes.io/storage-class: standard
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Mounted By:    etcd-0
Events:
  Type     Reason              Age                    From                         Message
  ----     ------              ---                    ----                         -------
  Warning  ProvisioningFailed  6m34s (x52 over 106m)  persistentvolume-controller  storageclass.storage.k8s.io "standard" not found

raushan2016 commented 5 years ago

I am using an Azure Kubernetes Service (AKS) cluster here.

kubectl get storageclasses --all-namespaces

NAME                PROVISIONER                AGE
default (default)   kubernetes.io/azure-disk   25h
managed-premium     kubernetes.io/azure-disk   25h

raushan2016 commented 5 years ago

Finally got it working by setting storageClassName: "managed-premium" in the volumeClaimTemplates of the etcd StatefulSet, instead of the generated annotation volume.beta.kubernetes.io/storage-class: standard.
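For reference, the working claim template would look roughly like this. This is a sketch of the fix described above; "managed-premium" is the AKS-provided class shown in the earlier kubectl get storageclasses output, and other platforms will need a class that actually exists in their cluster:

```yaml
  volumeClaimTemplates:
  - metadata:
      name: etcd-data-dir
    spec:
      # Explicit field instead of the deprecated beta annotation;
      # the class name must match one listed by `kubectl get storageclasses`.
      storageClassName: "managed-premium"
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
```

Note that volumeClaimTemplates are immutable on an existing StatefulSet, so applying this change means deleting and recreating the etcd StatefulSet.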

We need some kind of documentation around this.

yue9944882 commented 5 years ago

I think this issue is platform-specific. The "managed-premium" storage class is provided/injected by Azure as a vendor setting, so it's probably out of scope for this project, which is a server SDK/code generator for developing your aggregated apiservers.

raushan2016 commented 5 years ago

Agreed. But if we document this somewhere, it would be helpful for people trying this out on different platforms, just as other projects like Knative do.
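One platform-neutral option worth documenting (a sketch, not something the generator currently emits): if the hard-coded storage-class annotation is removed from the template, the claim binds to whatever StorageClass the cluster admin has marked as default, which each platform can designate for itself:

```yaml
# A claim with no storageClassName and no storage-class annotation
# falls back to the class carrying this annotation.
# The provisioner shown here is the Azure one from the output above;
# it would differ on other platforms.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: default
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/azure-disk
```

This only helps if the generated volumeClaimTemplates stop requesting "standard" explicitly; an explicit class (via field or annotation) always overrides the default.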

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot commented 4 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

fejta-bot commented 4 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

k8s-ci-robot commented 4 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/apiserver-builder-alpha/issues/404#issuecomment-578432792):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> Send feedback to sig-testing, kubernetes/test-infra and/or [fejta](https://github.com/fejta).
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.