JamesMura closed this issue 6 years ago.
Thanks for the fix 👍
I cloned after this change was merged and encountered the same issue.
```
failed to run operator. Error starting agent daemonset: Error starting agent daemonset: cannot detect the pod name. Please provide it using the downward API in the manifest file
```
I'm new to k8s. Is there a way to recreate this Rook image? I tried to bring up other pods and they all failed with `PersistentVolumeClaim is not bound`
because of this. Should I just recreate those pods (cockroachdb, wordpress, nginx ingress), or would they automatically connect to a new Rook?
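For reference, the "cannot detect the pod name" error is asking for the pod's own name to be injected via the downward API. A minimal sketch of the kind of `env` stanza the operator container would need (field names here follow the standard Kubernetes downward API; the exact variable names the operator reads are an assumption based on the error message):

```yaml
# Sketch: expose the pod's own name and namespace to the container
# via the downward API, so the operator can detect where it is running.
env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: POD_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
```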
FWIW, here is a diff between the example operator manifest from https://github.com/rook/rook/blob/master/cluster/examples/kubernetes/rook-operator.yaml and the one included in this repo:
```diff
--- rook-operator.yaml 2017-10-25 22:12:46.000000000 -0400
+++ rook-operator.example.yaml 2017-11-01 05:29:49.000000000 -0400
@@ -1,3 +1,8 @@
+apiVersion: v1
+kind: Namespace
+metadata:
+ name: rook-system
+---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
@@ -12,6 +17,7 @@
 - pods
 - services
 - nodes
+ - nodes/proxy
 - configmaps
 - events
 - persistentvolumes
@@ -52,6 +58,8 @@
 resources:
 - clusterroles
 - clusterrolebindings
+ - roles
+ - rolebindings
 verbs:
 - get
 - list
@@ -79,13 +87,13 @@
kind: ServiceAccount
metadata:
 name: rook-operator
- namespace: default
+ namespace: rook-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
 name: rook-operator
- namespace: default
+ namespace: rook-system
roleRef:
 apiGroup: rbac.authorization.k8s.io
 kind: ClusterRole
@@ -93,13 +101,13 @@
subjects:
- kind: ServiceAccount
 name: rook-operator
- namespace: default
+ namespace: rook-system
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
 name: rook-operator
- namespace: default
+ namespace: rook-system
spec:
 replicas: 1
 template:
@@ -122,6 +130,18 @@
 # current mon with a new mon (useful for compensating flapping network).
 - name: ROOK_MON_OUT_TIMEOUT
 value: "300s"
+ - name: NODE_NAME
+ valueFrom:
+ fieldRef:
+ fieldPath: spec.nodeName
+ - name: ROOK_OPERATOR_SERVICE_ACCOUNT
+ valueFrom:
+ fieldRef:
+ fieldPath: spec.serviceAccountName
+ - name: POD_NAME
+ valueFrom:
+ fieldRef:
+ fieldPath: metadata.name
 - name: POD_NAMESPACE
 valueFrom:
 fieldRef:
```
From that diff, I removed the rook-system changes and the front matter that defined the rook-system namespace (the top 5 lines), and saved the result as manifests/rook-operator.example.yaml.
I then ran:

```shell
kubectl replace -f rook-operator.example.yaml
```
And now the rook-operator is running. I have Persistent Volumes. Some of the other pods I created after the initial install have recovered. I now have failing rook-agent pods, one per node:
```
Error: failed to start container "rook-agent": Error response from daemon:
{"message":"mkdir /usr/libexec/kubernetes: read-only file system"}
Error syncing pod
Back-off restarting failed container
```
Sounds like https://github.com/rook/rook/issues/1120
Hi @displague, thanks so much for investigating this issue! It sounds like this is indeed the issue. Either the Kubernetes or Rook API has changed a bit such that it now requires some fixing. Unfortunately, I'm swamped with school now, so I'm afraid I won't be able to look deeper into fixing this until mid-December. If you wish, feel free to submit a PR until then :)
My hunch is that a flag needs to be added to the kubelet.service systemd unit (which can be found at https://github.com/kahkhang/kube-linode/blob/master/manifests/container-linux-config.yaml and https://github.com/kahkhang/kube-linode/blob/master/manifests/container-linux-config-worker.yaml) to include `--volume-plugin-dir=/etc/kubernetes/volumeplugins`, and the Kubernetes version probably also needs bumping to the latest release that supports Flex Volume plugins.
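As a rough sketch of that hunch, a container-linux-config could carry the flag as a systemd drop-in for kubelet.service. Note this is only illustrative: the drop-in name and the `KUBELET_EXTRA_ARGS` environment variable are assumptions, and the actual units in those files may pass kubelet flags directly in `ExecStart` instead:

```yaml
# Hypothetical drop-in fragment for a container-linux-config.
# The flag points the kubelet at a writable Flex Volume plugin directory,
# so rook-agent no longer tries to mkdir under the read-only
# /usr/libexec/kubernetes default.
systemd:
  units:
    - name: kubelet.service
      dropins:
        - name: 10-volume-plugin-dir.conf
          contents: |
            [Service]
            Environment="KUBELET_EXTRA_ARGS=--volume-plugin-dir=/etc/kubernetes/volumeplugins"
```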
Thanks so much once again for highlighting the issue!
Adds the POD_NAMESPACE variable to the Rook operator manifest. Fixes #50.