Open maheshd2 opened 3 years ago
Have you checked if the auto-labeler pods are running in your cluster, and can you check the logs there? Also, if you do oc get csr -A, do you see any pending certificate requests?
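If it helps to script that check, here is a minimal sketch. It assumes you feed it the parsed output of `oc get csr -o json`, and it treats a CSR with no entries under `status.conditions` as Pending. The function name `pending_csrs` is made up for illustration and is not part of any tool mentioned here.

```python
def pending_csrs(csr_list):
    """Return (name, requestor) pairs for CSRs that have no conditions yet.

    Assumption: a CSR whose status.conditions list is empty or missing has
    been neither approved nor denied, i.e. it is still Pending.
    """
    pending = []
    for item in csr_list.get("items", []):
        conditions = item.get("status", {}).get("conditions") or []
        if not conditions:
            name = item["metadata"]["name"]
            requestor = item.get("spec", {}).get("username", "")
            pending.append((name, requestor))
    return pending
```

To use it, load the JSON first (for example with `json.load(sys.stdin)` when piping the `oc` output into the script) and pass the resulting dict to `pending_csrs`.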
@yrobla It seems to have been an environmental issue. We did a fresh deployment and found the node-autolabeler pods running, and all three use cases described in this issue now work fine. We can close this issue.
@yrobla FYI,
We faced the CSR approval issue again, but it worked after restarting the node-autolabeler pod.
This is the error we observed in the node-autolabeler pod:
[root@vcuhost tmp]# oc get csr
NAME AGE SIGNERNAME REQUESTOR CONDITION
csr-8rdtx 78s kubernetes.io/kube-apiserver-client-kubelet system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Pending
[root@vcuhost tmp]# oc logs -f autoruler-68f74b547c-gdwgh
Error from server (NotFound): pods "autoruler-68f74b547c-gdwgh" not found
[root@vcuhost tmp]# oc logs -f autoruler-68f74b547c-gdwgh -n node-autolabeler
error: a container name must be specified for pod autoruler-68f74b547c-gdwgh, choose one of: [autosigner autolabeller]
[root@vcuhost tmp]# oc logs -f autoruler-68f74b547c-gdwgh -n node-autolabeler -c autolabeler
error: container autolabeler is not valid for pod autoruler-68f74b547c-gdwgh
[root@vcuhost tmp]# oc logs -f autoruler-68f74b547c-gdwgh -n node-autolabeler -c autosigner
Missing configmap autorules in namespace node-autolabeler
No rules defined, using dummy worker one
Handling name rule .*worker.*
No specific allowed_networks defined. No check on ips will be made
Starting main signing loop...
Signing client cert csr-wwvt2
Signing server cert csr-bjzzl
Signing client cert csr-brxcs
Signing server cert csr-h552d
Signing client cert csr-4zh6n
Signing server cert csr-gsdtn
Signing client cert csr-jvrh9
Signing server cert csr-x7r2v
Signing client cert csr-q6kwz
Signing server cert csr-8tmtf
Signing client cert csr-pvpk5
Signing server cert csr-sfq2s
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Incorrect username in csr csr-gt9rm. Ignoring
Signing server cert csr-6mz2c
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/usr/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/autosigner.py", line 96, in watch_csrs
certs_api.replace_certificate_signing_request_approval(csr_name, body)
File "/usr/lib/python3.7/site-packages/kubernetes/client/api/certificates_v1beta1_api.py", line 1439, in replace_certificate_signing_request_approval
return self.replace_certificate_signing_request_approval_with_http_info(name, body, **kwargs) # noqa: E501
File "/usr/lib/python3.7/site-packages/kubernetes/client/api/certificates_v1beta1_api.py", line 1548, in replace_certificate_signing_request_approval_with_http_info
collection_formats=collection_formats)
File "/usr/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 353, in call_api
_preload_content, _request_timeout, _host)
File "/usr/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 184, in __call_api
_request_timeout=_request_timeout)
File "/usr/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 405, in request
body=body)
File "/usr/lib/python3.7/site-packages/kubernetes/client/rest.py", line 290, in PUT
body=body)
File "/usr/lib/python3.7/site-packages/kubernetes/client/rest.py", line 233, in request
raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (409)
Reason: Conflict
HTTP response headers: HTTPHeaderDict({'Audit-Id': '94d481fc-a7d1-425a-a123-140239882d79', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Warning': '299 - "certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest"', 'X-Kubernetes-Pf-Flowschema-Uid': 'fcc80339-a9a5-4aba-af4c-b3017f0e1fac', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'e4bf808b-72ff-4a15-b2d2-b7e9fdd001c8', 'Date': 'Thu, 27 May 2021 09:18:10 GMT', 'Content-Length': '396'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Operation cannot be fulfilled on certificatesigningrequests.certificates.k8s.io \"csr-6mz2c\": the object has been modified; please apply your changes to the latest version and try again","reason":"Conflict","details":{"name":"csr-6mz2c","group":"certificates.k8s.io","kind":"certificatesigningrequests"},"code":409}
^C
[root@vcuhost tmp]# oc get csr
NAME AGE SIGNERNAME REQUESTOR CONDITION
csr-8rdtx 3m31s kubernetes.io/kube-apiserver-client-kubelet system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Pending
[root@vcuhost tmp]# oc get pods --all-namespaces | grep auto
node-autolabeler autoruler-68f74b547c-gdwgh 2/2 Running 0 6d18h
openshift-machine-api cluster-autoscaler-operator-558bcc56d-kvhhc 2/2 Running 0 6d18h
[root@vcuhost tmp]# oc delete pod autoruler-68f74b547c-gdwgh -n node-autolabeler
pod "autoruler-68f74b547c-gdwgh" deleted
[root@vcuhost tmp]#
[root@vcuhost tmp]# oc get pods --all-namespaces | grep auto
node-autolabeler autoruler-68f74b547c-9g2bd 2/2 Running 0 38s
openshift-machine-api cluster-autoscaler-operator-558bcc56d-kvhhc 2/2 Running 0 6d18h
[root@vcuhost tmp]# oc logs -f autoruler-68f74b547c-gdwgh -n node-autolabeler^Cc autolabeler
[root@vcuhost tmp]# oc get csr
NAME AGE SIGNERNAME REQUESTOR CONDITION
csr-8rdtx 4m54s kubernetes.io/kube-apiserver-client-kubelet system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Approved,Issued
csr-qjjb9 37s kubernetes.io/kubelet-serving system:node:sos-worker-2 Approved,Issued
[root@vcuhost tmp]# oc get nodes
NAME STATUS ROLES AGE VERSION
sos-master-1 Ready master,worker 6d20h v1.19.0+e49167a
sos-master-2 Ready master,worker 6d20h v1.19.0+e49167a
sos-master-3 Ready master,worker 6d20h v1.19.0+e49167a
sos-worker-2 NotReady worker 55s v1.19.0+e49167a
[root@vcuhost tmp]# watch oc get nodes
[root@vcuhost tmp]# cd /tmp/
[root@vcuhost tmp]# ls -lrth
@yrobla could you please check the error above from the autoruler-68f74b547c-gdwgh pod? The issue seems to be happening more frequently.
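Reading the traceback, the 409 Conflict is raised inside `watch_csrs` and is not caught, so Thread-1 exits. If that thread is the signing loop, later CSRs such as csr-8rdtx would stay Pending until the pod is restarted, which would match the behaviour above. Below is a minimal sketch of one possible mitigation; it is not the actual autosigner code. `read_csr` and `approve` are hypothetical callables. In the autosigner they would presumably wrap the kubernetes client's `read_certificate_signing_request` and `replace_certificate_signing_request_approval`. The sketch also assumes the raised exception carries a `status` attribute, as the client's `ApiException` does.

```python
import time


def approve_with_retry(read_csr, approve, csr_name, attempts=3, delay=1.0):
    """Approve a CSR, re-reading it and retrying on 409 Conflict.

    read_csr(name) returns the current CSR object; approve(csr) adds the
    approval condition and sends it to the API. Both are hypothetical
    wrappers supplied by the caller.
    """
    for attempt in range(1, attempts + 1):
        # Re-read on every attempt so the update is based on the latest version.
        csr = read_csr(csr_name)
        try:
            return approve(csr)
        except Exception as exc:
            is_conflict = getattr(exc, "status", None) == 409
            if not is_conflict or attempt == attempts:
                # Anything other than a conflict, or a conflict on the last
                # attempt, is passed on to the caller.
                raise
            time.sleep(delay)
```

Whatever the retry policy, catching the exception inside the watch loop (and logging it) instead of letting it escape would keep the thread alive for the next CSR.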
@yrobla We are facing an issue with certificate approval while adding worker nodes. Below are the use cases we tried and the issue we hit in each.
Use Case 1: Imported the control plane cluster (spoke cluster) into the ACM hub and then added a worker node. The certificates were approved and the node was subsequently listed by the 'oc' command.
Use Case 2: Removed that worker node from the cluster and re-added the same node. In this case the certificates are not approved and the worker node is not listed by the 'oc' command.
Use Case 3: Added a second worker node to the existing cluster. Again the certificates are not approved for the second worker node, and it is not listed by the 'oc' command.
From our earlier discussion, we understood that importing the cluster into ACM would take care of approving the certificates.
Please let us know how to resolve these issues.