Open traghave123 opened 3 years ago
I've tried creating the artifacts below manually:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  generation: 1
  labels:
    app: autoruler
  name: autoruler
  namespace: node-autolabeler
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: autoruler
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: autoruler
    spec:
      containers:
      - image: quay.io/karmab/autosigner:latest
        imagePullPolicy: Always
        name: autosigner
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - image: quay.io/karmab/autolabeller:latest
        imagePullPolicy: Always
        name: autolabeller
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
status: {}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: autoruler-sa-role
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - certificates.k8s.io
  resources:
  - '*'
  verbs:
  - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: autoruler-sa-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: autoruler-sa-role
subjects:
- kind: ServiceAccount
  name: default
  namespace: node-autolabeler
```
This time the CSR got approved automatically.
However, we think these artifacts should be created automatically. Could you please let us know what we are missing in order to have them created automatically in the control-plane cluster?
They should be created automatically, because they are part of the day 2 configuration. Can you share the repository and configuration that you are using to configure your clusters? Thanks.
Hi @yrobla
We found that there was an issue in the generated RHACM manifests; after fixing it, the node-labeller pods are running. Below is the repo/path we are using to create the RHACM policy:
https://github.com/traghave123/test-ran-manifests/tree/master/rhacm-manifests
However, we are randomly hitting an issue with CSR auto-approval, with the errors below. Could you please help?
```
Incorrect group in csr csr-hgl9p. Ignoring
Signing server cert csr-zprlf
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/autosigner.py", line 96, in watch_csrs
    certs_api.replace_certificate_signing_request_approval(csr_name, body)
  File "/usr/lib/python3.7/site-packages/kubernetes/client/api/certificates_v1beta1_api.py", line 1439, in replace_certificate_signing_request_approval
    return self.replace_certificate_signing_request_approval_with_http_info(name, body, **kwargs)  # noqa: E501
  File "/usr/lib/python3.7/site-packages/kubernetes/client/api/certificates_v1beta1_api.py", line 1548, in replace_certificate_signing_request_approval_with_http_info
    collection_formats=collection_formats)
  File "/usr/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 353, in call_api
    _preload_content, _request_timeout, _host)
  File "/usr/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 184, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 405, in request
    body=body)
  File "/usr/lib/python3.7/site-packages/kubernetes/client/rest.py", line 290, in PUT
    body=body)
  File "/usr/lib/python3.7/site-packages/kubernetes/client/rest.py", line 233, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (409)
Reason: Conflict
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'a1223e40-645e-4310-82e3-2623a948f2bb', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Warning': '299 - "certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest"', 'X-Kubernetes-Pf-Flowschema-Uid': '8045ad41-3b0e-4264-86e3-1f03a8185467', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'b40c123a-2f0e-4194-856d-ae119ea2d75b', 'Date': 'Thu, 15 Jul 2021 03:18:40 GMT', 'Content-Length': '396'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Operation cannot be fulfilled on certificatesigningrequests.certificates.k8s.io \"csr-zprlf\": the object has been modified; please apply your changes to the latest version and try again","reason":"Conflict","details":{"name":"csr-zprlf","group":"certificates.k8s.io","kind":"certificatesigningrequests"},"code":409}
```
Also, please find the attached file with the full logs: ErrorDuringCSRApproval.txt
Is this problem still happening? From the feedback we received, it seems to be a transient error that should disappear once the autoapprover retries.
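For context on why a retry clears it: the 409 Conflict means the CSR object was modified between the signer's read and its approval write (a `resourceVersion` mismatch under Kubernetes optimistic concurrency), so re-reading the object and resubmitting would avoid the thread crash without restarting the pod. A minimal sketch of that pattern, where `fetch`/`approve` stand in for the real client calls (e.g. `read_certificate_signing_request` and `replace_certificate_signing_request_approval`) and `ConflictError` stands in for `ApiException` with `status == 409`:

```python
import time


class ConflictError(Exception):
    """Stand-in for kubernetes.client.exceptions.ApiException with status == 409."""


def approve_with_retry(fetch, approve, retries=5, delay=0.5):
    """Retry an optimistic-concurrency write on a 409 Conflict.

    fetch()      -> returns the latest copy of the CSR (fresh resourceVersion)
    approve(csr) -> submits the approval for that copy
    """
    for attempt in range(retries):
        try:
            return approve(fetch())  # always approve the freshest copy
        except ConflictError:
            if attempt == retries - 1:
                raise  # give up after the last attempt
            time.sleep(delay * (attempt + 1))  # simple linear backoff


# Demo with a flaky backend that conflicts twice, then succeeds:
calls = {"n": 0}


def flaky_approve(csr):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConflictError("object has been modified")
    return "approved " + csr


result = approve_with_retry(lambda: "csr-zprlf", flaky_approve, delay=0.01)
```

In the real signer, one would catch `kubernetes.client.exceptions.ApiException` and retry only when `e.status == 409`, re-raising anything else. (The deprecation warning in the headers also suggests moving from `CertificatesV1beta1Api` to `CertificatesV1Api`, since the v1beta1 API is gone in Kubernetes 1.22+.)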
@yrobla Yeah, this happens randomly.
When a worker node is added, its CSR is not approved. When this happens, we need to restart the autolabeller pod with the command below:

```
oc delete pod autoruler-68f74b547c-gdwgh -n node-autolabeler
```

After that, the auto-approval works again.
Kindly help us avoid restarting the pod; as you can see, there are errors in the pod's logs.
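As a stopgap until the crash itself is fixed, Kubernetes can restart a wedged container automatically via a `livenessProbe` on the Deployment, which removes the need for a manual `oc delete pod`. The fragment below is purely illustrative: it assumes the signer touches a hypothetical `/tmp/heartbeat` file on every watch loop (the current autosigner image does not do this), and fails the probe if the file goes stale for more than 120 seconds:

```yaml
# Illustrative sketch: assumes the signer touches /tmp/heartbeat each loop
livenessProbe:
  exec:
    command:
    - sh
    - -c
    - test $(( $(date +%s) - $(stat -c %Y /tmp/heartbeat) )) -lt 120
  initialDelaySeconds: 60
  periodSeconds: 30
```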
A fix has been pushed to the autolabeler image. Could you redeploy, ensuring that you have the latest images, and see whether the problem is fixed? Thanks.
Steps followed:
We observed that the CSR is not getting auto-approved. While debugging, we found that the pods which we believe are responsible for auto-approving the CSR are not created. Could you please help here?