play-io opened this issue 7 months ago
Can you clarify how often you see the Operator reconciling? Is it reconciling constantly?
I'm looking at a cluster running operator v1.33.0 and see reconciles every 5 minutes with the request name periodic-5m0s-reconcile-event. I doubt any update calls are triggered by that, but I haven't specifically checked.
If you are seeing reconciliation constantly, I'd wonder whether you are using AddonManager (or something similar) to manage a resource that the operator watches, and whether that resource is being updated continuously, triggering changes. For example, the operator writes some default values into the Installation CR; if something of yours manages the Installation CR and strips those defaults back out (updating the CR), that would trigger the operator to reconcile again. Another possibility is that something is watching and updating deployments/daemonsets, and the operator keeps reconciling those changes away, fighting with whatever is modifying the Calico resources.
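To make that Installation CR scenario concrete, here is a rough sketch. The defaulted fields shown are illustrative guesses, not an authoritative list of what the operator writes: if an external manager keeps re-applying the first document verbatim, the defaults the operator filled in (second document) get stripped on every apply, and each such update triggers another reconcile.

# Hypothetical Installation CR as applied by an external manager (e.g. AddonManager).
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  variant: Calico
---
# The same CR after the operator has written defaults back into spec
# (field names below are illustrative assumptions, not an exhaustive list).
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  variant: Calico
  cni:
    type: Calico
  calicoNetwork:
    bgp: Enabled
# If the manager re-applies the first document as-is, these defaults are removed
# again, the CR is updated, and the operator reconciles once more.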
I'd suggest sharing a larger snippet of the operator logs if my previous comments don't help identify the issue.
@tmjd Below are graphs of read and write events for namespaces. I am still digging into which event corresponds to the increase we see.
cc: @diranged @scohen-nd
Reads: [graph]
Writes: [graph]
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "14"
    kubectl.kubernetes.io/last-applied-configuration: |
      [REDACTED]
    policies.kyverno.io/patches: |
      require-app-label.set-default-app-labels.kyverno.io: added /spec/template/metadata/labels/app
  creationTimestamp: "2021-05-04T19:21:22Z"
  generation: 24
  labels:
    cfn_version: "1.27"
    k8s-app: tigera-operator
  name: tigera-operator
  namespace: tigera-operator
  resourceVersion: "13011510649"
  uid: 4aaf443d-e297-4f30-9d60-d2a853ded0f4
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      name: tigera-operator
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/restartedAt: "2022-08-26T14:11:09-07:00"
      creationTimestamp: null
      labels:
        app: tigera-operator
        k8s-app: tigera-operator
        name: tigera-operator
    spec:
      containers:
      - command:
        - operator
        env:
        - name: WATCH_NAMESPACE
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: OPERATOR_NAME
          value: tigera-operator
        - name: TIGERA_OPERATOR_INIT_IMAGE_VERSION
          value: v3.26.4
        envFrom:
        - configMapRef:
            name: kubernetes-services-endpoint
            optional: true
        image: [REDACTED].dkr.ecr.us-west-2.amazonaws.com/quay-io/tigera/operator:v1.30.10
        imagePullPolicy: IfNotPresent
        name: tigera-operator
        resources:
          limits:
            memory: 1Gi
          requests:
            cpu: 100m
            memory: 384Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/lib/calico
          name: var-lib-calico
          readOnly: true
      dnsPolicy: ClusterFirstWithHostNet
      hostNetwork: true
      nodeSelector:
        kubernetes.io/os: linux
        [REDACTED].group-name: kube-system
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: tigera-operator
      serviceAccountName: tigera-operator
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: [REDACTED].group-name
        operator: Equal
        value: kube-system
      volumes:
      - hostPath:
          path: /var/lib/calico
          type: ""
        name: var-lib-calico
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2021-08-16T23:40:35Z"
    lastUpdateTime: "2024-07-21T14:39:14Z"
    message: ReplicaSet "tigera-operator-78b6d57c44" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2024-10-04T16:48:31Z"
    lastUpdateTime: "2024-10-04T16:48:31Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 24
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    created_by: user
    group: system:masters
    kubectl.kubernetes.io/last-applied-configuration: |
      [REDACTED]
    owner: kubernetes-admin
    policies.kyverno.io/patches: |
      add-ns-owner-annotations.annotate-namespaces-with-owner.kyverno.io: added /metadata/annotations/owner
  creationTimestamp: "2021-08-16T23:40:22Z"
  labels:
    cfn_version: "1.27"
    kubernetes.io/metadata.name: calico-system
    name: calico-system
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: privileged
  name: calico-system
  ownerReferences:
  - apiVersion: operator.tigera.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: Installation
    name: default
    uid: 5956e270-2844-4c79-830d-ebcc9258268e
  resourceVersion: "9451738645"
  uid: b1a4c8e3-b3a2-4ef0-9b12-c99513f45622
spec:
  finalizers:
  - kubernetes
status:
  phase: Active
Upon installing the Tigera Operator in my EKS cluster on AWS with kube-apiserver audit logs enabled, I noticed a significant increase in log volume. These logs are pushed to CloudWatch, leading to an unexpected increase in billing. The primary concerns are twofold:
1. The continual operations on Kubernetes resources are inflating the kube-apiserver audit logs, causing them to grow substantially.
2. The ongoing reconciliation performed by the Tigera Operator generates additional operator logs, which also contribute to the increased log volume.
I am seeking clarification on whether this behavior is expected or whether it indicates a problem with the Tigera Operator that is causing the observed increase in log volume and associated billing.
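For context, each such apiserver request lands in CloudWatch as a kube-apiserver audit event roughly shaped like the sketch below. This is illustrative only: the auditID, timestamps, URI, and object names are made up, and I'm assuming the requests come from the tigera-operator service account.

apiVersion: audit.k8s.io/v1
kind: Event
level: Metadata                                   # depends on the cluster's audit policy
auditID: 00000000-0000-0000-0000-000000000000     # placeholder
stage: ResponseComplete
verb: update
requestURI: /apis/apps/v1/namespaces/calico-system/deployments/calico-kube-controllers
user:
  username: system:serviceaccount:tigera-operator:tigera-operator
  groups:
  - system:serviceaccounts
  - system:serviceaccounts:tigera-operator
  - system:authenticated
objectRef:
  apiGroup: apps
  apiVersion: v1
  resource: deployments
  namespace: calico-system
  name: calico-kube-controllers
responseStatus:
  code: 200
requestReceivedTimestamp: "2024-10-04T16:48:31.000000Z"
stageTimestamp: "2024-10-04T16:48:31.100000Z"

Every read or write the operator makes produces one of these entries, which is why frequent reconciliation translates directly into audit log volume.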
Expected Behavior
The Tigera Operator should perform operations on Kubernetes resources (GET, UPDATE, and so on) only as needed, and should reconcile (as indicated by "msg":"Reconciling Installation.operator.tigera.io") only when necessary.
Current Behavior
After installing the Tigera Operator, we observe extensive logging and frequent access to Kubernetes resources, including GET and UPDATE operations as well as continuous reconciliations performed by the operator. It remains unclear whether this level of activity is expected or indicates a problem with the installation. Clarification is sought on whether the observed behavior aligns with the intended functionality of the Tigera Operator or signals an anomaly requiring further investigation.
Possible Solution
n/a
Steps to Reproduce (for bugs)
Context
AWS CloudWatch is updated at high frequency with messages like:
The operator workload's log is updated at high frequency with messages like:
Your Environment
AWS EKS v1.24
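For reference, kube-apiserver audit logs on EKS are shipped to CloudWatch via the cluster's control-plane logging settings; a minimal eksctl sketch of that configuration is below, assuming eksctl is used at all (the cluster name and region are placeholders, and the same setting can be made via the AWS console or CLI).

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: example-cluster   # placeholder
  region: us-west-2       # placeholder
cloudWatch:
  clusterLogging:
    # Ship kube-apiserver audit logs to CloudWatch Logs.
    enableTypes:
    - audit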