Are you able to determine the root cause?
@jpkrohling I tried to curl the metrics service, but it does not respond. I checked the selector and the endpoints, and they are fine. The ports are also defined in the pod definition, but metrics are not being collected.
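For the record, the checks looked roughly like this (the service name jaeger-operator-metrics is a placeholder; adjust it to the actual metrics service):

# List the endpoints behind the metrics service; "<none>" means the selector matches no pods
kubectl -n jaeger get endpoints jaeger-operator-metrics

# Curl the metrics port from inside the cluster (8383 is the operator's metrics port)
kubectl -n jaeger run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -sv http://jaeger-operator-metrics.jaeger.svc:8383/metrics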
@sergeyshaykhullin are you able to get the YAMLs again, properly formatted? It's very hard to understand them with the current formatting. I'm wondering why it shows the target port as 16686 for the Operator Metrics. From what I remember, we create a service monitor only for the ports from the operator itself (not for the operands), and the ports to get the metrics from should be 8383/8686.
@jpkrohling Sorry, I fixed the YAML formatting
Far better, thanks! I did some further cleaning, to remove the managed fields. I'll add this to my queue, but I need a few days to try it out. If you do have an idea on what's going on and what the fix might be, let me know, as it would help expedite a solution ;-)
I've found that the ServiceMonitor is pointing not to Jaeger but to the Jaeger Operator: the Jaeger pod exposes no metrics ports, while the Jaeger Operator pod does:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/podIP: 10.244.3.71/32
    cni.projectcalico.org/podIPs: 10.244.3.71/32
  creationTimestamp: "2020-07-05T16:34:23Z"
  generateName: jaeger-jaeger-operator-6d797c86f-
  labels:
    app.kubernetes.io/name: jaeger-operator
    pod-template-hash: 6d797c86f
  name: jaeger-jaeger-operator-6d797c86f-tn5hs
  namespace: jaeger
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: jaeger-jaeger-operator-6d797c86f
    uid: 8fcaceb9-c4e1-43f3-abaf-405150143522
  resourceVersion: "2715"
  selfLink: /api/v1/namespaces/jaeger/pods/jaeger-jaeger-operator-6d797c86f-tn5hs
  uid: 22d60c2e-df1d-47bc-86dc-2da0bfc965b5
spec:
  containers:
  - args:
    - start
    env:
    - name: WATCH_NAMESPACE
    - name: POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: OPERATOR_NAME
      value: jaeger-jaeger-operator
    image: jaegertracing/jaeger-operator:master
    imagePullPolicy: Always
    name: jaeger-jaeger-operator
    ports:
    - containerPort: 8383
      name: metrics
      protocol: TCP
    resources:
      limits:
        cpu: 200m
        memory: 200M
      requests:
        cpu: 100m
        memory: 100M
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: jaeger-jaeger-operator-token-54kqc
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: node4
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: jaeger-jaeger-operator
  serviceAccountName: jaeger-jaeger-operator
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: jaeger-jaeger-operator-token-54kqc
    secret:
      defaultMode: 420
      secretName: jaeger-jaeger-operator-token-54kqc
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-07-05T16:34:23Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-07-05T16:35:10Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-07-05T16:35:10Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-07-05T16:34:23Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://fd468ec8cf2d2dbf54c8255360433a64173df2d58d33e4544766a5f9f8bd4e5a
    image: jaegertracing/jaeger-operator:master
    imageID: docker-pullable://jaegertracing/jaeger-operator@sha256:10c5ec958adba5013b63fdc0b954a78d4dafc5d9c2fe007daa73811d6f4ba75d
    lastState: {}
    name: jaeger-jaeger-operator
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2020-07-05T16:35:10Z"
  hostIP: 37.46.128.123
  phase: Running
  podIP: 10.244.3.71
  podIPs:
  - ip: 10.244.3.71
  qosClass: Burstable
  startTime: "2020-07-05T16:34:23Z"
Inside the jaeger-operator pod, port 8383 is exposed, but 8686 is not.
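For both ports to be scrapeable, the container spec would need something like the following (a sketch; the 8686 port name follows the operator-sdk convention of exposing CR metrics on a cr-metrics port, which is an assumption here, not taken from this pod):

ports:
- containerPort: 8383
  name: metrics
  protocol: TCP
- containerPort: 8686    # CR metrics; the name is an assumed convention
  name: cr-metrics
  protocol: TCP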
There is a mismatch in the generated labels. The Service selector is:
selector:
  name: jaeger-jaeger-operator
But the labels on the pod are:
labels:
  app: jaeger
  app.kubernetes.io/component: all-in-one
  app.kubernetes.io/instance: jaeger-jaeger-operator-jaeger
  app.kubernetes.io/managed-by: jaeger-operator
  app.kubernetes.io/name: jaeger-jaeger-operator-jaeger
  app.kubernetes.io/part-of: jaeger
  pod-template-hash: 59d748f87f
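An easy way to confirm the mismatch is to list pods using the Service's selector; an empty result means the selector matches nothing (the jaeger namespace is assumed here):

# Pods matching the Service selector (expected: none, confirming the mismatch)
kubectl -n jaeger get pods -l name=jaeger-jaeger-operator

# Pods carrying the labels actually generated above
kubectl -n jaeger get pods -l app.kubernetes.io/managed-by=jaeger-operator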
I used this Helm chart: https://github.com/jaegertracing/helm-charts/tree/master/charts/jaeger-operator. This is the service template, and its labels are OK: https://github.com/jaegertracing/helm-charts/blob/master/charts/jaeger-operator/templates/service.yaml. Does the Jaeger Operator override the service monitor created by Helm?
Could you please check what the labels on the service monitor are right after Helm provisions it? If it contains the label app.kubernetes.io/managed-by: jaeger-operator, then the Jaeger Operator will attempt to manage it. Otherwise, the Jaeger Operator should keep its hands off of it.
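For instance (assuming the resources live in the jaeger namespace):

# Show all ServiceMonitors with their labels right after Helm provisions them
kubectl -n jaeger get servicemonitors --show-labels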
@jpkrohling This is strange; none of the required labels exist. But I found a manager field
That's interesting. I would expect this code to only create a service monitor if none exists, not to update an existing one.
@jpkrohling Any updates?
Not yet, sorry. I'll try to get a couple of hours this week to try to reproduce/fix this one.
;c
It's currently in my queue; I should be able to look into it during the next couple of weeks.
I couldn't reproduce your situation. I did find a couple of road bumps and a small bug, but they don't seem related to your report. Your report seems to be a duplicate of #1067, which also has the Helm chart in the mix.
Basically, let's differentiate between the three possible targets: the Jaeger Operator's port 8383, the Jaeger Operator's port 8686, and the Jaeger instance's (operand's) admin port 14269.
It looks like the Helm charts are able to create the service monitor objects for the Jaeger instances, but that's not relevant to instances created via the operator. The Helm charts don't currently have a service monitor for the Jaeger Operator, nor should they, as that service monitor is provisioned automatically by the Jaeger Operator.
That said, here's how to test it: run make deploy-prometheus-operator (in the linked PR). In the linked PR, you should see the two targets as active. In the latest release, you should see only one active (8383, which is the one that actually has metrics).
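One way to inspect the targets, assuming the Prometheus Operator's default prometheus-operated governing Service (the name is an assumption and may differ per setup):

# Forward the Prometheus UI locally, then open http://localhost:9090/targets
kubectl port-forward svc/prometheus-operated 9090:9090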
I realize that you probably care more about the Jaeger instance metrics. You can refer to an article I wrote some time ago for a more complete scenario, but here's a list of steps to achieve a simple scenario (snippets below):
- provision RBAC and a Prometheus instance that selects the Jaeger Operator's service monitor (Snippets 1)
- create a Jaeger instance, a Service and a ServiceMonitor for its admin-http port (14269), and a Prometheus instance that selects it (Snippets 2)
Snippets 1 - Jaeger Operator:
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
---
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus-for-jaeger-operator
  namespace: default
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      name: jaeger-operator
Results in:
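A sketch of how to apply and verify the snippets above, assuming they are saved as operator-monitoring.yaml (the filename is an assumption):

# Apply the ServiceAccount, RBAC, and Prometheus manifests above
kubectl apply -f operator-monitoring.yaml

# With the Prometheus UI port-forwarded (see above), list the active targets via the HTTP API
curl -s 'http://localhost:9090/api/v1/targets?state=active'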
Snippets 2 - Jaeger instance (operand):
---
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simplest
spec:
  labels:
    name: jaeger
---
apiVersion: v1
kind: Service
metadata:
  labels:
    name: jaeger
  name: simplest-admin
  namespace: default
spec:
  ports:
  - name: admin-port
    port: 14269
    protocol: TCP
  selector:
    name: jaeger
  type: ClusterIP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    name: jaeger
  name: jaeger-metrics
  namespace: default
spec:
  endpoints:
  - port: admin-port
  selector:
    matchLabels:
      name: jaeger
---
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus-for-jaeger
  namespace: default
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      name: jaeger
Results in:
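A quick sanity check that the Service from Snippets 2 resolves to the Jaeger pod (names taken from the snippet above):

# Should show the pod IP of the "simplest" Jaeger instance behind port 14269
kubectl -n default get endpoints simplest-admin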
In the end, I think the lesson is that the Jaeger Operator should be creating service monitors for the operands by default (created #1156).
@sergeyshaykhullin I'm closing this as I don't think it's a bug in the operator, but let me know if there's any clarification needed. I opened an issue with the Helm Charts repo.
I installed Jaeger using the Helm chart (operator + Jaeger via CRDs).
ServiceMonitor:
Jaeger metrics service:
Jaeger instance created by the operator:
But Prometheus dropped the metrics:
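When Prometheus drops a target, the service-discovery view usually explains why (relabeling rules, label mismatch); a sketch for inspecting it via the HTTP API, assuming the Prometheus UI is port-forwarded on localhost:9090:

# List targets that were discovered but dropped during relabeling
curl -s 'http://localhost:9090/api/v1/targets?state=dropped'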