Open harry-hathorn opened 1 year ago
It seems that the prometheus-server pod is stuck in CrashLoopBackOff because it can't connect to itself on 127.0.0.1:9090
The errors you see are being produced by config-reloader, a sidecar container, not by Prometheus. It is failing to connect to Prometheus because Prometheus itself is not running, as seen in the output. You should be able to get more information on why the prometheus container is crashing by specifying the container name (default is prometheus-server) with the -c option in the command, e.g.
kubectl logs POD_NAME -c prometheus-server
or for all containers
kubectl logs POD_NAME --all-containers=true
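If you are not sure which containers the pod actually has, a quick way to list their names (POD_NAME and NAMESPACE are placeholders here) is:
kubectl get pod POD_NAME -n NAMESPACE -o jsonpath='{.spec.containers[*].name}'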
Hello, I have a very similar error, not in AWS but in Minikube. The prometheus-server container logs are:
ts=2023-08-21T10:28:33.653Z caller=main.go:590 level=info build_context="(go=go1.20.6, platform=linux/amd64, user=root@42454fc0f41e, date=20230725-12:31:24, tags=netgo,builtinassets,stringlabels)"
ts=2023-08-21T10:28:33.653Z caller=main.go:591 level=info host_details="(Linux 6.2.0-26-generic #26~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Jul 13 16:27:29 UTC 2 x86_64 prometheus-1692613669-server-b7f497754-4vpxj (none))"
ts=2023-08-21T10:28:33.653Z caller=main.go:592 level=info fd_limits="(soft=1048576, hard=1048576)"
ts=2023-08-21T10:28:33.653Z caller=main.go:593 level=info vm_limits="(soft=unlimited, hard=unlimited)"
ts=2023-08-21T10:28:33.653Z caller=query_logger.go:93 level=error component=activeQueryTracker msg="Error opening query log file" file=/data/queries.active err="open /data/queries.active: permission denied"
panic: Unable to create mmap-ed active query log
goroutine 1 [running]:
github.com/prometheus/prometheus/promql.NewActiveQueryTracker({0x7ffde774e368, 0x5}, 0x14, {0x3e97c00, 0xc0006dcaa0})
/app/promql/query_logger.go:123 +0x42d
main.main()
/app/cmd/prometheus/main.go:647 +0x74d3
In the chart there is a persistentVolume definition:
persistentVolume:
  accessModes:
    - ReadWriteOnce
  annotations: {}
  enabled: true
  existingClaim: ""
  labels: {}
  mountPath: /data
  size: 8Gi
  statefulSetNameOverride: ""
  subPath: ""
In Kubernetes the PV and PVC are created correctly.
kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-72df14b9-b2ef-4a71-b5bd-3623ceb33d77 2Gi RWO Delete Bound monitoring/storage-prometheus-1692620441-alertmanager-0 standard 10m
pvc-8204b163-b495-49f9-8a4b-6f4bd20e38d2 8Gi RWO Delete Bound monitoring/prometheus-1692620441-server standard 10m
kubectl get pvc -n monitoring
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
prometheus-1692620441-server Bound pvc-8204b163-b495-49f9-8a4b-6f4bd20e38d2 8Gi RWO standard 10m
storage-prometheus-1692620441-alertmanager-0 Bound pvc-72df14b9-b2ef-4a71-b5bd-3623ceb33d77 2Gi RWO standard 10m
In the pod's prometheus-server container there is no PV mounted at /data. Could this be a Minikube issue?
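For what it's worth, one way to check what is actually mounted into the prometheus-server container (POD_NAME is a placeholder for your server pod) is something like:
# show the Mounts section of each container in the pod
kubectl describe pod POD_NAME -n monitoring | grep -A 10 Mounts:
# or dump only the prometheus-server container's volumeMounts
kubectl get pod POD_NAME -n monitoring -o jsonpath='{.spec.containers[?(@.name=="prometheus-server")].volumeMounts}'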
@laszlolaszlo I'm getting the exact same issue in EKS, so not just a Minikube issue I think.
I'm seeing the same. It worked in a single-node Minikube cluster, but when I tried a 3-node cluster it started failing exactly as described in this issue.
I have the exact same problem on the exact same setup.
@darvein did you find a solution yet?
I hit the same issue on AWS EKS, but in my case it was caused by a missing OpenID Connect provider. After adding it back, Prometheus returned to normal.
kubectl logs POD_NAME -c prometheus-server
ts=2024-03-14T06:16:21.228Z caller=main.go:1350 level=error msg="Failed to apply configuration" err="could not get SigV4 credentials: WebIdentityErr: failed to retrieve credentials\ncaused by: InvalidIdentityToken: No OpenIDConnect provider found in your account for https://oidc.eks.xxxxxxx\n\tstatus code: 400, request id: a8260318-e9f7-4fff-b428-6e3da1428276"
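In case it helps anyone hitting the same SigV4/OIDC error, a rough sketch of how to check for and re-create the missing provider (CLUSTER_NAME is a placeholder; assumes the AWS CLI and eksctl are available):
# find the cluster's OIDC issuer URL
aws eks describe-cluster --name CLUSTER_NAME --query "cluster.identity.oidc.issuer" --output text
# list the IAM OIDC providers registered in the account; the issuer above should appear here
aws iam list-open-id-connect-providers
# if it is missing, associate it with the cluster
eksctl utils associate-iam-oidc-provider --cluster CLUSTER_NAME --approve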
The issue was resolved when I removed the non-root user settings and ran it as the root user:
runAsUser: 0
runAsNonRoot: false
runAsGroup: 0
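If you apply that workaround through the chart rather than by editing the Deployment, it would presumably go under the server security context in values.yaml (key names here are taken from the chart's default values, so verify them against your chart version):
server:
  securityContext:
    runAsUser: 0          # root workaround from the comment above
    runAsNonRoot: false
    runAsGroup: 0
    # a less drastic alternative in some setups is keeping the chart's non-root defaults (65534)
    # and making sure fsGroup (e.g. 65534) is set so the mounted /data volume is group-writable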
@harry-hathorn Is the issue resolved? I'm facing the same problem. Can someone help?
Describe the bug
Prometheus service failing in CrashLoopBackOff with the logs of level=error ts=2023-08-04T18:22:42.675721017Z caller=runutil.go:100 msg="function failed. Retrying in next tick" err="trigger reload: reload request failed: Post \"http://127.0.0.1:9090/-/reload\": dial tcp 127.0.0.1:9090: connect: connection refused"
All other parts of
What's your helm version?
v3.11.2
What's your kubectl version?
v4.5.7
Which chart?
prometheus-community/prometheus
What's the chart version?
appVersion: v2.46.0
apiVersion: v2
kubeVersion: '>=1.16.0-0'
What happened?
I followed the steps here https://docs.aws.amazon.com/prometheus/latest/userguide/AMP-onboard-ingest-metrics-new-Prometheus.html
In a nutshell, I installed the Prometheus community helm chart on my EKS cluster in AWS https://github.com/prometheus-community/helm-charts.
My Kubernetes version is 1.27
The cluster node group has two t3.large nodes (8 GB memory, 2 vCPU)
After creation, I have:
Pods:
Pod logs:
Deployments:
Describe failed deployment:
The EBS volumes have successfully bound.
It seems that the prometheus-server pod is stuck in CrashLoopBackOff because it can't connect to itself on 127.0.0.1:9090
I have been searching for an answer all day and trying different things but can't solve the issue.
My Helm values look like this:
What you expected to happen?
I expect the prometheus-server pod to run
How to reproduce it?
Add the Helm chart repos:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add kube-state-metrics https://kubernetes.github.io/kube-state-metrics
helm repo update
Create the namespace:
kubectl create namespace prometheus-namespace
For Amazon EKS, set up service roles for the ingestion of metrics from Amazon EKS clusters.
Create your values YAML file:
serviceAccounts:
  server:
    name: amp-iamproxy-ingest-service-account
    annotations:
      eks.amazonaws.com/role-arn: ${IAM_PROXY_PROMETHEUS_ROLE_ARN}
server:
  remoteWrite:
Run the command:
helm install prometheus-chart-name prometheus-community/prometheus -n prometheus-namespace \
  -f my_prometheus_values_yaml
Run the command kubectl get deployments -n prometheus and find the failing prometheus service pod
Enter the changed values of values.yaml?
serviceAccounts:
  server:
    name: amp-iamproxy-ingest-service-account
    annotations:
      eks.amazonaws.com/role-arn: ${IAM_PROXY_PROMETHEUS_ROLE_ARN}
server:
  remoteWrite:
Enter the command that you execute and failing/misfunctioning.
helm install prometheus-chart-name prometheus-community/prometheus -n prometheus-namespace \
  -f my_prometheus_values_yaml
Anything else we need to know?
No response