vivekj11 opened this issue 5 years ago:

I am working with the kubelet cAdvisor (the one that comes along with a Kubernetes cluster). Everything is working fine, except that details for old Pods (which got terminated/stopped) are not available at all. I want the details of old Pods for performance analysis.

Currently my prometheus.yml target is "myclusterip:10255/metrics/cadvisor".

Need help!
cAdvisor is a collection agent, not a storage backend. It sounds like you are using Prometheus, which should support historical queries on metrics previously collected.
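For example, as long as the samples are still inside the retention window, a range query should return data for containers that have since terminated. A minimal sketch (the pod label value is a placeholder; older kubelets expose the label as `pod_name`, newer ones as `pod`):

```promql
# Per-second CPU usage of a (possibly terminated) container; run this as
# a range query (e.g. a Grafana panel or the Prometheus graph view) over
# the time span when the Pod was alive. "my-app-abc123" is a placeholder.
rate(container_cpu_usage_seconds_total{pod_name="my-app-abc123"}[5m])
```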
@dashpole Thank you for the info regarding cAdvisor. Can you please check my Prometheus config and see if I am missing something here?
Prometheus config from my docker-compose file:

```yaml
prometheus:
  image: prom/prometheus:v2.8.1
  networks:
    - monitor
  # ports:
  #   - '9090:9090'
  volumes:
    - /promstack/prometheus.yml:/etc/prometheus/prometheus.yml:ro
    - /promstack/prom_data:/prometheus:rw
    - /promstack/alert.rules:/etc/prometheus/alert.rules:ro
  user: "0"
  command:
    - '--config.file=/etc/prometheus/prometheus.yml'
    - '--storage.tsdb.path=/prometheus'
    - '--storage.tsdb.retention=7d'
```
Here I am keeping data for 7 days, but I am not able to see details for a Pod that was terminated an hour ago.
My prometheus.yml looks like this:
```yaml
global:
  scrape_interval: 60s     # Scrape targets every 60 seconds. The default is every 1 minute.
  evaluation_interval: 60s # Evaluate rules every 60 seconds. The default is every 1 minute.
  scrape_timeout: 30s      # Overrides the global default of 10s.

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - serverip:9093
      timeout: 30s

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/etc/prometheus/alert.rules"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets:
          - localhost:9090
```
When you say "see details", what are you using to do that? Are you using a UI, or running a query?
I don't see the kubelet's cAdvisor endpoint (/metrics/cadvisor) anywhere in your config either.
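A minimal sketch of such a scrape job, assuming the kubelet's read-only port (10255) is reachable from the Prometheus container; `node-01` is a placeholder hostname:

```yaml
scrape_configs:
  - job_name: 'kubelet-cadvisor'
    # metrics_path defaults to /metrics, so the cAdvisor path must be
    # set explicitly.
    metrics_path: '/metrics/cadvisor'
    static_configs:
      - targets:
          - node-01:10255   # placeholder; one entry per node
```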
Oh, I missed providing the complete file and information.
My infra details: the complete docker-compose file that I am using is:
```yaml
version: '3'
services:
  prometheus:
    image: prom/prometheus:v2.6.1
    restart: always
    user: "0"
    ports:
      - "9090:9090"
    command:
      - '--storage.tsdb.retention=14d'
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    volumes:
      - /promstack/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - /promstack/prometheus_data:/prometheus:rw
  grafana:
    image: grafana/grafana:5.4.3
    restart: always
    ports:
      - "3000:3000"
    user: "0"
    volumes:
      - /promstack/grafana_data:/var/lib/grafana:rw
```
and my complete prometheus.yml:

```yaml
global:
  scrape_interval: 60s     # Scrape targets every 60 seconds. The default is every 1 minute.
  evaluation_interval: 60s # Evaluate rules every 60 seconds. The default is every 1 minute.
  scrape_timeout: 30s      # Overrides the global default of 10s.

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets:
          - localhost:9090
  - job_name: 'server-02'
    static_configs:
      - targets:
          - server-02:10255
    metrics_path: '/metrics/cadvisor'
  - job_name: 'server-03'
    static_configs:
      - targets:
          - server-03:10255
    metrics_path: '/metrics/cadvisor'
```
That sounds about right... When you look in Grafana for metrics, do you see the workloads you are looking for?
Yes, I can see everything on my dashboard.
We trigger a build for the same service almost every hour. I can see details for a new container within about 5 minutes of its start, but the details of the container that was running before it are no longer available. For example, when I select a 3-hour range in my Grafana dashboard, I see only the latest (running) Pod; details of the terminated Pods are not shown.
That's bizarre... It sounds like cAdvisor is correctly delivering data, so it must be an issue with data retention in Prometheus, or with your Grafana query.
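One way to rule out the retention side is to ask Prometheus itself how far back its stored data reaches (a minimal sketch; the metric comes from Prometheus's own self-monitoring):

```promql
# Oldest sample timestamp currently in the TSDB, in milliseconds since
# the epoch. If this is much more recent than the configured retention
# implies, data is being lost earlier than expected.
prometheus_tsdb_lowest_timestamp
```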
@dashpole Thanks for the help! I found the culprit: the Pods were living for only a fraction of a second, and I was applying irate over a 2-minute window to my selected metrics. After removing the irate, I am able to see the old containers' details.
However, I want to keep using irate (without it, the raw cumulative total of the metric over the whole period is shown rather than a rate). I am still working on that part.
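For reference, irate() computes its value from only the last two samples inside the selected window, so with a 60s scrape interval a container that lives only a fraction of that time may never yield the two samples irate needs. A minimal sketch of a more forgiving query, assuming container CPU usage is the metric of interest (the 5m window is an assumption):

```promql
# rate() averages over all sample pairs in the window instead of just
# the last two; a window spanning several scrape intervals (5m with a
# 60s scrape interval) gives short-lived containers a better chance of
# contributing at least two samples.
rate(container_cpu_usage_seconds_total{image!=""}[5m])
```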
We can close this issue now.