Closed menardorama closed 2 years ago
@menardorama Hi, thanks for your suggestions. I'm coding version v0.1.3-alpha now; it will provide more metrics and add hostpath PVC and NFS PVC monitoring. Once v0.1.3-alpha is released, if you have any ideas, please let me know.
I have rewritten scanner.py to manage more Gauges, and I have rewritten the df parsing accordingly. Also, we have PVs provisioned using csi-isilon, and the mount point includes the cluster name as a pattern.
It would be great if you could integrate it.
import os
import re
import time
import logging
from prometheus_client import start_http_server, Gauge

g_pvc_total = Gauge('pvc_total', 'fetching pvc size in bytes', ['volumename'])
g_pvc_used = Gauge('pvc_used', 'fetching pvc usage in bytes', ['volumename'])
g_pvc_available = Gauge('pvc_available', 'fetching pvc available in bytes', ['volumename'])

# Expose metrics on :8848
start_http_server(8848)

formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger('block_pvc_scanner')
logger.setLevel(logging.DEBUG)
print_log = logging.StreamHandler()
print_log.setFormatter(formatter)
logger.addHandler(print_log)

cluster_name = os.environ['CLUSTER_NAME']

def get_pv_usage(pv, pvc_info):
    # df fields: filesystem, total, used, available, use%, mount point
    output = list(filter(None, pvc_info))
    pv_total = output[1]
    pv_used = output[2]
    pv_available = output[3]
    logger.info(f'VOLUME: {pv}, ALLOCATED: {pv_total}')
    logger.info(f'VOLUME: {pv}, USED: {pv_used}')
    logger.info(f'VOLUME: {pv}, AVAILABLE: {pv_available}')
    g_pvc_total.labels(pv).set(pv_total)
    g_pvc_used.labels(pv).set(pv_used)
    g_pvc_available.labels(pv).set(pv_available)

while True:
    get_pvc = os.popen("df | grep -E 'kubernetes.io/flexvolume|kubernetes.io/csi|kubernetes.io~csi|kubernetes.io/gce-pd/mounts'")
    all_pvcs = get_pvc.readlines()
    if len(all_pvcs) == 0:
        logger.warning("No block storage pvc found or not supported yet.")
    else:
        for pvc in all_pvcs:
            # Extract the PVC name from the mount path
            pvc_info = pvc.split(' ')
            for volume in pvc_info[-1].split('/'):
                if re.match("^pvc", volume):
                    get_pv_usage(volume, pvc_info)
                elif re.match("^gke-data", volume):
                    get_pv_usage('pvc' + volume.split('pvc')[-1], pvc_info)
                elif re.match(cluster_name, volume):
                    get_pv_usage(volume.split('pvc')[-1], pvc_info)
                elif 'pvc' in volume:
                    logger.error(f'Cannot match this volume: {volume}')
    logger.info("Will sleep 15s...")
    time.sleep(15)
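For reference, here is a minimal standalone sketch of how the mount-path parsing above behaves on a sample df line for a CSI-mounted PVC. The sample device, pod UID, and path are made up for illustration; only the `kubernetes.io~csi/pvc-.../mount` shape follows real kubelet mount paths.

```python
import re

# Made-up df output line (fields: filesystem, 1K-blocks, used,
# available, use%, mount point), with multiple spaces as df emits.
sample = ("/dev/sdb  10485760  8074035  2411725  77% "
          "/var/lib/kubelet/pods/abc/volumes/kubernetes.io~csi/"
          "pvc-32d2741e-2fc5-40fe-b019-dcaccc712ef7/mount")

# filter(None, ...) drops the empty strings produced by repeated spaces
fields = list(filter(None, sample.split(' ')))

pv = None
for part in fields[-1].split('/'):
    if re.match('^pvc', part):
        pv = part  # the PV name component of the mount path

print(pv)                               # pvc-32d2741e-2fc5-40fe-b019-dcaccc712ef7
print(fields[1], fields[2], fields[3])  # total, used, available (1K blocks)
```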
Hi @kais271, I'm interested in your new version that supports Trident and NFS. How can I download it for testing?
Many thanks.
@roldancer Hi, you can find the new version on branch v0.1.3-alpha, or run the following commands to install it.
helm repo add pvc-exporter https://kais271.github.io/pvc-exporter/helm3/charts/
kubectl create namespace pvc-exporter
helm install demo pvc-exporter/pvc-exporter --namespace pvc-exporter --version v0.1.3-alpha
@menardorama Hi, I'm still considering whether to introduce standard counters. In v0.1.3-alpha, I have added some fields to the pvc_usage metric, including the PVC usage in MB and the PVC requested size, as follows:
pvc_usage{container="pvc-exporter", endpoint="metrics", instance="10.3.179.23:8848", job="ok-pvc-exporter", namespace="default", persistentvolume="pvc-32d2741e-2fc5-40fe-b019-dcaccc712ef7", persistentvolumeclaim="local-path-pvc", pod="ok-pvc-exporter-m5vxj", pvc_namespace="default", pvc_requested_size_MB="128.0", pvc_requested_size_human="128M", pvc_type="hostpath", pvc_used_MB="98", service="ok-pvc-exporter"} 0.77
Sounds nice, thanks.
Do you already have Prometheus rules?
I am trying to find a way to get alerting, but PromQL join queries are hard to write.
@menardorama Try this one
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pvc-usage
  labels:
    app: kube-prometheus-stack
    release: prome
spec:
  groups:
  - name: pvc-usage
    rules:
    - alert: pvc-usage-gt-80
      annotations:
        summary: "PVC: {{ $labels.persistentvolumeclaim }} usage more than 80%"
        description: "pvc usage > 80%\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      expr: max by(persistentvolumeclaim,pvc_namespace) (pvc_usage) > 0.8
      for: 0m
      labels:
        severity: warning
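If a second severity level is useful, the same expression can be duplicated under the same `rules:` list with a higher threshold. This is only a sketch following the rule above; the 0.9 threshold and the `critical` severity name are arbitrary choices.

```yaml
    - alert: pvc-usage-gt-90
      annotations:
        summary: "PVC: {{ $labels.persistentvolumeclaim }} usage more than 90%"
        description: "pvc usage > 90%\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      expr: max by(persistentvolumeclaim,pvc_namespace) (pvc_usage) > 0.9
      for: 0m
      labels:
        severity: critical
```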
Issues close after 30d of inactivity. Reopen the issue with /reopen.
Hi, thanks a lot. I have successfully deployed version 0.3, but I may have hit something weird.
I don't know if it's due to the "max by" function, but once a disk has been full, it will always appear as an alert....
Surprisingly, the alert disappears when I restart all the pods of the exporter.
I can try to provide detailed information if you want.
/reopen
Do you mean that the alert disappears when you just restart pvc-exporter, without restarting Prometheus or Alertmanager?
The alert just disappears when I restart the exporter.
Does the pvc-exporter pod have some kind of cache?
The exporter itself has no cache; all monitoring data is stored in Prometheus. Could you please provide the steps to reproduce?
Issues close after 30d of inactivity. Reopen the issue with /reopen.
Hi,
On paper your tool is really great; it covers the scope of monitoring PVC usage.
I suggest that you provide more standard metrics.
Using `df -h` is not always a good option, especially with big volumes (in TB, for example). If you are open to it, I can try to add more metrics.
Thanks
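One byte-accurate alternative to parsing `df -h` output is `os.statvfs`, which returns exact block counts with no human-readable rounding, so it behaves the same on TB-scale volumes. A minimal sketch, assuming the mount point path is known (the `'/'` here is just a placeholder):

```python
import os

def volume_usage_bytes(mount_point):
    """Return (total, used, available) in exact bytes for a mount point."""
    st = os.statvfs(mount_point)
    total = st.f_blocks * st.f_frsize
    # f_bavail is the space available to unprivileged users,
    # which matches what df reports in its "Available" column.
    available = st.f_bavail * st.f_frsize
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    return total, used, available

total, used, available = volume_usage_bytes('/')
print(total, used, available)
```

This avoids spawning `df` in a subprocess at all, and the exact byte values could feed the existing `pvc_total` / `pvc_used` / `pvc_available` gauges directly.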