Closed menardorama closed 2 years ago
@menardorama Hi, thanks for your suggestions. I'm coding version v0.1.3-alpha now; it will provide more metrics and add hostpath PVC and NFS PVC monitoring. Once v0.1.3-alpha is released, if you have any ideas, please let me know.
I have rewritten scanner.py to manage more Gauges, and I have rewritten the df parsing accordingly. Also, we have PVs provisioned using csi-isilon, and the mount point includes the cluster name as a pattern.
It would be great if you could integrate it.
import os
import re
import time
import logging
from prometheus_client import start_http_server, Gauge

g_pvc_total = Gauge('pvc_total', 'fetching pvc size in bytes', ['volumename'])
g_pvc_used = Gauge('pvc_used', 'fetching pvc usage in bytes', ['volumename'])
g_pvc_available = Gauge('pvc_available', 'fetching pvc available in bytes', ['volumename'])

# Expose metrics on :8848
start_http_server(8848)

formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger('block_pvc_scanner')
logger.setLevel(logging.DEBUG)
print_log = logging.StreamHandler()
print_log.setFormatter(formatter)
logger.addHandler(print_log)

cluster_name = os.environ['CLUSTER_NAME']

def get_pv_usage(pv, pvc_info):
    # df fields: filesystem, total, used, available, use%, mount point
    output = list(filter(None, pvc_info))
    pv_total = output[1]
    pv_used = output[2]
    pv_available = output[3]
    logger.info(f'VOLUME: {pv}, ALLOCATED: {pv_total}')
    logger.info(f'VOLUME: {pv}, USED: {pv_used}')
    logger.info(f'VOLUME: {pv}, AVAILABLE: {pv_available}')
    g_pvc_total.labels(pv).set(pv_total)
    g_pvc_used.labels(pv).set(pv_used)
    g_pvc_available.labels(pv).set(pv_available)

while True:
    get_pvc = os.popen("df | grep -E 'kubernetes.io/flexvolume|kubernetes.io/csi|kubernetes.io~csi|kubernetes.io/gce-pd/mounts'")
    all_pvcs = get_pvc.readlines()
    if len(all_pvcs) == 0:
        logger.warning("No block storage pvc found or not supported yet.")
    else:
        for pvc in all_pvcs:
            # Extract the PVC name from the mount path
            pvc_info = pvc.split(' ')
            for volume in pvc_info[-1].split('/'):
                if re.match("^pvc", volume):
                    get_pv_usage(volume, pvc_info)
                elif re.match("^gke-data", volume):
                    get_pv_usage('pvc' + volume.split('pvc')[-1], pvc_info)
                elif re.match(cluster_name, volume):
                    get_pv_usage(volume.split('pvc')[-1], pvc_info)
                elif 'pvc' in volume:
                    logger.error(f'Cannot match this volume: {volume}')
    logger.info("Will sleep 15s...")
    time.sleep(15)
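For reference, here is a minimal standalone sketch of how the mount-path parsing above behaves on a sample df line for a CSI-mounted PVC. The sample device, pod UID, and path are made up for illustration; only the `kubernetes.io~csi/pvc-.../mount` shape follows real kubelet mount paths.

```python
import re

# Made-up df output line (fields: filesystem, 1K-blocks, used,
# available, use%, mount point), with multiple spaces as df emits.
sample = ("/dev/sdb  10485760  8074035  2411725  77% "
          "/var/lib/kubelet/pods/abc/volumes/kubernetes.io~csi/"
          "pvc-32d2741e-2fc5-40fe-b019-dcaccc712ef7/mount")

# filter(None, ...) drops the empty strings produced by repeated spaces
fields = list(filter(None, sample.split(' ')))

pv = None
for part in fields[-1].split('/'):
    if re.match('^pvc', part):
        pv = part  # the PV name component of the mount path

print(pv)                               # pvc-32d2741e-2fc5-40fe-b019-dcaccc712ef7
print(fields[1], fields[2], fields[3])  # total, used, available (1K blocks)
```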
Hi @kais271, I'm interested in your new version that supports Trident and NFS. How can I download it for testing?
Many thanks.
@roldancer Hi, you can find the new version on branch v0.1.3-alpha, or run the following commands to install it.
helm repo add pvc-exporter https://kais271.github.io/pvc-exporter/helm3/charts/
kubectl create namespace pvc-exporter
helm install demo pvc-exporter/pvc-exporter --namespace pvc-exporter --version v0.1.3-alpha
@menardorama Hi, I'm still considering whether to introduce standard counters. In v0.1.3-alpha, I have added some fields to the pvc_usage metric, including the PVC usage in MB and the PVC requested size, as follows:
pvc_usage{container="pvc-exporter", endpoint="metrics", instance="10.3.179.23:8848", job="ok-pvc-exporter", namespace="default", persistentvolume="pvc-32d2741e-2fc5-40fe-b019-dcaccc712ef7", persistentvolumeclaim="local-path-pvc", pod="ok-pvc-exporter-m5vxj", pvc_namespace="default", pvc_requested_size_MB="128.0", pvc_requested_size_human="128M", pvc_type="hostpath", pvc_used_MB="98", service="ok-pvc-exporter"} 0.77
Sounds nice, thanks.
Do you already have Prometheus rules?
I am trying to find a way to get alerting, but PromQL join queries are hard to write.
@menardorama Try this one
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pvc-usage
  labels:
    app: kube-prometheus-stack
    release: prome
spec:
  groups:
  - name: pvc-usage
    rules:
    - alert: pvc-usage-gt-80
      annotations:
        summary: "PVC: {{ $labels.persistentvolumeclaim }} usage more than 80%"
        description: "pvc usage > 80%\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      expr: max by(persistentvolumeclaim,pvc_namespace) (pvc_usage) > 0.8
      for: 0m
      labels:
        severity: warning
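If a second severity level is useful, the same expression can be duplicated under the same `rules:` list with a higher threshold. This is only a sketch following the rule above; the 0.9 threshold and the `critical` severity name are arbitrary choices.

```yaml
    - alert: pvc-usage-gt-90
      annotations:
        summary: "PVC: {{ $labels.persistentvolumeclaim }} usage more than 90%"
        description: "pvc usage > 90%\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      expr: max by(persistentvolumeclaim,pvc_namespace) (pvc_usage) > 0.9
      for: 0m
      labels:
        severity: critical
```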
Issues close after 30d of inactivity. Reopen the issue with /reopen.
Hi, thanks a lot. I have successfully deployed version 0.3, but I may have hit something weird.
I don't know if it's due to the "max by" function, but once a disk has been full, it will always appear as an alert....
Surprisingly, the alert disappears when I restart all the pods of the exporter.
I can try to provide detailed information if you want.
/reopen
Do you mean that the alert disappears when you just restart pvc-exporter, without restarting Prometheus or Alertmanager?
The alert just disappears when I restart the exporter.
Does the pvc-exporter pod have some kind of cache?
The exporter itself has no cache; all monitoring data is stored in Prometheus. Could you please provide the steps to reproduce?
Issues close after 30d of inactivity. Reopen the issue with /reopen.
Hi,
On paper your tool is really great; it covers the scope of monitoring PVC usage.
I suggest that you provide more standard metrics.
Using `df -h` is not always a good option, especially with big volumes (in TB, for example). If you are open to it, I can try to add more metrics.
Thanks
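One byte-accurate alternative to parsing `df -h` output is `os.statvfs`, which returns exact block counts with no human-readable rounding, so it behaves the same on TB-scale volumes. A minimal sketch, assuming the mount point path is known (the `'/'` here is just a placeholder):

```python
import os

def volume_usage_bytes(mount_point):
    """Return (total, used, available) in exact bytes for a mount point."""
    st = os.statvfs(mount_point)
    total = st.f_blocks * st.f_frsize
    # f_bavail is the space available to unprivileged users,
    # which matches what df reports in its "Available" column.
    available = st.f_bavail * st.f_frsize
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    return total, used, available

total, used, available = volume_usage_bytes('/')
print(total, used, available)
```

This avoids spawning `df` in a subprocess at all, and the exact byte values could feed the existing `pvc_total` / `pvc_used` / `pvc_available` gauges directly.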