paregupt / ucs_traffic_monitor

Cisco UCS traffic monitoring using Grafana, InfluxDB and Telegraf
MIT License

UCS Traffic Monitoring (UTM)

Full-blown traffic monitoring of Cisco UCS servers using Grafana, InfluxDB and Telegraf.

PS: Please help decide on UTM enhancements by voting in these polls: https://github.com/paregupt/ucs_traffic_monitor/discussions

Sister Projects

Looking for something similar to monitor Cisco MDS Switches?

Check out MDS Traffic Monitoring (MTM).

Looking for something similar to monitor Cisco Nexus Switches?

Check out Nexus Traffic Monitoring (NTM).

Use cases

Locations Dashboard

UCS Domains Overview

Top 10 ports, service profiles, etc.

Load Balance verification and root cause

Congestion Monitoring and detection

End-to-end mapping from vHBA/vNIC to FI uplink Port

Integrated documentation with conceptual drawing and detailed explanations

Link utilization and errors

and much more...

Installation

Two options:

DIY Installation

  1. Install Telegraf
  2. Install InfluxDB
  3. Install Grafana, along with the following plugins:
    1. Flowchart
    2. Pie Chart (Pie Chart v2 is used starting with UTM v0.6)
    3. ePict panel (not needed starting with UTM v0.6)
    4. multistat (not needed starting with UTM v0.6)
  4. Install the following Python modules:
    1. Cisco UCSM Python SDK
    2. netmiko library
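Before wiring the script into Telegraf, it can help to confirm that the Python modules from step 4 are importable by the same `python3` interpreter that Telegraf will use. The following is a minimal sketch, not part of UTM; it assumes the import names `ucsmsdk` (Cisco UCSM Python SDK) and `netmiko`, and the helper `missing_modules` is a hypothetical name introduced here.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the two libraries listed in step 4.
    required = ["ucsmsdk", "netmiko"]
    missing = missing_modules(required)
    if missing:
        print("Missing Python modules:", ", ".join(missing))
    else:
        print("All required Python modules are importable.")
```

Running this with the same interpreter path that appears in telegraf.conf avoids the common case where the modules were installed for a different Python.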

OVA installation

Download the OVA from the releases page. It is based on CentOS 7.6 and deploys the same way as any other OVA you have deployed before. See the detailed installation instructions for the UTM OVA. The OVA is based on v0.3, so upgrading to the latest version must be your first step.

Upgrades

Upgrading UTM itself is simple: it takes one or two commands and no more than a few minutes. You are responsible for upgrading Grafana, InfluxDB, Telegraf, Python and other packages. Please refer to the respective packages for their upgrade process, and keep watch for security vulnerabilities and fixes.

Configuration

ucs_traffic_monitor.py fetches metrics from Cisco UCS and stitches them together. The Telegraf exec input plugin invokes this file every 60 seconds. The UCS login credentials must be available in ucs_domains_group*.txt.

If you are running this for the first time, try:

$ python3 /usr/local/telegraf/ucs_traffic_monitor.py -h

Add or update the following in your telegraf.conf file:

[[inputs.exec]]
   interval = "60s"
   commands = [
       "python3 /usr/local/telegraf/ucs_traffic_monitor.py /usr/local/telegraf/ucs_domains.txt influxdb-lp -vv",
   ]
   timeout = "50s"
   data_format = "influx"

Also update global values in the [agent] section, such as:

  logfile = "/var/log/telegraf/telegraf.log"
  logfile_rotation_max_size = "10MB"
  logfile_rotation_max_archives = 5
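The exec input above gives the script 50 seconds to finish within each 60-second interval. One way to check that a collector fits within that limit before enabling it is to run the same command by hand under the same timeout. The sketch below is an illustration only; `run_collector` is a hypothetical helper, and the demo command is a stand-in so that the example runs anywhere.

```python
import subprocess
import sys


def run_collector(command, timeout_s=50):
    """Run a collector command once and return its non-empty stdout lines.

    Raises subprocess.TimeoutExpired if the command exceeds timeout_s,
    and subprocess.CalledProcessError on a non-zero exit code.
    """
    result = subprocess.run(
        command,
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    # Stand-in command; on a UTM host, replace it with the command from
    # telegraf.conf split into a list of arguments.
    demo = [sys.executable, "-c", "print('demo_measurement,tag=a value=1i')"]
    for line in run_collector(demo):
        print(line)
```

A `TimeoutExpired` exception here suggests that Telegraf would also cut the run short, in which case the metrics for that interval would be lost.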

With this configuration, Telegraf should be able to:

  1. Pull metrics from UCS every 60 seconds
  2. Stitch them end-to-end between FI uplink ports and vNIC/vHBA on blade servers
  3. Write the data to InfluxDB
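Step 3 works because `data_format = "influx"` tells Telegraf to parse the script's standard output as InfluxDB line protocol. The sketch below shows how a single metric could be rendered in that format. It is not taken from ucs_traffic_monitor.py: the measurement name, tag keys and field keys are hypothetical and do not reflect UTM's actual schema, and the escaping covers only the common cases (commas, equals signs, spaces, and quotes in string fields).

```python
def _escape(text, chars):
    """Backslash-escape each character of `chars` that occurs in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value):
    """Render a field value using line protocol type conventions."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Build one line: measurement,tag=... field=... [timestamp]."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    head = _escape(measurement, ", ")
    for key in sorted(tags):
        head += f",{_escape(key, ',= ')}={_escape(str(tags[key]), ',= ')}"
    body = ",".join(
        f"{_escape(key, ',= ')}={_format_field(value)}"
        for key, value in fields.items()
    )
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


if __name__ == "__main__":
    # Hypothetical metric, for illustration only.
    print(to_line_protocol(
        "demo_port",
        {"domain": "ucs-a", "port": "1/1"},
        {"rx_bytes": 1024, "up": True},
    ))
    # prints: demo_port,domain=ucs-a,port=1/1 rx_bytes=1024i,up=true
```

When the timestamp is omitted, as in this example, Telegraf assigns the collection time to the metric.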

Import the dashboards into Grafana. That's all. UTM should be fully functional.

For detailed step-by-step instructions, especially if you do not have prior experience with Grafana, InfluxDB and Telegraf, check out: Cisco UCS monitoring using Grafana, InfluxDB, Telegraf – UTM Installation

Credits