paregupt / mds_traffic_monitor

Cisco MDS monitoring using Grafana and InfluxDB
MIT License

Data not showing in dashboards for cisco mds #5

Open protoss1232001 opened 3 years ago

protoss1232001 commented 3 years ago

I have multiple InfluxDB databases on my instance. Where do I specify the database name for the script to write to?

paregupt commented 3 years ago

The DB name is specified in the Grafana dashboard variables $db and $rp (retention policy).

protoss1232001 commented 3 years ago

While importing the dashboards into Grafana, I selected the appropriate DB.

Does it have to be specified in any of these files as well?

/usr/local/telegraf/grafana/dashboards# ls -l
total 828
-rw------- 1 root root 38378 Jun 18 11:43 local_sys.json
-rw------- 1 root root 233926 Jun 18 11:43 locations.json
-rw------- 1 root root 233926 Jun 18 17:42 locations.json.old
-rw------- 1 root root 244204 Jun 18 11:43 switches.json
-rw------- 1 root root 84405 Jun 18 11:43 switchports.json

paregupt commented 3 years ago

Yes, the Grafana dashboards must know which DB to read the data from. After loading the dashboards into Grafana, edit the $db variable to use your DB name. That is one change per dashboard.
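If you would rather set $db before importing, the dashboard JSON files can be patched directly. This is a minimal sketch, assuming each dashboard defines $db as an ordinary Grafana templating variable named `db` under `templating.list` (the standard dashboard JSON layout). The helper names are hypothetical and not part of mds_traffic_monitor.

```python
import json
from pathlib import Path
from typing import Any, Dict


def set_dashboard_variable(dashboard: Dict[str, Any], var_name: str, value: str) -> bool:
    """Set the selected value of a Grafana templating variable in place.

    Returns True if a variable called `var_name` was found and updated.
    """
    updated = False
    for var in dashboard.get("templating", {}).get("list", []):
        if var.get("name") != var_name:
            continue
        # Grafana keeps the currently selected value under "current".
        var["current"] = {"selected": True, "text": value, "value": value}
        # Constant and textbox variables store their value in "query" instead.
        if var.get("type") in ("constant", "textbox"):
            var["query"] = value
        updated = True
    return updated


def patch_dashboard_file(path: Path, db_name: str) -> bool:
    """Rewrite one dashboard JSON file so that $db points at `db_name`."""
    dashboard = json.loads(path.read_text())
    changed = set_dashboard_variable(dashboard, "db", db_name)
    if changed:
        path.write_text(json.dumps(dashboard, indent=2))
    return changed
```

For example, loop over `Path("/usr/local/telegraf/grafana/dashboards").glob("*.json")` and call `patch_dashboard_file(path, "<your-db>")` on each file; `$rp` can be set the same way with `set_dashboard_variable(dashboard, "rp", ...)`. Editing the variable in the Grafana UI after import gives the same result.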

protoss1232001 commented 3 years ago

Thank you. I did select the DB from the dropdown when importing the dashboards, so I believe we are good there.

For me, the log file has not been created yet, which tells me that the collection did not run, even after restarting Telegraf.

cat /var/log/telegraf/telegraf.log
cat: /var/log/telegraf/telegraf.log: No such file or directory

paregupt commented 3 years ago

The collector not running is a separate issue. How did you install it? Are the permissions set as required, i.e., are all files owned by the telegraf user? If you ran the collector manually, delete the files it generated.
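To find what is not owned by the telegraf user, a minimal sketch could walk the install directory and compare owner uids. It assumes the collector lives under /usr/local/telegraf, as in the paths in this thread, and should be run as root so every entry can be stat'ed; the function name is hypothetical. Note that the dashboard JSON files listed earlier are owned by root, so a check like this would flag them.

```python
from pathlib import Path
from typing import List


def files_not_owned_by(root: Path, uid: int) -> List[Path]:
    """List `root` and everything below it whose owner uid is not `uid`."""
    candidates = [root, *root.rglob("*")] if root.is_dir() else [root]
    # lstat() reports a symlink's own owner instead of following the link.
    return [p for p in candidates if p.lstat().st_uid != uid]
```

A typical call is `files_not_owned_by(Path("/usr/local/telegraf"), pwd.getpwnam("telegraf").pw_uid)`. Anything it returns can then be handed over with `chown -R telegraf /usr/local/telegraf`.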

protoss1232001 commented 3 years ago

Is there documentation on what the required permissions are? Sorry, but I did not see that in the README file.

protoss1232001 commented 3 years ago

Looks like a permissions issue.

systemctl status telegraf -l
● telegraf.service - The plugin-driven server agent for reporting metrics into InfluxDB
   Loaded: loaded (/usr/lib/systemd/system/telegraf.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2021-06-18 12:49:52 EDT; 1 day 1h ago
     Docs: https://github.com/influxdata/telegraf
  Process: 59338 ExecReload=/bin/kill -HUP $MAINPID (code=exited, status=0/SUCCESS)
 Main PID: 59261 (telegraf)
   CGroup: /system.slice/telegraf.service
           └─59261 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d

Jun 19 13:57:30 -- telegraf[59261]: 2021-06-19T17:57:30Z E! [inputs.exec] Error in plugin: exec: exit status 2 for command 'python3 /usr/local/telegraf/mds_traffic_monitor_high_frequency.py /usr/local/telegraf/ mds_group_cisfabb.txt influxdb-lp -vv': python3: can't open file '/usr/local/telegraf/mds_traffic_monitor_high_frequency.py': [Errno 13] Permission denied
Jun 19 13:57:30 -- telegraf[59261]: 2021-06-19T17:57:30Z E! [inputs.exec] Error in plugin: exec: exit status 2 for command 'python3 /usr/local/telegraf/mds_traffic_monitor_high_frequency.py /usr/local/telegraf/ mds_group_cisfaba.txt influxdb-lp -vv': python3: can't open file '/usr/local/telegraf/mds_traffic_monitor_high_frequency.py': [Errno 13] Permission denied
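`[Errno 13] Permission denied` when opening a script means either the file lacks read permission for the telegraf user or some directory on its path lacks the search (execute) bit. A minimal sketch to find the offending component, assuming classic owner/group/other mode bits only (ACLs, SELinux labels and root's override are not modelled); the function names are hypothetical:

```python
import stat
from pathlib import Path
from typing import Optional, Set

# (owner, group, other) bits for each kind of access being checked.
_BITS = {
    "r": (stat.S_IRUSR, stat.S_IRGRP, stat.S_IROTH),
    "x": (stat.S_IXUSR, stat.S_IXGRP, stat.S_IXOTH),
}


def permits(mode: int, owner: int, group: int, uid: int, gids: Set[int], need: str) -> bool:
    """Classic Unix check: exactly one of the owner/group/other classes applies."""
    usr, grp, oth = _BITS[need]
    if uid == owner:
        return bool(mode & usr)
    if group in gids:
        return bool(mode & grp)
    return bool(mode & oth)


def first_blocking_component(path: Path, uid: int, gids: Set[int]) -> Optional[Path]:
    """Return the first directory (needs 'x') or the file itself (needs 'r')
    along `path` that `uid`/`gids` cannot access, or None if all pass."""
    path = path.resolve()
    checks = [(parent, "x") for parent in reversed(path.parents)] + [(path, "r")]
    for component, need in checks:
        st = component.stat()
        if not permits(st.st_mode, st.st_uid, st.st_gid, uid, gids, need):
            return component
    return None
```

Run as root, passing `pwd.getpwnam("telegraf").pw_uid` and `set(os.getgrouplist("telegraf", pwd.getpwnam("telegraf").pw_gid))` together with the script path from the log. The owner check deliberately short-circuits: if the telegraf user owns a file, only the owner bits count, even when "other" would allow access.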

paregupt commented 3 years ago

Check this: https://www.since2k7.com/blog/2020/02/29/cisco-ucs-monitoring-using-grafana-influxdb-telegraf-utm-installation/

The steps are for UTM (UCS Traffic Monitor) but apply to MTM (MDS Traffic Monitor) as well.

protoss1232001 commented 3 years ago

Thank you so much for that. I gave it a shot and got a little further than before; I'm now getting "Error in plugin". I'll go back later and check whether I missed any step.

Jun 19 18:32:00 telegraf[2628]: 2021-06-19T22:32:00Z E! [inputs.exec] Error in plugin: exec: exit status 1 for command 'python3 /usr/local/telegraf/mds_traffic_monitor_high_frequency.py /usr/local/telegraf/mds_group_roacisfaba.txt influxdb-lp -vv': Traceback (most recent call last):...
Jun 19 18:32:30 telegraf[2628]: 2021-06-19T22:32:30Z E! [inputs.exec] Error in plugin: exec: exit status 1 for command 'python3 /usr/local/telegraf/mds_traffic_monitor_high_frequency.py /usr/local/telegraf/mds_group_roacisfabb.txt influxdb-lp -vv': Traceback (most recent call last):...
Jun 19 18:32:30 telegraf[2628]: 2021-06-19T22:32:30Z E! [inputs.exec] Error in plugin: exec: exit status 1 for command 'python3 /usr/local/telegraf/mds_traffic_monitor_high_frequency.py /usr/local/telegraf/mds_group_roacisfaba.txt influxdb-lp -vv': Traceback (most recent call last):...
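The Telegraf log cuts the traceback short. One way to see it in full is to rerun the exact command from the log as the telegraf user. This is a minimal sketch, assuming `sudo` is available and the service account is named `telegraf`; the helper names are hypothetical.

```python
import shlex
import subprocess
from typing import List


def telegraf_exec_argv(script: str, group_file: str, user: str = "telegraf") -> List[str]:
    """Rebuild the exec command seen in the Telegraf log, run as `user` via sudo."""
    return ["sudo", "-u", user, "python3", script, group_file, "influxdb-lp", "-vv"]


def run_like_telegraf(script: str, group_file: str) -> subprocess.CompletedProcess:
    """Run the collector once and capture stdout/stderr, including any traceback."""
    argv = telegraf_exec_argv(script, group_file)
    print("Running:", shlex.join(argv))
    return subprocess.run(argv, capture_output=True, text=True, check=False)
```

For example, `run_like_telegraf("/usr/local/telegraf/mds_traffic_monitor_high_frequency.py", "/usr/local/telegraf/mds_group_roacisfaba.txt").stderr` holds the complete traceback. Running it as telegraf rather than root also avoids creating root-owned log files.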

protoss1232001 commented 3 years ago

This is the error from the log: "ERROR - NXAPI error from 165.XX.XX.XXX:502:('bad_gateway',)". I'm still looking into it, but I'm not sure what this means. Is the collector unable to reach the switch and receive a response in a timely manner?

more mds_traffic_monitor_high_frequency_mds_group_cisfaba.log
2021-06-19 18:30:03,892 - WARNING - ---------- START (version 0.12)----------
2021-06-19 18:30:03,893 - INFO - Added 165.XX.XX.XXX (CISFABA) to switch dict, location:Fra
2021-06-19 18:30:03,893 - INFO - Connect (1) and pull stats from:165.XX.XX.XXX
2021-06-19 18:30:03,894 - INFO - Pull stats from 165.XX.XX.XXX for (idx:0)['show version', 'show system resources', 'show system uptime', 'show module']
2021-06-19 18:30:03,894 - INFO - Pull stats from 165.XX.XX.XXX for (idx:1)['show port-channel usage']
2021-06-19 18:30:38,011 - ERROR - NXAPI error from 165.XX.XX.XXX:502:('bad_gateway',)
2021-06-19 18:30:38,012 - INFO - Response received as completed:None
2021-06-19 18:30:38,214 - ERROR - NXAPI error from 165.XX.XX.XXX:502:('bad_gateway',)
2021-06-19 18:30:38,215 - INFO - Response received as completed:None
2021-06-19 18:30:38,215 - INFO - Prebuilt FC interface string:
2021-06-19 18:30:38,215 - INFO - Prebuilt port-channel interface string:
2021-06-19 18:30:38,215 - INFO - Connect (2) and pull stats from:165.XX.XX.XXX
2021-06-19 18:30:38,216 - INFO - Pull stats from 165.XX.XX.XXX for (idx:2)['show interface counters detailed']
2021-06-19 18:30:38,219 - INFO - Pull stats from 165.XX.XX.XXX for (idx:3)['show interface transceiver details']
2021-06-19 18:30:38,222 - INFO - Pull stats from 165.XX.XX.XXX for (idx:4)['show interface ']
2021-06-19 18:31:12,545 - ERROR - NXAPI error from 165.XX.XX.XXX:502:('bad_gateway',)
2021-06-19 18:31:12,545 - INFO - Response received as completed:None
2021-06-19 18:31:14,053 - ERROR - NXAPI error from 165.XX.XX.XXX:502:('bad_gateway',)
2021-06-19 18:31:14,053 - INFO - Response received as completed:None
2021-06-19 18:31:14,129 - ERROR - NXAPI error from 165.XX.XX.XXX:502:('bad_gateway',)
2021-06-19 18:31:14,129 - INFO - Response received as completed:None
2021-06-19 18:31:14,130 - INFO - Printing output in InfluxDB Line Protocol format
2021-06-19 18:31:14,130 - INFO - Printing output - DONE
2021-06-19 18:31:14,130 - INFO -
------------------------------------------------
Response time from - 165.XX.XX.XXX
Command set:1
show version
show system resources
show system uptime
show module
------------------------------------------------
NXAPI Response: 34.32 s Parsing: 0.0 s
------------------------------------------------
Command set:2
show port-channel usage
------------------------------------------------
NXAPI Response: 34.12 s Parsing: 0.0 s
------------------------------------------------
Command set:3
show interface counters detailed
------------------------------------------------
NXAPI Response: 34.33 s Parsing: 0.0 s
------------------------------------------------
Command set:4
show interface transceiver details
------------------------------------------------
NXAPI Response: 35.91 s Parsing: 0.0 s
------------------------------------------------
Command set:5
show interface
------------------------------------------------
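To confirm which HTTP status each switch returned across a whole log file (for example, 502 everywhere versus an authentication error on some command sets), the error lines can be tallied. This is a minimal sketch; the regular expression is inferred from the log lines in this thread and may not match other collector versions, and the function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Pattern inferred from lines like:
# "... - ERROR - NXAPI error from 165.XX.XX.XXX:502:('bad_gateway',)"
NXAPI_ERROR = re.compile(r"NXAPI error from (?P<host>[^:\s]+):(?P<status>\d+):")


def tally_nxapi_errors(lines: Iterable[str]) -> Counter:
    """Count NXAPI errors per (switch, HTTP status) in collector log lines."""
    counts: Counter = Counter()
    for line in lines:
        match = NXAPI_ERROR.search(line)
        if match:
            counts[(match["host"], int(match["status"]))] += 1
    return counts
```

Passing an open handle of mds_traffic_monitor_high_frequency_mds_group_cisfaba.log works because file objects iterate line by line. On the excerpt above it counts five 502 errors, one per command set.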
paregupt commented 3 years ago

As the error says, the switch is returning a 502 (bad gateway) error. Try the NXAPI sandbox first. If you want, drop me an email and I can help you resolve all these issues. My email address is my GitHub username at Cisco.
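Besides the sandbox, a single NX-API request sent outside the collector shows whether the switch itself answers with 502. This is a minimal sketch, assuming NX-API is enabled over HTTPS on the default port and accepts JSON-RPC requests at `/ins`. Whether mds_traffic_monitor uses the same request format is not confirmed here, and the helper names are hypothetical.

```python
import base64
import json
import ssl
import time
import urllib.error
import urllib.request
from typing import Any, Dict, List


def build_jsonrpc_payload(commands: List[str]) -> List[Dict[str, Any]]:
    """Build one NX-API JSON-RPC 'cli' request per command, with ids 1..n."""
    return [
        {"jsonrpc": "2.0", "method": "cli", "params": {"cmd": cmd, "version": 1}, "id": i}
        for i, cmd in enumerate(commands, start=1)
    ]


def probe_nxapi(host: str, user: str, password: str, commands: List[str],
                timeout: float = 60.0) -> None:
    """POST the commands to https://<host>/ins and print the status and elapsed time."""
    body = json.dumps(build_jsonrpc_payload(commands)).encode()
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"https://{host}/ins",
        data=body,
        headers={"Content-Type": "application/json-rpc",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )
    # Assumption: lab switch with a self-signed certificate, so verification is off.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout, context=context) as resp:
            status, detail = resp.status, resp.read(200).decode(errors="replace")
    except urllib.error.HTTPError as err:  # the switch answered with an error, e.g. 502
        status, detail = err.code, str(err.reason)
    except urllib.error.URLError as err:  # no HTTP answer: DNS, refused, TLS failure
        status, detail = None, str(err.reason)
    elapsed = time.monotonic() - start
    print(f"{host}: HTTP {status} after {elapsed:.1f} s: {detail}")
```

For example, `probe_nxapi("<switch-ip>", "<user>", "<password>", ["show version"])`. If this also returns 502 after roughly the 34 s seen in the response-time summary, that would point at the switch's NX-API service rather than at the collector.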