paregupt / ucs_traffic_monitor

Cisco UCS traffic monitoring using Grafana, InfluxDB and Telegraf
MIT License

PAUSE graphs are no longer plotting - 4.2(2d) #94

Closed: lucgomespagseguro closed this issue 1 year ago

lucgomespagseguro commented 1 year ago

Hello @paregupt, after updating the UCS infrastructure to version 4.2(2d), none of the graphs related to PAUSE counters are being generated anymore. Could you help me? All other graphs are being collected and generated normally.

[Screenshots: graph_pause_01, graph_pause_02]

lucgomespagseguro commented 1 year ago

2023-02-13 10:53:03,159 - ERROR - ConnectHandler failed for domain xx.xxx.xxx.xxx. EOFError : Traceback (most recent call last):
  File "/usr/local/telegraf/ucs_traffic_monitor.py", line 423, in set_ucs_connection
    timeout=user_args.get('conn_timeout'))
  File "/usr/local/lib/python3.6/site-packages/netmiko/ssh_dispatcher.py", line 259, in ConnectHandler
    return ConnectionClass(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/netmiko/base_connection.py", line 327, in __init__
    self._open()
  File "/usr/local/lib/python3.6/site-packages/netmiko/base_connection.py", line 332, in _open
    self.establish_connection()
  File "/usr/local/lib/python3.6/site-packages/netmiko/base_connection.py", line 901, in establish_connection
    self.remote_conn_pre.connect(**ssh_connect_params)
  File "/usr/local/lib/python3.6/site-packages/paramiko/client.py", line 406, in connect
    t.start_client(timeout=timeout)
  File "/usr/local/lib/python3.6/site-packages/paramiko/transport.py", line 660, in start_client
    raise e
  File "/usr/local/lib/python3.6/site-packages/paramiko/transport.py", line 2055, in run
    ptype, m = self.packetizer.read_message()
  File "/usr/local/lib/python3.6/site-packages/paramiko/packet.py", line 459, in read_message
    header = self.read_all(self.__block_size_in, check_rekey=True)
  File "/usr/local/lib/python3.6/site-packages/paramiko/packet.py", line 303, in read_all
    raise EOFError()
EOFError
2023-02-13 10:53:03,162 - ERROR - Exiting for xx.xxx.xxx.xxx due to invalid cli_handle
2023-02-13 10:53:03,244 - INFO - Query class_ids for xx.xxx.xxx.xxx

paregupt commented 1 year ago

Try a manual SSH login to the FI (Fabric Interconnect) from the UTM host, and if that doesn't work, resolve it first. The logs suggest the SSH connection is failing or timing out. It also looks like there are too many domains in the same file. Consider putting one domain per input file, although this is not related to your PAUSE issue. Finally, if they are sensitive, please remove the IP addresses from your logs.
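Splitting a multi-domain input file into one file per domain can be scripted. The sketch below is hypothetical (`split_domain_file` and the `<stem>_<n>.txt` naming are not part of UTM) and assumes each non-blank line that does not start with `#` describes exactly one UCS domain; check this against your own input file before relying on it.

```python
"""Split a multi-domain UTM input file into one file per domain.

Assumption (not confirmed by UTM docs): each non-blank line that does not
start with '#' describes exactly one UCS domain.
"""
from pathlib import Path
from typing import List


def split_domain_file(src: Path, out_dir: Path) -> List[Path]:
    """Write each domain line of `src` to its own file in `out_dir`.

    Comment lines that appear before the first domain are copied into every
    output file; other comments and blank lines are dropped.
    Returns the paths of the files written.
    """
    header: List[str] = []
    domains: List[str] = []
    for line in src.read_text().splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if stripped.startswith("#"):
            if not domains:
                header.append(line)  # leading comments become a shared header
            continue
        domains.append(line)

    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for i, domain in enumerate(domains, start=1):
        # Hypothetical naming scheme: <original stem>_<n>.txt
        dest = out_dir / "{}_{}.txt".format(src.stem, i)
        dest.write_text("\n".join(header + [domain]) + "\n")
        written.append(dest)
    return written
```

Each resulting file would then need its own collector command in the Telegraf exec input, in the same way the existing `ucs_domains_group_1.txt` is passed to `ucs_traffic_monitor.py`.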

lucgomespagseguro commented 1 year ago

Hi @paregupt, I tested logging in to the UCS manually and it worked:

[root@tbvmgrafana-utah-tb-cisco ucs_traffic_monitor]# ssh utm_user@xx.xxx.xxx.xxx
Cisco UCS 6300 Series Fabric Interconnect
Password:
Cisco Nexus Operating System (NX-OS) Software
TAC support: http://www.cisco.com/tac
Copyright (c) 2009, Cisco Systems, Inc. All rights reserved.
The copyrights to certain works contained in this software are owned by other third parties and used and distributed under license. Certain components of this software are licensed under the GNU General Public License (GPL) version 2.0 or the GNU Lesser General Public License (LGPL) Version 2.1. A copy of each such license is available at http://www.opensource.org/licenses/gpl-2.0.php and http://www.opensource.org/licenses/lgpl-2.1.php

fi3-tb-pagpod2-A#

Do I need to split the domains into separate files? Is there any other solution? There are 7 other domains in the file, and this is the only one with a problem after updating to version 4.2(2d).

paregupt commented 1 year ago

Do you see any other errors in the logs? They may give a hint. If not, please write to my Cisco email, which is the same as my GitHub ID, and I can take a look.

lucgomespagseguro commented 1 year ago

Ok, I just sent you an email with the complete logs.

paregupt commented 1 year ago

For those who run into this issue, here is how we solved it. We worked on this system live. It seems the netmiko library had trouble opening an SSH connection to the upgraded UCSM, or vice versa.

The following steps resolved the issue:

  1. Update Python to 3.7
  2. Install ucsmsdk for Python 3.7
  3. Install netmiko for Python 3.7
  4. Edit the /etc/telegraf/telegraf.conf file to use python3.7
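Step 4 is a one-line change per collector command. As a minimal sketch, assuming the collector is launched as `python3 /usr/local/telegraf/ucs_traffic_monitor.py ...` (the form shown in the Telegraf log later in this thread), the hypothetical helper `use_python37` below swaps the interpreter only in front of that script and leaves commands that already use `python3.7` alone.

```python
"""Point Telegraf exec commands for the UTM collector at python3.7.

Sketch only: assumes telegraf.conf launches the collector as
    python3 /usr/local/telegraf/ucs_traffic_monitor.py <input file> influxdb-lp ...
"""
import re

# Match a bare "python3" (not already "python3.x") directly before the UTM script path.
_PY3_UTM = re.compile(r"\bpython3(?=\s+/usr/local/telegraf/ucs_traffic_monitor\.py)")


def use_python37(conf_text: str, interpreter: str = "python3.7") -> str:
    """Return conf_text with the UTM collector's interpreter replaced."""
    return _PY3_UTM.sub(interpreter, conf_text)
```

Applied to the real file, this would be `conf.write_text(use_python37(conf.read_text()))` with `conf = Path("/etc/telegraf/telegraf.conf")`, after taking a backup. Telegraf then has to be restarted to pick up the change (typically `systemctl restart telegraf` on systemd hosts).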

The aim was to upgrade the netmiko library, but its later versions are not compatible with Python 3.6. This means Python must be upgraded to version 3.7 or later. The new Python version needs its own installation of ucsmsdk, and the UTM collector must then be run with the new Python.
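Before switching Telegraf over, it can help to confirm that the new interpreter actually meets these requirements. The sketch below is hypothetical (`preflight`, `missing_modules`, `MIN_PYTHON` and `REQUIRED` are illustrative names); the 3.7 minimum and the module list come from this thread, and it is an assumption that nothing else is needed. Run it with the interpreter you intend to use, e.g. `python3.7 preflight.py`.

```python
"""Pre-flight check for the interpreter that will run the UTM collector.

Assumption: the module list below (from this thread) is complete.
"""
import importlib.util
import sys
from typing import Iterable, List, Tuple

MIN_PYTHON = (3, 7)  # later netmiko releases do not support Python 3.6
REQUIRED = ("netmiko", "ucsmsdk")


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names this interpreter cannot import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def preflight(version: Tuple[int, ...] = tuple(sys.version_info[:3]),
              modules: Iterable[str] = REQUIRED) -> List[str]:
    """Return human-readable problems; an empty list means ready."""
    problems = []
    if version < MIN_PYTHON:
        problems.append("Python {} is older than {}".format(
            ".".join(map(str, version)), ".".join(map(str, MIN_PYTHON))))
    problems += ["missing module: " + m for m in missing_modules(modules)]
    return problems


if __name__ == "__main__":
    for issue in preflight() or ["OK"]:
        print(issue)
```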

For us it took much longer because we first tried installing Python 3.11, which is the latest version as of today. However, Python 3.11 ran into an SSL dependency issue with pip, which prevented installing netmiko and ucsmsdk. In general, I recommend running the latest modules, but in this case we used the latest release of Python 3.7, which did not have the SSL issue with pip and let us install netmiko and ucsmsdk.
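The thread does not say exactly what the SSL dependency was. One common cause when compiling Python 3.10 or later from source on an older distribution is that these versions require OpenSSL 1.1.1 or newer (PEP 644); if the system OpenSSL is older, the interpreter is built without the `ssl` module and pip cannot download packages over HTTPS. This is an assumption about this case, not something confirmed above. A quick check, run with the interpreter in question (the helper name `ssl_status` is hypothetical):

```python
"""Check whether this interpreter has a working ssl module, which pip needs for PyPI."""
import sys


def ssl_status() -> dict:
    """Report whether `ssl` imports and which OpenSSL it was built against."""
    try:
        import ssl  # missing when Python was built without a usable OpenSSL
    except ImportError:
        return {"available": False, "openssl": None}
    return {"available": True, "openssl": ssl.OPENSSL_VERSION}


if __name__ == "__main__":
    print(sys.version.split()[0], ssl_status())
```

If `available` is `False` for a freshly built interpreter, pip will not be able to install netmiko or ucsmsdk from PyPI with it.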

Do not run these steps unless you run into this issue. I am not sure whether all systems running UCSM 4.2(2d) will run into it.

lucgomespagseguro commented 1 year ago

Hi @paregupt,

I also ended up needing to install the "requests" module, in addition to "netmiko" and "ucsmsdk":

`[root@gtvmgrafana-utah-gt-cisco Python-3.7.16]# tail -f /var/log/telegraf/telegraf.log
2023-02-27T01:34:20Z E! [inputs.exec] Error in plugin: exec: signal: terminated for command 'python3 /usr/local/telegraf/ucs_traffic_monitor.py /usr/local/telegraf/ucs_domains_group_1.txt influxdb-lp -vv':
2023-02-27T01:34:24Z I! [agent] Hang on, flushing any cached metrics before shutdown
2023-02-27T01:34:24Z I! [agent] Stopping running outputs
2023-02-27T01:34:24Z I! Loaded inputs: cpu disk diskio exec (2x) kernel mem net processes snmp (3x) swap system
2023-02-27T01:34:24Z I! Loaded aggregators:
2023-02-27T01:34:24Z I! Loaded processors: regex (3x)
2023-02-27T01:34:24Z I! Loaded outputs: influxdb
2023-02-27T01:34:24Z I! Tags enabled: host=gtvmgrafana-utah-gt-cisco
2023-02-27T01:34:24Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"gtvmgrafana-utah-gt-cisco", Flush Interval:10s
2023-02-27T01:34:30Z E! [inputs.exec] Error in plugin: exec: exit status 1 for command 'python3.7 /usr/local/telegraf/mds_traffic_monitor_high_frequency.py /usr/local/telegraf/mds_group_pagcloud.txt influxdb-lp -vv': Traceback (most recent call last):...
2023-02-27T01:35:00Z E! [inputs.exec] Error in plugin: exec: exit status 1 for command 'python3.7 /usr/local/telegraf/mds_traffic_monitor_high_frequency.py /usr/local/telegraf/mds_group_pagcloud.txt influxdb-lp -vv': Traceback (most recent call last):...

[root@gtvmgrafana-utah-gt-cisco Python-3.7.16]# python3.7 /usr/local/telegraf/mds_traffic_monitor_high_frequency.py /usr/local/telegraf/mds_group_pagcloud.txt influxdb-lp -vv
Traceback (most recent call last):
  File "/usr/local/telegraf/mds_traffic_monitor_high_frequency.py", line 17, in <module>
    import requests
ModuleNotFoundError: No module named 'requests'`
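A missing module like `requests` can be caught before Telegraf runs the script. The sketch below (hypothetical helpers `imported_packages` and `missing_for_script`) parses a collector script with `ast` without executing it and checks each top-level imported package against the current interpreter. It cannot see imports done dynamically, for example through `importlib`.

```python
"""List packages a collector script imports that this interpreter cannot find."""
import ast
import importlib.util
from pathlib import Path
from typing import List, Set


def imported_packages(source: str) -> Set[str]:
    """Return top-level package names from all absolute imports in `source`."""
    names: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])  # relative imports are skipped
    return names


def missing_for_script(path: Path) -> List[str]:
    """Return sorted package names from `path` that cannot be imported here."""
    needed = imported_packages(path.read_text())
    return sorted(n for n in needed if importlib.util.find_spec(n) is None)
```

Run under `python3.7` against `/usr/local/telegraf/mds_traffic_monitor_high_frequency.py` in a setup like the one above, the result would be expected to include `requests` until that module is installed.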

After that, it worked!