paregupt / ucs_traffic_monitor

Cisco UCS traffic monitoring using Grafana, InfluxDB and Telegraf
MIT License

New 4.2(2c) Domain Added to Config, but not loading in UCS Monitor #95

Open brishel opened 1 year ago

brishel commented 1 year ago

We've added a newly built UCS domain running 4.2(2c) and can't get it to load properly in UCS Monitor. The logs show the errors below at boot; it appears that Python cannot launch the primary Python script correctly. Our other 6 UCS domains all load properly, we also have 1 stale entry that was added in the past but not yet purged from the system, and this new domain will not load at all.

I've gone through the Telegraf.conf file and everything is set there. The file path and file name are correct, and the URL, username, and password in the ucs_domains_group_1.txt file are correct.

All domains run locally in UCS Manager mode, with a UCS Central. No IMM here.

- 4 domains working, running 4.1(3h)
- 1 domain working that was already in UCS Monitor, then had its firmware upgraded from 4.1(3h) to 4.2(2c)
- 1 domain not working, newly built on 4.2(2c)

Any suggestions?

Log file

2023-04-11T00:20:09Z E! [inputs.exec] Error in plugin: exec: signal: terminated for command 'python3 /usr/local/telegraf/ucs_traffic_monitor.py /usr/local/telegraf/ucs_domains_group_1.txt influxdb-lp -vv':
2023-04-11T00:20:09Z I! [agent] Hang on, flushing any cached metrics before shutdown
2023-04-11T00:20:09Z I! [agent] Stopping running outputs
2023-04-11T00:20:45Z I! Loaded inputs: cpu disk diskio exec (2x) kernel mem net processes swap system
2023-04-11T00:20:45Z I! Loaded aggregators:
2023-04-11T00:20:45Z I! Loaded processors:
2023-04-11T00:20:45Z I! Loaded outputs: influxdb
2023-04-11T00:20:45Z I! Tags enabled: host=ucsmon01
2023-04-11T00:20:45Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"ucsmon01", Flush Interval:10s
2023-04-11T00:20:45Z W! [outputs.influxdb] When writing to [http://localhost:8086]: database "telegraf" creation failed: Post "http://localhost:8086/query": dial tcp [::1]:8086: connect: connection refused
2023-04-11T00:23:50Z E! [inputs.exec] Error in plugin: exec: command timed out for command 'python3 /usr/local/telegraf/ucs_traffic_monitor.py /usr/local/telegraf/ucs_domains_group_1.txt influxdb-lp -vv':
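In the log, Telegraf's exec input reports "command timed out" for the collector, and the earlier "signal: terminated" appears as Telegraf was shutting down. One way to tell whether the script is merely slow against the new domain or failing outright is to run the same command by hand and time it against the `timeout` set for the `[[inputs.exec]]` block in Telegraf.conf. The sketch below is only a wrapper around `subprocess.run` (Python 3.7+ for `capture_output`); the command is copied from the log, and the 300-second ceiling in the example is an arbitrary assumption, not a recommended value.

```python
import subprocess
import time

# Command line copied from the Telegraf log; adjust the paths (or swap "python3"
# for the interpreter Telegraf actually calls) if your install differs.
CMD = [
    "python3",
    "/usr/local/telegraf/ucs_traffic_monitor.py",
    "/usr/local/telegraf/ucs_domains_group_1.txt",
    "influxdb-lp",
    "-vv",
]


def time_command(cmd, timeout_s):
    """Run cmd once and report how long it took.

    Returns (elapsed_seconds, returncode, stderr_text). returncode is None
    when the command was killed for exceeding timeout_s, the local analogue
    of Telegraf's "command timed out".
    """
    start = time.monotonic()
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired as exc:
        # Output captured before the kill may arrive as bytes even in text mode.
        stderr = exc.stderr or ""
        if isinstance(stderr, bytes):
            stderr = stderr.decode(errors="replace")
        return time.monotonic() - start, None, stderr
    return time.monotonic() - start, result.returncode, result.stderr


# Example (run on the UTM host, as the same user Telegraf runs as):
# elapsed, rc, err = time_command(CMD, timeout_s=300)
# print(f"{elapsed:.1f}s rc={rc}")
# print(err[-2000:])  # tail of the -vv debug output
```

If the run finishes but takes longer than the exec `timeout`, the timeout (or the number of domains per input file) is the thing to look at; if it exits quickly with a non-zero code, the tail of the `-vv` output is more useful than the Telegraf log.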

Any help is greatly appreciated.

brishel commented 1 year ago

I am able to SSH from the UTM host to the UCS Manager in question. I'm not sure whether I'm running into the same issue as #94. This UCS Monitor instance has been running for a couple of years.
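Working SSH confirms port 22, which Netmiko uses, but the UCS Manager Python SDK (ucsmsdk) talks to the UCSM XML API over HTTP(S), normally HTTPS on port 443. A minimal reachability sketch that probes both TCP ports from the UTM host is below; the host name in the example is a hypothetical placeholder, and a successful TCP connect only shows the port is reachable, not that the login in the input file works.

```python
import socket

# Ports to probe: 22 for SSH (Netmiko), 443 for the UCSM XML API over HTTPS.
DEFAULT_PORTS = (22, 443)


def port_open(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError)
        # and DNS resolution failures (socket.gaierror).
        return False


def check_ucsm(host, ports=DEFAULT_PORTS, timeout_s=3.0):
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_open(host, port, timeout_s) for port in ports}


# Example; "ucsm-new.example.com" is a hypothetical placeholder for the new domain's address:
# print(check_ucsm("ucsm-new.example.com"))
```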

paregupt commented 1 year ago

Since at least one domain is working with 4.2(2c), for now let's assume that firmware is compatible with UTM.

Are all these domains in the same input file or in separate files?

I need to look at the complete logs for further analysis. Feel free to email your log file to my Cisco email, which is the same as my GitHub ID.

brishel commented 1 year ago

Email sent. Your help is greatly appreciated.

brishel commented 1 year ago

@paregupt

Upgrading Python to 3.7, installing UCSMSDK and NETMIKO, and modifying the config file to use Python 3.7 resolved our issues. Domains on both 4.1(3h) and 4.2(2d) are now running with the new Python 3.7 install with no issues.

Our hiccup during the install/upgrade was that we were not explicitly calling pip3.7 when installing UCSMSDK and NETMIKO.
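The pip3.7 hiccup happens because each pip installs into the Python installation it belongs to, which need not be the interpreter Telegraf calls. A small check like the one below, run with the exact interpreter named in Telegraf.conf (for example `python3.7 check_modules.py`, where the file name is hypothetical), shows whether ucsmsdk and netmiko are importable there. Installing with `<interpreter> -m pip install ...` ties the packages to that interpreter; this is a general pip practice, not something specific to UTM.

```python
import importlib.util
import sys

# Packages the collector needed in the interpreter that Telegraf invokes.
REQUIRED = ("ucsmsdk", "netmiko")


def missing_modules(names=REQUIRED):
    """Return the names that the *current* interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def report(names=REQUIRED):
    """Print which interpreter ran this check and what it is missing."""
    missing = missing_modules(names)
    print(f"Interpreter: {sys.executable} (Python {sys.version.split()[0]})")
    if missing:
        print("Missing: " + ", ".join(missing))
        print(f"Try: {sys.executable} -m pip install " + " ".join(missing))
    else:
        print("All required packages found.")
    return missing


if __name__ == "__main__":
    report()
```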

- 7 domains (4x at 4.1(3h) and 3x at 4.2(2d))
- 49 chassis
- 307 servers
- 2 locations