vapor-ware / synse-snmp-plugin

SNMP plugin for Synse
GNU General Public License v3.0

Connectivity / Error Handling #55

Closed: MatthewHink closed this issue 3 years ago

MatthewHink commented 3 years ago

Connectivity to the Edens UPS is very poor right now; even snmpwalk gives up with a timeout.

Problems: The SNMP plugin cannot read data and cannot initialize correctly. Instead of failing initialization and exiting, it just chugs along as if it were working.

Possible solutions: Could it be a bad network cable? Can we add retries and/or lengthen the network timeout? Either way, the plugin should exit on initialization failure.

Log snippet below. We should exit at the "failed to create table" error:

time="2020-11-05T18:36:24.42Z" level=info msg="[snmp] initializing UPS"
time="2020-11-05T18:36:24.42Z" level=debug msg="model is: [PXGMS UPS + EATON 93PM]"
time="2020-11-05T18:36:24.42Z" level=info msg="[snmp] loaded device config" config="&{V3 10.193.3.241  30s 0xc000194c80 161}"
time="2020-11-05T18:36:24.42Z" level=debug msg="[snmp] created new SNMP client"
time="2020-11-05T18:36:24.42Z" level=debug msg="[snmp] created SNMP server base"
time="2020-11-05T18:36:24.42Z" level=debug msg="[snmp] initializing UpsMib"
time="2020-11-05T18:36:24.42Z" level=debug msg="[snmp] creating new table" name=UPS-MIB-UPS-Identity-Table oid=.1.3.6.1.2.1.33.1.1
time="2020-11-05T18:36:24.42Z" level=debug msg="[snmp] loading data from SNMP server" table=UPS-MIB-UPS-Identity-Table
time="2020-11-05T18:36:54.42Z" level=error msg="[snmp] failed to create table" error="Request timeout (after 0 retries)" table=UPS-MIB-UPS-Identity-Table
time="2020-11-05T18:36:54.42Z" level=error msg="failed to create the UPS MIB" error="Request timeout (after 0 retries)"
time="2020-11-05T18:36:54.42Z" level=info msg="[device manager] failed dynamic device config; skipping since its optional" err="Request timeout (after 0 retries)"
time="2020-11-05T18:36:54.42Z" level=info msg="[device manager] created devices" devices=0
time="2020-11-05T18:36:54.42Z" level=debug msg="[device manager] creating dynamic devices..."
time="2020-11-05T18:36:54.42Z" level=debug msg="[server] initializing"
MatthewHink commented 3 years ago

The timeout is already 30 seconds, which seems long enough: https://github.com/vapor-ware/synse-snmp-plugin/blob/c48493944c227499963303b88088eadcbca5a44e/pkg/snmp/core/client.go#L123

We can add retries via GoSNMP.Retries (start with 3).

edaniszewski commented 3 years ago

Yeah, I've always been torn on what the right behavior is in situations like this (e.g., keep retrying, chug along, fail fast). I think what you're suggesting feels right, though: add some retries, and if those still fail, exit the plugin on initialization failure. It's better for the plugin to go down than to keep running without actually doing anything. Plus, since it should always run under some sort of manager (systemd, Kubernetes), the manager would restart it.

MatthewHink commented 3 years ago

Fixed in https://github.com/vapor-ware/synse-snmp-plugin/pull/56