opsdis / aci-exporter

A Cisco ACI Prometheus exporter written in Go
https://www.opsdis.com
GNU General Public License v3.0
47 stars 16 forks source link

up metrics continue 1 when aci node is unreachable. #70

Closed minefuto closed 1 week ago

minefuto commented 3 weeks ago

up metrics continue 1 when aci node is unreachable(e.g. aci node down) using "node based queries". I think if aci node is unreachable, up metrics shoud be goes to 0. I added node probe per scraping. The node probe checks aci-exporter can access to aci node or not by http request.

thenodon commented 2 weeks ago

Tanks for the PR @minefuto. I agree that if the node is down it should return up=0, but your code will add one extra http call on every request, which I do not like if it can be avoided. If an error happen at the GET request, like from func (p aciAPI) getClassMetrics nil is returned on the channel that expect []MetricDefinition, so I think it could be resolved by logic that check for nil in the loop https://github.com/opsdis/aci-exporter/blob/8eef4f6107d8c4ad5b67f0436e74b9ebee117e42/aci-api.go#L139. Something like

    for i := 0; i < 4; i++ {
        result := <-ch
        if result != nil {
            metrics = append(metrics, result...)
        } else {
                    // handle down
        }
    }

I you have the time to give it a try it would be great.

sonarcloud[bot] commented 2 weeks ago

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

minefuto commented 2 weeks ago

Thanks feedback @thenodon. I understand.

I think it could be resolved by logic that check for nil in the loop

I think if one of the query configuration(class, group class, compound) is empty, up metrics goes to 0 because empty configuration query return nil. So I implemented up metrics goes to 0 if all queries return nil. Could you please feedback again?

thenodon commented 2 weeks ago

@minefuto the way you have done it now logging and the timer metrics will be lost, so it will not be consistent. If you want you can make an issue and I can implement it.

minefuto commented 1 week ago

OK. I will open issue. Thanks @thenodon .