jenningsloy318 / redfish_exporter

exporter to get metrics from redfish based hardware such as lenovo/dell/superc servers
Apache License 2.0

Fix "panic: send on closed channel" #68

chiveturkey closed 9 months ago

chiveturkey commented 1 year ago

This PR seeks to fix issues like the following:

2022/04/20 19:31:21  info collector scrape completed System=System.Embedded.1 app=redfish_exporter collector=SystemCollector target=10.208.16.33
panic: send on closed channel

goroutine 916 [running]:
github.com/jenningsloy318/redfish_exporter/collector.parseEthernetInterface(0xc00049cc00, 0x0, 0x0, 0xc000152dc0, 0xc00056e264)
    /go/src/github.com/jenningsloy318/redfish_exporter/collector/system_collector.go:684 +0x465
created by github.com/jenningsloy318/redfish_exporter/collector.(*SystemCollector).Collect
    /go/src/github.com/jenningsloy318/redfish_exporter/collector/system_collector.go:532 +0xbdf

See https://github.com/jenningsloy318/redfish_exporter/issues/64 for examples of others who have encountered the same problem.

These panics occur because the channel is closed before all goroutines have had a chance to complete. To avoid that, call Wait() on each WaitGroup to block until its goroutines complete.
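For illustration, here is a minimal sketch of the pattern, not the exporter's actual code. The names `collect` and `scrape` are hypothetical, and it assumes the caller owns the channel and closes it as soon as the collector returns.

```go
package main

import "sync"

// collect is a hypothetical stand-in for a collector method. It sends one
// value per item on ch from separate goroutines and returns only after all
// of them have finished.
func collect(items []string, ch chan<- string) {
	var wg sync.WaitGroup
	for _, item := range items {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			ch <- "metric:" + name
		}(item)
	}
	// Without this Wait, collect could return while senders are still
	// running, and the caller's close(ch) would make a later send panic.
	wg.Wait()
}

// scrape plays the role of the caller that owns the channel (assumption).
func scrape(items []string) []string {
	// Buffered so the senders never block in this small example.
	ch := make(chan string, len(items))
	collect(items, ch)
	close(ch) // safe: every sender has already finished

	var out []string
	for m := range ch {
		out = append(out, m)
	}
	return out
}
```

Removing the `wg.Wait()` line from `collect` brings back the race between the senders and `close(ch)`.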

To maximize parallel execution before blocking, initialize all WaitGroups in system_collector before walking the System tree. Then block with Wait() calls after all goroutines have been started. The downside to this approach is that some WaitGroups may be created for resources that aren't present on a given system. However, I believe it's worth it for the parallel execution.
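The ordering described above could look roughly like the following sketch. The `system` type, its fields, and the function names are hypothetical simplifications, not the real system_collector code.

```go
package main

import "sync"

// system is a hypothetical, simplified view of one Redfish system.
type system struct {
	processors []string
	interfaces []string // may be empty on some hardware
}

// collectSystem declares every WaitGroup up front, starts all goroutines,
// and only then blocks, so work for different resource types overlaps.
func collectSystem(s system, ch chan<- string) {
	var wgProcessors, wgInterfaces sync.WaitGroup

	wgProcessors.Add(len(s.processors))
	for _, p := range s.processors {
		go func(name string) {
			defer wgProcessors.Done()
			ch <- "processor:" + name
		}(p)
	}

	wgInterfaces.Add(len(s.interfaces))
	for _, i := range s.interfaces {
		go func(name string) {
			defer wgInterfaces.Done()
			ch <- "interface:" + name
		}(i)
	}

	// Block only after everything has been started. Wait on a WaitGroup
	// whose counter is zero (resource absent) returns immediately.
	wgProcessors.Wait()
	wgInterfaces.Wait()
}

// gather is a hypothetical caller that owns and closes the channel.
func gather(s system) []string {
	ch := make(chan string, len(s.processors)+len(s.interfaces))
	collectSystem(s, ch)
	close(ch)

	var out []string
	for m := range ch {
		out = append(out, m)
	}
	return out
}
```

An unused WaitGroup costs only a zero-value struct, which is why creating one for an absent resource is a small price for the added parallelism.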

I have been running a local version with this change on my fleet of thousands of servers for months without issue, and haven't seen a "panic: send on closed channel" since.