Open jkroepke opened 1 year ago
Seeing the same. It creates gaps in all metrics gathered. I would prefer it if it just logged an error.
The underlying issue is that panic() is used to handle errors, and further up in the call stack the code attempts to "catch" those with recover(), which does not work.
"Recover is a built-in function that regains control of a panicking goroutine. Recover is only useful inside deferred functions. During normal execution, a call to recover will return nil and have no other effect. If the current goroutine is panicking, a call to recover will capture the value given to panic and resume normal execution." (further details)
This is basically what is being done here and putting it into the playground quickly shows that it fails.
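package main

import (
	"fmt"
)

func foo() {
	panic("foobar")
}

func main() {
	foo() // panics here; nothing below this line runs
	// recover() called outside a deferred function cannot catch the panic
	r := recover()
	fmt.Println(r)
}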
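For contrast, here is a minimal sketch of the pattern that does work: recover() is called inside a deferred function, in a frame that is still on the stack while the panic unwinds. The helper name safeCall is hypothetical and is not taken from the exporter or go-common.

```go
package main

import "fmt"

func foo() {
	panic("foobar")
}

// safeCall is a hypothetical helper: it runs foo and converts a panic
// into an ordinary error value that the caller can handle.
func safeCall() (err error) {
	defer func() {
		// recover only has an effect here, inside the deferred function.
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	foo()
	return nil
}

func main() {
	fmt.Println(safeCall()) // prints "recovered: foobar"
}
```

The named return value is what lets the deferred function hand the recovered value back to the caller as an error.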
The really BIG problem is that this incorrect treatment of errors is used for the back-off-retry mechanic in github.com/webdevops/go-common/prometheus/collector. IMO this entire behavior should be considered broken.
Currently I am working on putting a band-aid on at least the cost collector to get rid of this issue. However, a complete rework of error handling in (at least) this project, the azure-resourcegraph-exporter, and go-common is required.
Furthermore, I don't understand why the Prometheus Go client library is not used more extensively here.
Hi,
From time to time we receive cost-management-related errors from Azure.
Some of them are persistent, and after five retries the exporter panics and exits.
I would like to ask whether the behavior can be changed from panic level to error level. I do not see any benefit in letting the exporter terminate.
Example panic trace after 5 retries: