influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

More error details in /plugins/outputs/azure_monitor/azure_monitor.go #10884

Open Hr0bar opened 2 years ago

Hr0bar commented 2 years ago

Feature Request

Hi, we have been battling some Azure issues, and it was very hard to determine what was going on from the Telegraf logs, because the error messages contain only the status code and status message, not the specific error details returned by Azure.

Proposal:

Modify the most important and most frequently hit error logging in azure_monitor.go to include the response body, something like this:

    respbody, err := io.ReadAll(resp.Body)
    if err != nil || resp.StatusCode < 200 || resp.StatusCode > 299 {
        return fmt.Errorf("failed to write batch: [%v] %s ResponseBody: %s", resp.StatusCode, resp.Status, string(respbody))
    }

Currently it is just this:

    _, err = io.ReadAll(resp.Body)
    if err != nil || resp.StatusCode < 200 || resp.StatusCode > 299 {
        return fmt.Errorf("failed to write batch: [%v] %s", resp.StatusCode, resp.Status)
    }

Current behavior:

[agent] Error writing to outputs.azure_monitor: failed to write batch: [400] 400 Bad Request

Desired behavior:

[agent] Error writing to outputs.azure_monitor: failed to write batch: [400] 400 Bad Request ResponseBody: {"error":{"code":"RegionMismatch","message":"This endpoint : westus.monitoring.azure.com does not accept metrics for the resource's region : westus. Ensure you are sending to the correct regional endpoint."}}
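For illustration, here is a minimal, self-contained sketch of how a message in the desired shape could be produced. It is not the plugin's actual code: `checkResponse`, `postTo`, `demo`, and the `maxErrBody` limit are hypothetical names introduced here, and a local `httptest` server with a shortened JSON body stands in for the Azure endpoint. Capping the body size and reporting a read failure separately are extra assumptions on top of the proposal.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// maxErrBody caps how much of an error response is copied into the error
// message. This limit is an assumption, not something the plugin defines.
const maxErrBody = 4096

// checkResponse is a hypothetical helper: it returns nil for 2xx responses
// and otherwise an error carrying the status and the response body.
func checkResponse(resp *http.Response) error {
	// Read at most maxErrBody bytes so a large error page cannot flood the log.
	body, err := io.ReadAll(io.LimitReader(resp.Body, maxErrBody))
	if err != nil {
		return fmt.Errorf("failed to read response body: [%d] %s: %w", resp.StatusCode, resp.Status, err)
	}
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return fmt.Errorf("failed to write batch: [%d] %s ResponseBody: %s", resp.StatusCode, resp.Status, body)
	}
	return nil
}

// postTo sends an empty POST to url and passes the response to checkResponse.
func postTo(url string) error {
	resp, err := http.Post(url, "application/x-ndjson", nil)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return checkResponse(resp)
}

// demo starts a local stand-in server that always answers 400 with a JSON
// error body, posts to it, and returns the resulting error.
func demo() error {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusBadRequest)
		fmt.Fprint(w, `{"error":{"code":"RegionMismatch"}}`)
	}))
	defer srv.Close()
	return postTo(srv.URL)
}

func main() {
	fmt.Println(demo())
}
```

Handling the read error on its own keeps a failed body read from being reported as a failed batch write, and the size cap keeps an unexpectedly large error page out of the logs.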

Use case:

A search for similar issues turns up many past reports where people were missing this error detail, for example https://github.com/influxdata/telegraf/issues/5063 and others.

powersj commented 2 years ago

See #10866