influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

Timestream plugin failing to write with empty dimensions from Redfish input plugin #8470

Closed peterulsteen closed 10 months ago

peterulsteen commented 3 years ago

Relevant telegraf.conf:

# Configuration for Amazon Timestream output.
[[outputs.timestream]]
  mapping_mode = "multi-table"
  create_table_if_not_exists = true

# Redfish input plugin
[[inputs.redfish]]

System info:

Telegraf agent: Telegraf 1.16.2 in the official telegraf:latest Docker image; Docker Desktop Community (Windows) 2.5.0.1; WSL2 Ubuntu 20.04; Windows 10

Redfish server: Dell PowerEdge C6420 iDRAC 9 with firmware 4.22.00.53 (latest)

Docker

Docker image: telegraf:latest

Steps to reproduce:

  1. Set up the AWS Timestream output plugin and confirm it is functioning.
  2. Set up the Redfish input plugin and confirm it is functioning.
  3. Run Telegraf.

Expected behavior:

Telegraf reads temperatures, fans, wattages, and voltages from Dell's implementation of the Redfish REST API, then writes them to AWS Timestream.

Actual behavior:

Telegraf is reading from Redfish, but the Timestream plugin is throwing the below error for each metric, and thus not writing:

E! [outputs.timestream] Failed to write to Timestream database 'hardware' table 'redfish_power_voltages'. Skipping metric! Error: 'InvalidParameter: 4 validation error(s) found.
- minimum field size of 1, WriteRecordsInput.CommonAttributes.Dimensions[1].Value.
- minimum field size of 1, WriteRecordsInput.CommonAttributes.Dimensions[4].Value.
- minimum field size of 1, WriteRecordsInput.CommonAttributes.Dimensions[8].Value.
- minimum field size of 1, WriteRecordsInput.CommonAttributes.Dimensions[9].Value.

Those dimensions are, respectively, datacenter, row, rack, and room. All of these are part of the location struct in the Redfish plugin. If I populate those fields in iDRAC, I then get "1 validation error(s) found" with row, rack, and room populated properly, but with datacenter still an empty string.

Furthermore, this exact telegraf.conf is working just fine with a Dell R630 running iDRAC 8 firmware version 2.75.75.75 with or without the datacenter, row, rack, and room fields populated in iDRAC.

Additional info:

Below is typical output running the above configuration but with the Printer processor plugin enabled:

2020-11-25T03:21:04Z D! [outputs.timestream] Writing to Timestream: '{
  CommonAttributes: {
    Dimensions: [
      {
        Name: "address",
        Value: "<REDACTED: fully qualified domain name of the target Dell server>"
      },
      {
        Name: "datacenter",
        Value: ""
      },
      {
        Name: "health",
        Value: "OK"
      },
      {
        Name: "name",
        Value: "System Board BP0 PG"
      },
      {
        Name: "row",
        Value: ""
      },
      {
        Name: "source",
        Value: "<REDACTED: hostname>"
      },
      {
        Name: "state",
        Value: "Enabled"
      },
      {
        Name: "host",
        Value: "138c41d0bdd9"
      },
      {
        Name: "rack",
        Value: ""
      },
      {
        Name: "room",
        Value: ""
      }
    ],
    Time: "1606274462",
    TimeUnit: "SECONDS"
  },
  DatabaseName: "hardware",
  Records: [{
      MeasureName: "reading_volts",
      MeasureValue: "1",
      MeasureValueType: "DOUBLE"
    }],
  TableName: "redfish_power_voltages"
}' with ResourceNotFoundRetry: 'true'
2020-11-25T03:21:04Z E! [outputs.timestream] Failed to write to Timestream database 'hardware' table 'redfish_power_voltages'. Skipping metric! Error: 'InvalidParameter: 4 validation error(s) found.
- minimum field size of 1, WriteRecordsInput.CommonAttributes.Dimensions[1].Value.
- minimum field size of 1, WriteRecordsInput.CommonAttributes.Dimensions[4].Value.
- minimum field size of 1, WriteRecordsInput.CommonAttributes.Dimensions[8].Value.
- minimum field size of 1, WriteRecordsInput.CommonAttributes.Dimensions[9].Value.
'

Any help or guidance will be greatly appreciated! Please let me know if more details are needed.

sjwang90 commented 3 years ago

@peterulsteen This looks like an error in the Timestream plugin. If you write to outputs.file instead, do your redfish metrics read fine?

@piotrwest Do you have any insight on this?

peterulsteen commented 3 years ago

@sjwang90 That's correct. All the redfish metrics are reading correctly with no datacenter, rack, room, or row tags. I'm not sure why they're present (but empty) in the CommonAttributes from the Timestream plugin. One thing that may be of note is that the server in question is a Dell PowerEdge C6420 node/sled, one of four nodes sharing the same 2U chassis. The chassis itself provides the power supplies, fans, and some temperature sensors, but that data is showing up in redfish as expected. I'm not seeing this issue on more typical 1U Dell PowerEdge hardware.

piotrwest commented 3 years ago

Hi @peterulsteen, as I understand it, you are trying to ingest data into Amazon Timestream with empty dimensions (InfluxDB tags). You mentioned that when you populate the row, rack, and room dimensions with non-empty values, the error for those dimensions does not occur. However, the error still occurs for the empty datacenter dimension. This is expected behavior, as data ingested into Timestream has to conform to the Timestream limits. At this time, empty strings are not allowed for dimension names or values; however, I have passed your feedback to the Timestream team. As an immediate solution, please transform the data before ingesting it into Timestream to avoid the errors.

I would suggest using a processor plugin (for example: https://github.com/influxdata/telegraf/tree/master/plugins/processors/regex ) to replace empty values with something like “none” or “empty”. The best solution would be to remove empty tags before sending the metrics to Timestream. @sjwang90 can you advise how to drop empty tags?
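A minimal sketch of the regex approach, covering the four location tags named in this issue. The `[[processors.regex.tags]]` sub-table with `key`, `pattern`, and `replacement` follows the regex processor's documented layout. The replacement value "none" is only a placeholder, and this thread does not confirm whether a processor actually sees these empty tags on iDRAC 9 hardware, so treat it as something to try rather than a verified fix.

```toml
# Assumption: replace an empty value of each location tag with the
# placeholder "none" before the metric reaches outputs.timestream.
[[processors.regex]]
  [[processors.regex.tags]]
    key = "datacenter"
    pattern = "^$"          # matches only the empty string
    replacement = "none"

  [[processors.regex.tags]]
    key = "row"
    pattern = "^$"
    replacement = "none"

  [[processors.regex.tags]]
    key = "rack"
    pattern = "^$"
    replacement = "none"

  [[processors.regex.tags]]
    key = "room"
    pattern = "^$"
    replacement = "none"
```

Populated values are left untouched because `^$` does not match a non-empty string.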

Please note that there are other Timestream Service limits. Check this link for details: https://docs.aws.amazon.com/timestream/latest/developerguide/ts-limits.html

@sjwang90 – on a side note - are empty tag values valid? Linking related issue: https://github.com/influxdata/telegraf/pull/2404

sjwang90 commented 3 years ago

To drop tags before they hit the output, you can configure the names in tagdrop: https://github.com/influxdata/telegraf/blob/master/docs/CONFIGURATION.md#metric-filtering.
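A hedged sketch of tag filtering on the output, using the options described in the metric filtering section linked above. Note that in that section `tagdrop` discards whole metrics whose tags match, while `tagexclude` removes individual tags and keeps the metric, so `tagexclude` is used here. The tag names come from this issue; the other Timestream settings are copied from the abridged config in the report and are not complete.

```toml
[[outputs.timestream]]
  mapping_mode = "multi-table"
  create_table_if_not_exists = true

  # Assumption: the location tags are not needed in Timestream at all.
  # tagexclude strips these tags from every metric sent to this output,
  # including on servers where the fields are populated in iDRAC.
  tagexclude = ["datacenter", "row", "rack", "room"]
```

The trade-off is that the location information is lost for all servers, not only those reporting empty values. The regex processor suggested earlier keeps populated values.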

Empty tag values should be valid. I haven't seen any issues arise with that in a while.

peterulsteen commented 3 years ago

I have now tested this on 13 Dell servers (mostly R630s and R640s, plus the four C6420 nodes). This is working on all 6 Dell servers running iDRAC 8, but does NOT work on the 7 Dell servers running iDRAC 9.

The empty tags do not appear at all when printing the output to stdout or to a file. I'm only seeing this error message when outputs.timestream attempts to write CommonAttributes, and ONLY for hardware running iDRAC 9, whether or not those fields are populated in iDRAC itself. For all of our older servers running iDRAC 8, the CommonAttributes do not include those tags unless I have a value in the respective fields.

With this information, does this point to an issue between inputs.redfish and Dell iDRAC 9?

powersj commented 11 months ago

@peterulsteen,

Wondering if this is still an issue?

Catching up on this issue, it reads as though Timestream does not accept empty tags. Your metrics do not show the tags as empty when printed to stdout, which I assume was done with the outputs.file output? If that is the case, it is not clear to me how they are showing up as empty.

If it is still an issue, can you reproduce on the latest telegraf, and provide the metrics generated using the outputs.file plugin?
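One possible way to capture those metrics is sketched below. The file path is a placeholder. The choice of the JSON serializer is an assumption: line protocol has no way to write an empty tag value, so the default `influx` format may simply omit such tags, whereas JSON output should show them as empty strings if they are present on the metric.

```toml
[[outputs.file]]
  # Write to stdout and to a local file (placeholder path).
  files = ["stdout", "/tmp/redfish_metrics.out"]

  # Assumption: JSON is used so that tags with empty values, if any,
  # remain visible in the captured output.
  data_format = "json"
```

Comparing this output between an iDRAC 8 and an iDRAC 9 server would show whether inputs.redfish itself emits the empty tags.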

Thanks!

telegraf-tiger[bot] commented 10 months ago

Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem. If not, please try posting this question in our Community Slack or Community Forums, or provide additional details in this issue and request that it be re-opened. Thank you!