Output annotated CSV

urbanogilson commented 1 year ago

Use Case

The CSV serializer https://github.com/influxdata/telegraf/tree/master/plugins/serializers/csv does not have an option to export annotated CSV, which makes its output incompatible with the influx CLI write command https://docs.influxdata.com/influxdb/cloud/reference/cli/influx/write.

Telegraf already has a mechanism to export data as CSV, and the influx CLI can import CSV. Implementing this feature would make CSV-based integration between the two seamless.

Expected behavior

A configuration option, csv_annotated, that exports annotated CSV compliant with https://docs.influxdata.com/influxdb/cloud/reference/syntax/annotated-csv.

[[outputs.file]]
  ## Files to write to, "stdout" is a specially handled file.
  files = ["stdout", "/tmp/metrics.out"]

  ## Data format to output.
  ## Each data format has its own unique set of configuration options, read
  ## more about them here:
  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_OUTPUT.md
  data_format = "csv"

  ## The default timestamp format is Unix epoch time.
  # Other timestamp layout can be configured using the Go language time
  # layout specification from https://golang.org/pkg/time/#Time.Format
  # e.g.: csv_timestamp_format = "2006-01-02T15:04:05Z07:00"
  # csv_timestamp_format = "unix"

  ## The default separator for the CSV format.
  # csv_separator = ","

  ## Output the CSV header in the first line.
  ## Enable the header when outputting metrics to a new file.
  ## Disable when appending to a file or when using a stateless
  ## output to prevent headers appearing between data lines.
  # csv_header = false

  ## Prefix tag and field columns with "tag_" and "field_" respectively.
  ## This can be helpful if you need to know the "type" of a column.
  # csv_column_prefix = false

  ## Output Annotated CSV.
  ## This can be helpful if you want to import using Influx CLI.
  # csv_annotated = false

Actual behavior

N/A

Additional info

No response

powersj commented 1 year ago

Hi,

> Telegraf already has a mechanism to export data as CSV, and the influx CLI can import CSV. Implementing this feature would make CSV-based integration between the two seamless.

The CLI also accepts line protocol, which is the native format for sending data to InfluxDB.

I realize you may have some sort of workflow set up, but can I ask why you are not using Telegraf to directly send the data to InfluxDB as well?

urbanogilson commented 1 year ago

I want to clarify that I am actively using Telegraf to send data directly to InfluxDB.

However, the suggested improvement is to have seamless integration of annotated CSV files, similar to the existing feature provided by Influx Line Protocol integration.

While I strongly believe that leveraging the Influx Line Protocol or streaming data directly from Telegraf to InfluxDB is the most efficient integration method, I would greatly appreciate the availability of a continuous integration option designed specifically for CSV files.

powersj commented 1 year ago

> the suggested improvement is to have seamless integration of annotated CSV files,

Outside of InfluxDB, what is the use case? I'm trying to understand the problem you are solving with this suggestion, as the justification originally provided was about ingest into InfluxDB.

If you consider the 3 rows of typical annotations:

1 - group: this is not applicable to this data, so it would always be false
2 - default: this would also not be applicable, so it would always be empty
3 - datatype: this has challenges as well, since the timestamp format could be anything the user specifies, so it may not follow the number or RFC3339 format that InfluxDB expects
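
To make the three annotation rows concrete, here is a minimal Python sketch (not Telegraf code) that builds them for a hypothetical column list. The function names `annotation_rows` and `render`, the column names, the datatype strings, and the layout with the annotation name in the first CSV column are illustrative assumptions. Following the points above, every group value is false and every default value is empty.

```python
import csv
import io


def annotation_rows(columns):
    """Build the three annotation rows plus a header row.

    `columns` is a list of (name, datatype) pairs. Assumption: the first
    CSV column is reserved for the annotation name, so the header row
    starts with an empty cell.
    """
    names = [name for name, _ in columns]
    datatypes = [datatype for _, datatype in columns]
    groups = ["false"] * len(columns)   # grouping never applies here
    defaults = [""] * len(columns)      # no default values either
    return [
        ["#datatype", *datatypes],
        ["#group", *groups],
        ["#default", *defaults],
        ["", *names],
    ]


def render(rows):
    """Serialize rows to CSV text, one line per row."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical columns for a single metric shape.
columns = [
    ("_time", "dateTime:RFC3339"),
    ("_measurement", "string"),
    ("host", "string"),
    ("value", "double"),
]
annotated = render(annotation_rows(columns))
```

Only the datatype row carries real information in this sketch, and it is only correct if the timestamp is actually written in the format the datatype names.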

After spending some time with the CSV serializer and playing with the CSV header option that exists today, I am even less inclined to add this. Take, for example, a setup with multiple metrics, which is generally the case:

[[inputs.exec]]
  commands = ["echo mymeasure,foo=bar,baz=biz value=1"]
  data_format = "influx"

[[inputs.exec]]
  commands = ["echo cpu,tag=hosta value=42"]
  data_format = "influx"

[[inputs.exec]]
  commands = ["echo cpu,tag=hostb value=4444"]
  data_format = "influx"

This will result in the following in line protocol:

cpu,host=ryzen,tag=hostb value=4444 1689111906000000000
mymeasure,baz=biz,foo=bar,host=ryzen value=1 1689111906000000000
cpu,host=ryzen,tag=hosta value=42 1689111906000000000

In CSV:

1689111917,mymeasure,biz,bar,ryzen,1
1689111917,cpu,ryzen,hosta,42
1689111917,cpu,ryzen,hostb,4444

Notice right away that we now have a mismatch of columns: the first row has additional tags, so the "value" field is not aligned, and you would now need to separate these rows by group as well.
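
The mismatch can be reproduced with a small Python sketch. This is not the Telegraf serializer; `to_csv_row` is a hypothetical helper that assumes tag and field values are written in sorted key order with no shared schema, which happens to match the CSV rows shown above.

```python
def to_csv_row(timestamp, name, tags, fields):
    """Join timestamp, name, tag values, and field values into one CSV line.

    Values are emitted in sorted key order; nothing forces two rows to
    share the same set of columns.
    """
    tag_values = [tags[key] for key in sorted(tags)]
    field_values = [str(fields[key]) for key in sorted(fields)]
    return ",".join([str(timestamp), name, *tag_values, *field_values])


# The three metrics from the example configuration above.
metrics = [
    (1689111917, "mymeasure", {"foo": "bar", "baz": "biz", "host": "ryzen"}, {"value": 1}),
    (1689111917, "cpu", {"tag": "hosta", "host": "ryzen"}, {"value": 42}),
    (1689111917, "cpu", {"tag": "hostb", "host": "ryzen"}, {"value": 4444}),
]
rows = [to_csv_row(*metric) for metric in metrics]

# Number of columns per row: the first row is one column wider.
widths = [len(row.split(",")) for row in rows]
# Zero-based position of the "value" field in each row.
value_positions = [width - 1 for width in widths]
```

Here `widths` differs between the first row and the other two, so a single header or a single set of annotation rows cannot describe all three lines.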

Thoughts?

urbanogilson commented 1 year ago

I was trying different output options and trying to ingest the results into InfluxDB, and I noticed that Telegraf's CSV is not compatible by default with the influx CLI.

I agree that, due to CSV limitations, it's not worth spending time to implement an aligned CSV. It would be easier for a single input, but I don't think collecting a single input is the typical use case.

Just to clarify: what is the use case for exporting data as CSV, and how do people use or integrate it?
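
As a rough illustration of what an aligned CSV would require: rows only line up when they share the same measurement name, tag keys, and field keys, so the output would have to be split into one table per such group. The Python sketch below uses hypothetical helper names (`schema_key`, `group_by_schema`) and is not part of Telegraf.

```python
from collections import defaultdict


def schema_key(name, tags, fields):
    """Identify a metric's column layout by name, tag keys, and field keys."""
    return (name, tuple(sorted(tags)), tuple(sorted(fields)))


def group_by_schema(metrics):
    """Bucket (timestamp, name, tags, fields) tuples by column layout."""
    groups = defaultdict(list)
    for timestamp, name, tags, fields in metrics:
        groups[schema_key(name, tags, fields)].append((timestamp, tags, fields))
    return dict(groups)


# Metrics shaped like the ones discussed in this thread.
metrics = [
    (1689111917, "mymeasure", {"foo": "bar", "baz": "biz", "host": "ryzen"}, {"value": 1}),
    (1689111917, "cpu", {"tag": "hosta", "host": "ryzen"}, {"value": 42}),
    (1689111917, "cpu", {"tag": "hostb", "host": "ryzen"}, {"value": 4444}),
]
tables = group_by_schema(metrics)
cpu_key = ("cpu", ("host", "tag"), ("value",))
```

With a single input there is typically one group and one header, which is why that case is simpler. With several inputs, each group would need its own header and annotations.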

powersj commented 1 year ago

> Just to clarify: what is the use case for exporting data as CSV, and how do people use or integrate it?

I am not certain, hence why I am asking :) I assume a user just wants CSV data in general, but the more common use case is that users are taking CSV data and parsing it via Telegraf to send somewhere else.

Thanks for the response, I am going to close this since this isn't something we would add.

influxdata / telegraf

Output annotated CSV #13593
