vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

documentation bug -- documented vector metric datatypes are inconsistent with example image #14645

Open breathe opened 2 years ago

breathe commented 2 years ago


Problem

This page is unclear and inconsistent --

https://vector.dev/docs/about/under-the-hood/architecture/data-model/metric/

The image shows an 'example metric event'

{
  "histogram": {
    "name": "login.time",
    "val": 22.0,
    "labels": {
       "host": "12.33.22.11"
    }
  }
}

But the histogram type is documented as being represented by the required fields "buckets", "count", and "sum".

"kind" is not documented anywhere but its marked as required and not shown in the example.

I assume there are probably other things out of date or wrong in this documentation ...

Example data to illustrate the metric data model and the various types supported would be nice ...
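For instance, piecing together the documented required fields, I'd guess a full histogram event looks something like the following -- the field values and the "namespace" key here are my assumptions from the docs, not something I've verified against Vector's actual output:

```json
{
  "name": "login.time",
  "namespace": "app",
  "kind": "absolute",
  "histogram": {
    "buckets": [
      { "upper_limit": 10.0, "count": 3 },
      { "upper_limit": 50.0, "count": 7 }
    ],
    "count": 10,
    "sum": 220.0
  }
}
```

Having a worked example like this for each metric type on the docs page would remove most of the guesswork.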

I'm building a new data source that will output ndjson to S3, and I want to use Vector with the aws_s3/sqs source to ship logs and metrics to various destination sinks. I control the output format and want to emit a generic format that Vector can slurp up and ship as logs and metrics to destination sinks -- but it's surprisingly tricky given the current documentation ...

Given that my input will be logs, I suppose I will need to use the log_to_metric transform ... Given that I can customize my output source -- is it possible to use a totally generic transformation? Something like this, maybe?

      "metrics": [
        {
          "type": "{{type}}",
          "field": "{{field}}",
          "name": "{{name}}",
          "namespace": "{{namespace}}",
          "tags": "{{tags}}"
        }

Configuration

n/a

Version

24.1

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

breathe commented 2 years ago

It appears that the log_to_metric transform has no way to derive an arbitrary set of tags from a log event ...? The example above fails because the tags configuration parameter needs to be a map ...

I've ended up working around this issue for now with a lua transform ... At the moment I only care about supporting gauge metrics, so I've only implemented the transform for the gauge type (I suspect more sophisticated logic is needed for some of the other metric types ...)

It would be really nice if I could instead just point the log_to_metric transform at a field containing all the tags I want defined on the output metric ... To prevent high-cardinality metric issues, I'd use the tag_cardinality_limit transform ...
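In the meantime, the closest I can get with log_to_metric seems to be enumerating each tag key explicitly with templated values -- a sketch below; the nested `{{TAGS.env}}` template paths are my assumption, and this only works when the tag keys are known up front:

```yaml
  remap_gauge_log_to_metric_sketch:
    type: log_to_metric
    inputs:
      - route_logs_by_type.gauge
    metrics:
      - type: "gauge"
        field: "FIELD"
        name: "{{NAME}}"
        namespace: "{{NAMESPACE}}"
        tags:
          # every tag key must be listed individually -- no way to splat the whole TAGS map
          env: "{{TAGS.env}}"
          service: "{{TAGS.service}}"
```

That defeats the purpose of a generic input format, which is why I fell back to lua below.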

transforms:
  parsing:
    type: "remap"
    inputs:
      - snowflake_s3
    source: |-
      . = parse_json!(string!(.message))

  route_logs_by_type:
    type: route
    inputs:
      - parsing
    route:
      counter: .TYPE == "counter"
      histogram: .TYPE == "histogram"
      gauge: .TYPE == "gauge"
      set: .TYPE == "set"
      summary: .TYPE == "summary"
      log: .TYPE == "log"

  remap_counter_log_to_metric:
    type: log_to_metric
    inputs:
      - route_logs_by_type.counter
    metrics:
      - type: "counter"
        field: "FIELD"
        name: "{{NAME}}"
        namespace: "{{NAMESPACE}}"

  remap_histogram_log_to_metric:
    type: log_to_metric
    inputs:
      - route_logs_by_type.histogram
    metrics:
      - type: "histogram"
        field: "FIELD"
        name: "{{NAME}}"
        namespace: "{{NAMESPACE}}"

  # {"name":"storage.table.retained_bytes.avg","namespace":"snowflake","tags":{"env":"dev","schema":"STAGING","service":"snowflake","table":"STORAGE_LOG"},"timestamp":"2022-10-05T21:09:16Z","kind":"absolute","gauge":{"value":0.0}}
  remap_gauge_log_to_metric:
    type: lua
    version: "2"
    inputs:
      - route_logs_by_type.gauge
    hooks:
      process: |-
        function (event, emit)
          event.metric = {
            name = event.log.NAME,
            namespace = event.log.NAMESPACE,
            kind = "absolute",
            timestamp = os.date("!*t"),
            tags = event.log.TAGS,
            gauge = {
              value = event.log.FIELD
            }
          }
          event.log = nil
          emit(event)
        end

  # {"name":"storage.table.retained_bytes.avg","namespace":"snowflake","timestamp":"2022-10-05T21:04:04.776646Z","kind":"absolute","gauge":{"value":0.0}}
  # remap_gauge_log_to_metric:
  #   type: log_to_metric
  #   inputs:
  #     - route_logs_by_type.gauge
  #   metrics:
  #     - type: "gauge"
  #       field: "FIELD"
  #       name: "{{NAME}}"
  #       namespace: "{{NAMESPACE}}"

  remap_set_log_to_metric:
    type: log_to_metric
    inputs:
      - route_logs_by_type.set
    metrics:
      - type: "gauge"
        field: "FIELD"
        name: "{{NAME}}"
        namespace: "{{NAMESPACE}}"

  remap_summary_log_to_metric:
    type: log_to_metric
    inputs:
      - route_logs_by_type.summary
    metrics:
      - type: "summary"
        field: "FIELD"
        name: "{{NAME}}"
        namespace: "{{NAMESPACE}}"

  remap_log_field_to_message:
    type: remap
    inputs:
      - route_logs_by_type.log
    source: |-
      .message = .FIELD
      del(.FIELD)
      .ddsource = .NAMESPACE
      del(.NAMESPACE)
      .service = .TAGS.service
      .ddtags = .TAGS
      del(.TAGS)
      del(.ddtags.service)
      del(.TYPE)
      .timestamp = now()

sinks:
  datadog_metrics:
    type: datadog_metrics
    inputs:
      - remap_counter_log_to_metric
      - remap_histogram_log_to_metric
      - remap_gauge_log_to_metric
      - remap_set_log_to_metric
      - remap_summary_log_to_metric
    default_api_key: "${DD_API_KEY:?err}"

  datadog_logs:
    type: datadog_logs
    inputs:
      - remap_log_field_to_message
      - route_logs_by_type._unmatched
    default_api_key: "${DD_API_KEY:?err}"

  console_output:
    type: console
    inputs:
      - remap*
    encoding:
      codec: "text"

tests:
  - name: "parsing -> parsing"
    inputs:
      - type: raw
        insert_at: parsing
        value: |-
          {"FIELD":0,"NAME":"storage.table.retained_bytes.avg","NAMESPACE":"snowflake","TAGS":{"env":"dev","schema":"STAGING","service":"snowflake","table":"STORAGE_LOG"},"TYPE":"gauge"}

    outputs:
      - extract_from: parsing
        conditions:
          - type: vrl
            source: |-
              assert!(exists(.TYPE))

  - name: "parsing -> route_logs_by_type.gauge"
    inputs:
      - type: raw
        insert_at: parsing
        value: |-
          {"FIELD":0,"NAME":"storage.table.retained_bytes.avg","NAMESPACE":"snowflake","TAGS":{"env":"dev","schema":"STAGING","service":"snowflake","table":"STORAGE_LOG"},"TYPE":"gauge"}

    outputs:
      - extract_from: route_logs_by_type.gauge
        conditions:
          - type: vrl
            source: |-
              assert!(exists(.NAMESPACE))

  - name: "parsing -> remap_gauge_log_to_metric"
    inputs:
      - type: raw
        insert_at: parsing
        value: |-
          {"FIELD":0,"NAME":"storage.table.retained_bytes.avg","NAMESPACE":"snowflake","TAGS":{"env":"dev","schema":"STAGING","service":"snowflake","table":"STORAGE_LOG"},"TYPE":"gauge"}

    outputs:
      # {"name":"storage.table.retained_bytes.avg","namespace":"snowflake","tags":{"env":"dev","schema":"STAGING","service":"snowflake","table":"STORAGE_LOG"},"timestamp":"2022-10-05T21:09:16Z","kind":"absolute","gauge":{"value":0.0}}
      - extract_from: remap_gauge_log_to_metric
        conditions:
          - type: vrl
            source: |-
              assert!(exists(.name))
              assert!(exists(.namespace))
              assert!(exists(.tags))
              assert!(exists(.timestamp))
              assert!(exists(.kind))

  - name: "parsing -> remap_log_field_to_message"
    inputs:
      - type: raw
        insert_at: parsing
        value: |-
          {"FIELD":"test logging","NAMESPACE":"snowflake","TAGS":{"env":"dev","service":"snowflake"},"TYPE":"log"}

    outputs:
      # {"ddsource":"snowflake","ddtags":{"env":"dev"},"message":"test logging","service":"snowflake","timestamp":"2022-10-05T21:48:30.823011Z"}
      - extract_from: remap_log_field_to_message
        conditions:
          - type: vrl
            source: |-
              assert!(exists(.ddsource))
              assert!(exists(.ddtags))
              assert!(exists(.message))
              assert!(exists(.service))
              assert!(exists(.timestamp))
jszwedko commented 1 year ago

Hi @breathe !

Thanks for reporting this and apologies for the delay in review. You are correct that that page is misleading. We'll need to update that but you can also see https://github.com/vectordotdev/vector/blob/master/lib/codecs/tests/data/native_encoding/schema.cue for a more accurate description of the metric schema.
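For example, per that schema, a gauge in the native JSON representation looks roughly like this (abbreviated and from memory -- see schema.cue for the authoritative field definitions):

```json
{
  "metric": {
    "name": "login.time",
    "kind": "absolute",
    "timestamp": "2022-10-05T21:09:16Z",
    "tags": { "host": "12.33.22.11" },
    "gauge": { "value": 22.0 }
  }
}
```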