open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Proposal for language additions to OTTL #30800

Open jsuereth opened 5 months ago

jsuereth commented 5 months ago

Component(s)

pkg/ottl

Is your feature request related to a problem? Please describe.

We were attempting to use OTTL to transform Google's structured logging format in GKE (see: https://cloud.google.com/logging/docs/structured-logging) to the current example data model in OpenTelemetry (see: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/data-model-appendix.md#google-cloud-logging).

Effectively, we're trying to parse a JSON log body and extract components into OTLP equivalents:

{
  "severity":"ERROR",
  "message":"There was an error in the application.",
  "httpRequest":{
    "requestMethod":"GET"
  },
  "times":"2020-10-12T07:20:50.52Z",
  "logging.googleapis.com/insertId":"42",
  "logging.googleapis.com/labels":{
    "user_label_1":"value_1",
    "user_label_2":"value_2"
  },
  "logging.googleapis.com/operation":{
    "id":"get_data",
    "producer":"github.com/MyProject/MyApplication",
     "first":"true"
  },
  "logging.googleapis.com/sourceLocation":{
    "file":"get_data.py",
    "line":"142",
    "function":"getData"
  },
  "logging.googleapis.com/spanId":"000000000000004a",
  "logging.googleapis.com/trace":"projects/my-projectid/traces/06796866738c859f2f19b7cfb3214824",
  "logging.googleapis.com/trace_sampled":false
}

In doing so we noticed a lot of friction in OTTL and many duplicated expressions.

Here's a "simplified" version (i.e. one still missing some if/where statements and new required built-in functions for span processing):

context: log
statements:
- set(body, ParseJSON(body["message"])) where (body != nil and body["message"] != nil)
- merge_maps(attributes, body["logging.googleapis.com/labels"], "upsert") where body["logging.googleapis.com/labels"] != nil
- delete_key(body, "logging.googleapis.com/labels") where (body != nil and body["logging.googleapis.com/labels"] != nil)
- delete_key(cache, "__field_0") where (cache != nil and cache["__field_0"] != nil)
- set(cache["__field_0"], body["logging.googleapis.com/httpRequest"]) where (body != nil and body["logging.googleapis.com/httpRequest"] != nil)
- delete_key(body, "logging.googleapis.com/httpRequest") where (body != nil and body["logging.googleapis.com/httpRequest"] != nil)
- set(cache["value"], cache["__field_0"])
- set(attributes["gcp.http_request"], cache["value"]) where (cache != nil and cache["value"] != nil)
- delete_key(cache, "__field_0") where (cache != nil and cache["__field_0"] != nil)
- set(cache["__field_0"], body["logging.googleapis.com/logName"]) where (body != nil and body["logging.googleapis.com/logName"] != nil)
- delete_key(body, "logging.googleapis.com/logName") where (body != nil and body["logging.googleapis.com/logName"] != nil)
- set(cache["value"], cache["__field_0"])
- set(attributes["gcp.log_name"], cache["value"]) where (cache != nil and cache["value"] != nil)
- delete_key(cache, "__field_0") where (cache != nil and cache["__field_0"] != nil)
- set(cache["__field_0"], body["logging.googleapis.com/severity"]) where (body != nil and body["logging.googleapis.com/severity"] != nil)
- delete_key(body, "logging.googleapis.com/severity") where (body != nil and body["logging.googleapis.com/severity"] != nil)
- set(cache["value"], cache["__field_0"])
- set(severity_text, cache["value"]) where (cache != nil and cache["value"] != nil)
- delete_key(cache, "__field_0") where (cache != nil and cache["__field_0"] != nil)
- set(cache["__field_0"], body["logging.googleapis.com/sourceLocation"]) where (body != nil and body["logging.googleapis.com/sourceLocation"] != nil)
- delete_key(body, "logging.googleapis.com/sourceLocation") where (body != nil and body["logging.googleapis.com/sourceLocation"] != nil)
- set(cache["value"], cache["__field_0"])
- set(attributes["gcp.source_location"], cache["value"]) where (cache != nil and cache["value"] != nil)
- delete_key(cache, "__field_0") where (cache != nil and cache["__field_0"] != nil)
- set(cache["__field_0"], body["logging.googleapis.com/spanId"]) where (body != nil and body["logging.googleapis.com/spanId"] != nil)
- delete_key(body, "logging.googleapis.com/spanId") where (body != nil and body["logging.googleapis.com/spanId"] != nil)
- set(cache["value"], cache["__field_0"])
- set(span_id, cache["value"]) where (cache != nil and cache["value"] != nil)
- delete_key(cache, "__field_0") where (cache != nil and cache["__field_0"] != nil)
- set(cache["__field_0"], body["logging.googleapis.com/trace"]) where (body != nil and body["logging.googleapis.com/trace"] != nil)
- delete_key(body, "logging.googleapis.com/trace") where (body != nil and body["logging.googleapis.com/trace"] != nil)
- set(cache["value"], cache["__field_0"])
- set(trace_id, cache["value"]) where (cache != nil and cache["value"] != nil)

Describe the solution you'd like

We'd like to propose a new expression-focused syntax for OTTL that would allow the previous OTTL statements to look like this:

on log
when log.body is aStringMessage(parsedJson(json))
yield log with {
  attributes: attributes with json["logging.googleapis.com/labels"] with {
    "gcp.log_name": json["logging.googleapis.com/logName"]
  },
  spanID: StringToSpanID(json["logging.googleapis.com/spanId"]),
  traceID: StringToTraceID(json["logging.googleapis.com/trace"]),
  severity_text: json["logging.googleapis.com/severity"],
  body: json with {
    "logging.googleapis.com/labels": nil,
    "logging.googleapis.com/logName": nil,
    "logging.googleapis.com/severity": nil,
    "logging.googleapis.com/sourceLocation": nil,
    "logging.googleapis.com/spanId": nil,
    "logging.googleapis.com/trace": nil,
    "logging.googleapis.com/labels": nil,
  },
}

At a high level we propose the following:

Describe alternatives you've considered

We investigated leveraging CEL or Lua for this purpose.

Unfortunately there are a few shortcomings we think this proposal would alleviate:

Additional context

I have a fully implemented prototype transpiler which can take this new syntax and generate existing OTTL statements from it. This prototype includes grammar suggestions and rationale.

I would like to consider whether OTTL should expand its expression syntax prior to #28892.

github-actions[bot] commented 5 months ago

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

TylerHelmuth commented 5 months ago

@jsuereth thanks for taking the time to dive into OTTL. There is a lot to comprehend here and it is going to take some time for me to ingest everything. I think this would be a very good topic for a Collector SIG meeting.

TylerHelmuth commented 5 months ago

@jsuereth since this isn't a PR yet I'm gonna supply some feedback from https://github.com/jsuereth/ottl-proposal/blob/main/PROPOSAL.md here

feedback on the complaints section

inconsistent and surprising syntax limitations (parentheses cannot be safely added around any expression, Boolean values can't be used as conditions without appending == true, time values can't be stored in cache fields, etc)

I believe all boolean or mathematical expressions can have () safely added as long as the result is still a boolean/mathematical expression. Do you have an example where this is not working?

Boolean values can now be used in conditions directly: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/20911. Do you have an example where this is not working?

See https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/26108 for the cache and time values.

Isn't set(attributes["key"], nil) the same as delete_key(attributes, "key")

Currently set is a no-op when the value is nil: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/ottl/ottlfuncs/func_set.go. I agree that isn't clear enough.

Inconsistency in identifier lookups. trace_id.string and trace_id["string"] don't refer to the same thing even though body.string and body["string"] do.

We've recently added a feature so that indexing fields that cannot be indexed throws an error on startup. body.string and body["string"] don't refer to the same thing: the former returns the body as a string and the latter requires the body to be a map and returns the value in the map at "string". I'll argue our paths have been consistent with the use of . to separate segments and with what can and cannot be indexed.

feedback on the proposal section

TylerHelmuth commented 5 months ago

For the comparison to existing OTTL syntax, I'd like to submit my attempt at parsing a JSON log to the specification. I probably misinterpreted some stuff, so please let me know if I've done something wrong:

Input:

{
  "severity":"ERROR",
  "message":"There was an error in the application.",
  "httpRequest":{
    "requestMethod":"GET"
  },
  "times":"2020-10-12T07:20:50.52Z",
  "logging.googleapis.com/insertId":"42",
  "logging.googleapis.com/labels":{
    "user_label_1":"value_1",
    "user_label_2":"value_2"
  },
  "logging.googleapis.com/operation":{
    "id":"get_data",
    "producer":"github.com/MyProject/MyApplication",
     "first":"true"
  },
  "logging.googleapis.com/sourceLocation":{
    "file":"get_data.py",
    "line":"142",
    "function":"getData"
  },
  "logging.googleapis.com/spanId":"000000000000004a",
  "logging.googleapis.com/trace":"projects/my-projectid/traces/06796866738c859f2f19b7cfb3214824",
  "logging.googleapis.com/trace_sampled":false
}

transformprocessor config:

  transform:
    error_mode: ignore
    log_statements:
      - context: log
        statements:
          - merge_maps(cache, ParseJSON(body), "upsert") where IsMatch(body, "^\\{")

          - set(time, Time(cache["times"], "%Y-%m-%dT%H:%M:%S.%f%z")) where cache["times"] != nil

          - set(severity_number, SEVERITY_NUMBER_DEBUG) where cache["severity"] == "DEBUG"
          - set(severity_number, SEVERITY_NUMBER_INFO) where cache["severity"] == "INFO"
          - set(severity_number, SEVERITY_NUMBER_WARN) where cache["severity"] == "WARNING"
          - set(severity_number, SEVERITY_NUMBER_ERROR) where cache["severity"] == "ERROR"
          - set(severity_number, SEVERITY_NUMBER_FATAL) where cache["severity"] == "CRITICAL"

          - set(attributes["gcp.http_request"], cache["httpRequest"]) where cache["http_request"] != nil

          - set(span_id.string, cache["logging.googleapis.com/spanId"]) where cache["logging.googleapis.com/spanId"] != nil

          - replace_pattern(cache["logging.googleapis.com/trace"], "projects/.*/traces/([\\w\\d]+)", "$$1") where cache["logging.googleapis.com/trace"] != nil
          - set(trace_id.string, cache["logging.googleapis.com/trace"]) where cache["logging.googleapis.com/trace"] != nil

          - delete_key(cache, "times")
          - delete_key(cache, "severity")
          - delete_key(cache, "httpRequest")
          - delete_key(cache, "logging.googleapis.com/trace")
          - delete_key(cache, "logging.googleapis.com/spanId")
          - flatten(cache, prefix="gcp")
          - merge_maps(attributes, cache, "insert")

output:

Resource SchemaURL: 
ScopeLogs #0
ScopeLogs SchemaURL: 
InstrumentationScope  
LogRecord #0
ObservedTimestamp: 2024-01-26 23:21:16.374373 +0000 UTC
Timestamp: 2020-10-12 07:20:50.52 +0000 UTC
SeverityText: 
SeverityNumber: Error(17)
Body: Str({"severity":"ERROR","message":"There was an error in the application.","httpRequest":{    "requestMethod":"GET"},"times":"2020-10-12T07:20:50.52Z","logging.googleapis.com/insertId":"42","logging.googleapis.com/labels":{    "user_label_1":"value_1",    "user_label_2":"value_2"},"logging.googleapis.com/operation":{    "id":"get_data",    "producer":"github.com/MyProject/MyApplication",    "first":"true"},"logging.googleapis.com/sourceLocation":{    "file":"get_data.py",    "line":"142",    "function":"getData"},"logging.googleapis.com/spanId":"000000000000004a","logging.googleapis.com/trace":"projects/my-projectid/traces/06796866738c859f2f19b7cfb3214824","logging.googleapis.com/trace_sampled":false})
Attributes:
     -> logging.googleapis.com/spanId: Str(000000000000004a)
     -> times: Str(2020-10-12T07:20:50.52Z)
     -> logging.googleapis.com/sourceLocation: Map({"file":"get_data.py","function":"getData","line":"142"})
     -> httpRequest: Map({"requestMethod":"GET"})
     -> logging.googleapis.com/insertId: Str(42)
     -> logging.googleapis.com/labels: Map({"user_label_1":"value_1","user_label_2":"value_2"})
     -> message: Str(There was an error in the application.)
     -> logging.googleapis.com/operation: Map({"first":"true","id":"get_data","producer":"github.com/MyProject/MyApplication"})
     -> logging.googleapis.com/trace_sampled: Bool(false)
     -> log.file.name: Str(gcp.json)
     -> severity: Str(ERROR)
     -> logging.googleapis.com/trace: Str(projects/my-projectid/traces/06796866738c859f2f19b7cfb3214824)
     -> gcp.logging.googleapis.com/operation.id: Str(get_data)
     -> gcp.logging.googleapis.com/operation.producer: Str(github.com/MyProject/MyApplication)
     -> gcp.logging.googleapis.com/operation.first: Str(true)
     -> gcp.message: Str(There was an error in the application.)
     -> gcp.logging.googleapis.com/insertId: Str(42)
     -> gcp.logging.googleapis.com/labels.user_label_1: Str(value_1)
     -> gcp.logging.googleapis.com/labels.user_label_2: Str(value_2)
     -> gcp.logging.googleapis.com/trace_sampled: Bool(false)
     -> gcp.logging.googleapis.com/sourceLocation.file: Str(get_data.py)
     -> gcp.logging.googleapis.com/sourceLocation.line: Str(142)
     -> gcp.logging.googleapis.com/sourceLocation.function: Str(getData)
Trace ID: 06796866738c859f2f19b7cfb3214824
Span ID: 000000000000004a
Flags: 0
        {"kind": "exporter", "data_type": "logs", "name": "debug"}

I post this only for a fairer syntax comparison, because I do not believe the one in the proposal is optimized.

TylerHelmuth commented 5 months ago

I'll also admit that my knee-jerk reaction is to be defensive haha. I really appreciate the time you've spent on this proposal and your desire to improve OTTL. I promise to keep an open mind!

jsuereth commented 5 months ago

First, I want to thank you for providing more optimal code for the use case! What I posted was our naive attempt with our current understanding of OTTL. It's good that things can be simplified, and I think some of our points were perhaps bugs that are now fixed, rather than problems inherent to the syntax.

That said, I still think expanding the available "expression" syntax and providing features like pattern matching or structural literals would be well suited to the types of programs you expect in OTTL.

I'm going to answer your questions over the next few days as I can and in separate comments, to hopefully have a good conversation and keep different points and features in separate conversations.

My proposal is here to help. While I stand behind all of it, we can try different pieces initially. I'm mostly looking to make sure when OTTL is stable it is scalable.

This isn't meant to be an attack on the current state, more a vision of where we can grow, and grow (hopefully) without breaking existing users.

jsuereth commented 5 months ago

At a brief glance I don't comprehend yet what these 2 expressions do, can you go into more detail?

attributes: attributes with json["logging.googleapis.com/labels"] with {
  "gcp.log_name": json["logging.googleapis.com/logName"]
},
body: json with {
  "logging.googleapis.com/labels": nil,
  "logging.googleapis.com/logName": nil,
  "logging.googleapis.com/severity": nil,
  "logging.googleapis.com/sourceLocation": nil,
  "logging.googleapis.com/spanId": nil,
  "logging.googleapis.com/trace": nil,
},

This is my idea of a merge operation. The second is meant to be a mechanism for deleting multiple values from the JSON body. x with y is a merge-maps method with insert; it takes the right-hand values and overrides the left with them. The nice part of having this be an expression that returns a value is that you can chain it, as I do in the first example.

The first example does the following:

The ability to chain the merges keeps the syntax brief but also visually you can see the key-value pairs being set.
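For illustration only, here is a rough Python sketch of how I read the chained merges. `with_merge` is a hypothetical stand-in for the proposed `with` operator, modeling right-hand keys overriding the left and `nil` (here `None`) removing a key, as in the body example above:

```python
def with_merge(left, right):
    """Model of the proposed `x with y`: right-hand keys override the
    left-hand map, and a nil (None) value removes the key entirely."""
    result = dict(left)
    for key, value in right.items():
        if value is None:
            result.pop(key, None)  # models `"key": nil` in the proposal
        else:
            result[key] = value
    return result

# Chaining mirrors `attributes with labels with {"gcp.log_name": ...}`:
attributes = {"existing.attr": "kept"}
labels = {"user_label_1": "value_1", "user_label_2": "value_2"}
merged = with_merge(with_merge(attributes, labels),
                    {"gcp.log_name": "projects/p/logs/my-log"})
```

Because each merge returns a new map, the result of one `with` can feed directly into the next, which is what keeps the chained syntax terse.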

jsuereth commented 5 months ago

I don't understand the pattern matching section yet. If the goal is to have a common where clause there is [processor/transform] Allow common where clause #27830, although I recognize that this doesn't solve the problem directly in OTTL

Pattern matching is designed to do two things:

The goal in this addition is not just sharing the conditional, but also gaining a term (a name, or reference) you can use that has some desired type or feature.

Example from scala:

case class Data(x: Int, y: String)

val x: Any = ...
x match {
  case Data(_, name) => ...here I can use "name" as a string...
}

Example from rust:

enum Option<T> {
  None,
  Some(T)
}

match opt_value {
  None => handle_none(),
  Some(value) => handle_value(value),
}

The idea here is that you can chain these patterns too to dramatically simplify both checking a condition and adding a new term.

For OTTL I imagine we'd use this to verify that a log body AnyVal has specific fields of specific types, and to give names to those fields that we can use in the program.

on log
when body like { some_field: String(name_ill_use) }
set(attributes.new_label, name_ill_use)

Here the user selected name_ill_use. The interpreter will verify some_field exists on body and that it contains a string value. That can be done in the guard/Boolean where clause.
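For illustration, that check-and-bind step could be modeled in Python. `match_like` is a hypothetical helper sketching the semantics, not proposed implementation:

```python
def match_like(body, field):
    """Model of `when body like { some_field: String(name) }`: succeed
    only when body is a map, the field is present, and its value is a
    string; on success, return the value to bind to the chosen name."""
    if not isinstance(body, dict):
        return None
    value = body.get(field)
    return value if isinstance(value, str) else None

# The guard and the binding happen together:
name_ill_use = match_like({"some_field": "hello"}, "some_field")
if name_ill_use is not None:
    attributes = {"new_label": name_ill_use}  # models set(attributes.new_label, ...)
```

The point of the pattern is that the existence check, the type check, and the introduction of the new name are a single step rather than three separate where clauses.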

Hope that helps! Let me know if you need more details or examples from other languages.

TylerHelmuth commented 5 months ago

My proposal is here to help. While I stand behind all of it, we can try different pieces initially. I'm mostly looking to make sure when OTTL is stable it is scalable. This isn't meant to be an attack on the current state, more a vision of where we can grow, and grow (hopefully) without breaking existing users.

Absolutely, I was highlighting my own shortcomings haha. We share the goal of making OTTL scalable and stable. Thanks again for thinking about this!

TylerHelmuth commented 5 months ago

@jsuereth in the examples where is the variable json coming from?

TylerHelmuth commented 5 months ago

@jsuereth I'd be really interested in seeing the examples from https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/transformprocessor#examples rewritten with this new syntax. I am curious what non-log/non-JSON statements look like and how they'll fit in a YAML config.

jsuereth commented 5 months ago

@jsuereth in the examples where is the variable json coming from?

This is the pattern matching component:

on log
when log.body is aStringMessage(parsedJson(json))
yield ...

aStringMessage is a function that takes in an AnyVal (log body) and returns Option<String>. parsedJson is a function that takes in a String and returns Option<AnyVal> (i.e. parsed JSON).

The name json is the name the user has given this result. As you can see, this is an example of chaining "extractors" in the pattern matching.

TylerHelmuth commented 5 months ago

Oh I see now. I was interpreting aStringMessage(parsedJson(json)) as a chain of function calls with json as the first input to parsedJson and then into aStringMessage, as it would be in Go/Python/Java/.NET. Is the syntax you're proposing present in any other programming languages?

crobert-1 commented 5 months ago

I'm going to remove needs triage since this has been acknowledged and is being discussed.

jsuereth commented 5 months ago

Oh I see now. I was interpreting aStringMessage(parsedJson(json)) as a chain of function calls with json as the first input to parsedJson and then into aStringMessage, like if it was Go/Python/Java/.NET. Is the syntax you're proposing present in any other programming languages?

It is a chain of function calls, Scala is the language I'm borrowing from heavily here. Rust also has pattern matching with extraction, but doesn't expand to allow custom functions.

jsuereth commented 5 months ago

@jsuereth I'd be really interested in seeing the examples from https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/transformprocessor#examples rewritten with this new syntax. I am curious what non-log/non-JSON statements look like and how they'll fit in a YAML config.

Tried to provide what examples might look like with proposed language features: https://github.com/jsuereth/ottl-proposal/blob/main/examples.md

Note: if you think it'd be worthwhile for me to tailor the language feature proposals into components and show what things might look like when adding one feature at a time, let me know. E.g. there are examples where I felt pattern matching helps the most, and others where list comprehension does.

While I do think all the language features would benefit OTTL, I'm happy to entertain a subset of this to contribute.

TylerHelmuth commented 5 months ago

It is a chain of function calls, Scala is the language I'm borrowing from heavily here. Rust also has pattern matching with extraction, but doesn't expand to allow custom functions.

I am confused by the json entry in aStringMessage(parsedJson(json)). Is json an input or an output of that statement?

jsuereth commented 5 months ago

I am confused by the json entry in aStringMessage(parsedJson(json)). Is json an input or an output of that statement?

In the context of pattern matching, json is effectively an output.

We have a set of functions/methods called "extractors" which take an input and return Option[...]. The return value then is what's used.

So, for example, if I have a symbol log.body which is a String and a symbol parseJson which is a func (String) Option[Json], then in pattern matching I could use that function as follows:

match log.body {
   case parseJson(json) => // here I can use the term `json` as a `Json` type
   case value => // here I can use the term `value` as a string 
}

Here log.body is the input to the "extractor" functions. The return value is what's in the () part of the function call, allowing you to chain MULTIPLE extractors.

Note: the simplest form of pattern matching just lets you extract structured fields by name, without any of these fancy function things. I think given JSON + logging and the need to deal with AnyVal all the time, this feature can be useful, particularly if adding new "patterns" is as easy as writing a function with an optional return.
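For illustration, the extractor chain from the proposal could be sketched in Python with Optional-returning functions. The names mirror the proposal's aStringMessage/parsedJson, but the code itself is a hypothetical model of the semantics:

```python
import json as jsonlib
from typing import Any, Optional

def a_string_message(body: Any) -> Optional[str]:
    """Extractor: AnyVal -> Option[String]; succeeds when the body is a string."""
    return body if isinstance(body, str) else None

def parsed_json(text: str) -> Optional[Any]:
    """Extractor: String -> Option[AnyVal]; succeeds when the string parses as JSON."""
    try:
        return jsonlib.loads(text)
    except ValueError:
        return None

def match_body(body: Any) -> Optional[Any]:
    """Model of `body is aStringMessage(parsedJson(json))`: the outer
    extractor runs first on the scrutinee, its output feeds the inner
    pattern, and the innermost name (`json`) binds to the final value."""
    text = a_string_message(body)
    if text is None:
        return None  # whole pattern fails; nothing is bound
    return parsed_json(text)

json_value = match_body('{"severity": "ERROR"}')
```

If any extractor in the chain returns no value, the whole pattern fails and the statement body never runs, which is exactly the guard behavior the where clauses in the original config were emulating by hand.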

github-actions[bot] commented 3 months ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.


jsuereth commented 2 months ago

I haven't had much time the past few months to pursue this (due to heavy investment in codegen / semantic-conventions / weaver).

I would like to pursue this again with an aim of reducing the ticket/issue count on OTTL with targeted features (or just general OTTL language specification improvement).

@TylerHelmuth If you're amenable, maybe we can set up some time to chat? At a minimum I think given #32080 - The most important thing from this proposal would be to specify what . means formally and then leverage this to lift certain information about log transformations (e.g. understanding when new resources must be synthesized).

evan-bradley commented 2 months ago

@jsuereth If you are able to attend, this would be great to discuss at a Collector SIG meeting. We have them every Wednesday at 12:00 EDT / 09:00 PDT. Here's a link to the meeting notes with a link to the Zoom room.

If this time doesn't work for you, I agree a call would be a good way to discuss our options here.

jsuereth commented 1 month ago

@evan-bradley That time, unfortunately, conflicts with semconv tooling WG, so I'm unable to attend.

jsuereth commented 2 weeks ago

We discussed briefly at OTEL community days. I'll follow up with y'all after my summer vacation/travel have passed.

I think we can be highly focused on a few points of the proposal to make progress against key friction points.