grafana / alloy

OpenTelemetry Collector distribution with programmable pipelines
https://grafana.com/oss/alloy
Apache License 2.0

Proposal: add function closures and data transformation functions to Alloy syntax #156

Open · rfratto opened this issue 1 year ago

rfratto commented 1 year ago

Background

The relabel rules from the discovery.relabel component tend to have a steep learning curve, and complicated rules (such as only including targets where two conditions A and B are both true) are harder to write than they need to be.

For example, I am personally using this relabel rule to set a label from a collection of other labels, copying the value from the first label which is set:

rule {
    // Try to identify a service name to eventually form the job label. We'll
    // prefer the first of the below labels, in descending order.
    source_labels = [
        "__meta_kubernetes_pod_label_k8s_app",
        "__meta_kubernetes_pod_label_app",
        "__meta_kubernetes_pod_label_name",
        "__helm_name__",
        "__meta_kubernetes_pod_controller_name",
        "__meta_kubernetes_pod_name",
    ]
    target_label = "__service__"

    // Our in-memory string will be something like A;B;C;D;E;F, where any of the
    // letters could be replaced with a label value or be empty if the label
    // value did not exist.
    //
    // We want to match for the very first sequence of non-semicolon characters
    // which is either prefaced by zero or more semicolons, and is followed by
    // zero or more semicolons before the rest of the string.
    regex = ";*([^;]+);*.*"
}
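To make the regex trick concrete, here is a Go sketch that emulates what the replace action does with this one rule. It assumes Prometheus relabel semantics: source label values are joined with the default ";" separator, the regex is anchored at both ends, and capture group 1 becomes the new value. The helper serviceFromLabels is hypothetical; this is not the actual relabel implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Prometheus anchors relabel regexes at both ends, so the rule's pattern is
// wrapped in ^(?:...)$ to reproduce that behaviour.
var serviceRe = regexp.MustCompile(`^(?:;*([^;]+);*.*)$`)

// serviceFromLabels (hypothetical) joins the source label values with the
// default ";" separator and returns capture group 1: the first non-empty
// value. ok is false when no value is set; in that case the regex does not
// match and the replace action leaves the target label untouched.
func serviceFromLabels(values []string) (value string, ok bool) {
	joined := strings.Join(values, ";")
	m := serviceRe.FindStringSubmatch(joined)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	// k8s_app is unset, app is set, and later labels are set too.
	fmt.Println(serviceFromLabels([]string{"", "myapp", "", "", "ctrl", "pod-0"}))
	// No label is set: the regex does not match.
	fmt.Println(serviceFromLabels([]string{"", "", "", "", "", ""}))
}
```

The whole "first label that is set" intent is encoded in the shape of a regex over a ;-joined string, which is exactly the readability problem described above.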

Grafana Agent Flow was originally designed to use expressions to allow for more flexibility in the configuration file. These expressions can transform values directly in Alloy syntax rather than through domain-specific languages like the rule block.

However, it is not currently possible to transform a set of discovered targets using pure Alloy syntax. I propose adding function closures and a new set of data transformation functions (map/reduce/filter) to enable a new class of expressiveness in Grafana Agent Flow configs.

Adding support for function closures and data transformation functions would give users more fine-grained control over the data used in Flow pipelines, reducing the need for one-off data transformation components that Grafana Agent maintainers would otherwise have to introduce to cover common use cases.

Example

discovery.kubernetes "pods" {
  role = "pod"
}

prometheus.scrape "default" {
  // Map over discovered targets and inject a new label. 
  targets = map(
    func(target) => target + { 
      extra_label = "test",
    },
    discovery.kubernetes.pods.targets,
  )

  forward_to = [...]
}
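A rough Go sketch of the semantics this example relies on, assuming targets are flat string-to-string label sets: map applies the closure to each target and returns a new list, leaving the discovered targets untouched. Map and Target are illustrative names, not Alloy APIs.

```go
package main

import "fmt"

// Target models a discovered target as a flat set of string labels.
type Target map[string]string

// Map applies fn to every element and returns a new slice; the input slice
// is left untouched, mirroring the proposed map(closure, list).
func Map[T, U any](fn func(T) U, items []T) []U {
	out := make([]U, 0, len(items))
	for _, item := range items {
		out = append(out, fn(item))
	}
	return out
}

func main() {
	discovered := []Target{
		{"__address__": "10.0.0.1:8080"},
		{"__address__": "10.0.0.2:8080"},
	}

	// Roughly: func(target) => target + { extra_label = "test" }
	withLabel := Map(func(t Target) Target {
		out := make(Target, len(t)+1)
		for k, v := range t {
			out[k] = v
		}
		out["extra_label"] = "test"
		return out
	}, discovered)

	fmt.Println(withLabel)
}
```

Because the closure builds a new target instead of mutating its argument, the discovery component's exported value stays unchanged for any other consumer.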

Combining with the coalesce function from the standard library, the relabel rule from above could be rewritten as:

map(
  func(target) => target + {
    // Try to identify a service name to eventually form the job label. We'll
    // prefer the first of the below labels, in descending order.
    __service__ = coalesce(
      target["__meta_kubernetes_pod_label_k8s_app"],
      target["__meta_kubernetes_pod_label_app"],
      target["__meta_kubernetes_pod_label_name"],
      target["__helm_name__"],
      target["__meta_kubernetes_pod_controller_name"],
      target["__meta_kubernetes_pod_name"],
    )
  },
  INPUT_TARGETS,
)
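The coalesce version reads as a plain "first label that is set" lookup. The Go sketch below approximates it for string labels only, assuming a missing label behaves like an empty value (a Go map returns "" for a missing key). As we understand it, the real standard-library coalesce also skips null values, empty lists, and empty objects; serviceName is a hypothetical helper.

```go
package main

import "fmt"

// Target models a discovered target as a flat set of string labels.
type Target map[string]string

// coalesce returns the first non-empty string. This is a string-only
// approximation of the standard-library function.
func coalesce(values ...string) string {
	for _, v := range values {
		if v != "" {
			return v
		}
	}
	return ""
}

// serviceName (hypothetical) mirrors the __service__ expression above. A
// missing key in a Go map yields "", so unset labels are simply skipped.
func serviceName(t Target) string {
	return coalesce(
		t["__meta_kubernetes_pod_label_k8s_app"],
		t["__meta_kubernetes_pod_label_app"],
		t["__meta_kubernetes_pod_label_name"],
		t["__helm_name__"],
		t["__meta_kubernetes_pod_controller_name"],
		t["__meta_kubernetes_pod_name"],
	)
}

func main() {
	t := Target{
		"__meta_kubernetes_pod_label_app": "web",
		"__meta_kubernetes_pod_name":      "web-0",
	}
	fmt.Println(serviceName(t)) // prints: web
}
```

Compared with the relabel rule, the priority order lives in the argument list rather than in a regex over a ;-joined string.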

Proposal

Function closures

I propose that function closures be added to the Alloy syntax. Function closures create a function value, which captures the scope of the block surrounding it (typically a component).

I suggest the following syntax:

func(ARGUMENT_LIST) => EXPRESSION 

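This syntax was chosen to make it as easy as possible to parse. While (ARGUMENT_LIST) => EXPRESSION would be less effort to write, it would make the parser more difficult to maintain: a leading ( could also start an ordinary parenthesized expression, so the parser would need to look ahead for the => to tell the two apart.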

In my prototype, I originally tried func(ARGUMENT_LIST) EXPRESSION for the function closure syntax. I found that this syntax was a little confusing, as it wasn't clear from scanning the file how func(ARGUMENT_LIST) and EXPRESSION related to each other. Adding the => arrow makes it slightly clearer to new readers what is going on.

Function closures can be called within Alloy syntax using the normal function call syntax. An example of defining and calling the identity function is as follows:

(func(id) => id)(15) // returns 15 
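Go's function literals behave much like the proposed closures, so they are a convenient way to reason about the semantics. The sketch below shows an immediately invoked identity function and a closure capturing a value from its enclosing scope; makeTagger and env are purely illustrative stand-ins for a component's surrounding scope.

```go
package main

import "fmt"

// identity mirrors (func(id) => id).
var identity = func(id int) int { return id }

// makeTagger returns a closure that captures env from the enclosing scope,
// much as a closure written inside a component block would see that
// block's scope.
func makeTagger(env string) func(string) string {
	return func(name string) string {
		return name + "-" + env
	}
}

func main() {
	// A function literal can be invoked immediately, like the example above.
	fmt.Println(func(id int) int { return id }(15)) // prints: 15
	fmt.Println(identity(15))                        // prints: 15

	tag := makeTagger("prod")
	fmt.Println(tag("api")) // prints: api-prod
}
```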

For the scope of this proposal, it will not be possible to pass a function closure as a component argument, though this can be added in the future. The future ability to pass a function closure as an argument will be important for allowing users to mutate a data stream (e.g., mutating labels on individual metrics to replace prometheus.relabel).

Data transformation functions

I propose a basic set of data transformation functions that accept closures as input: map, filter, and reduce.

More data transformation functions that accept closures may be added in the future.
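The exact signatures are not spelled out here, so the following Go sketch assumes the same argument order as the map example (closure first, list second) and gives reduce a trailing initial value. It shows filter expressing an A-and-B condition and reduce folding targets into per-namespace counts; Filter, Reduce, and the label names are hypothetical.

```go
package main

import "fmt"

// Target models a discovered target as a flat set of string labels.
type Target map[string]string

// Filter returns the elements for which keep returns true.
func Filter[T any](keep func(T) bool, items []T) []T {
	var out []T
	for _, item := range items {
		if keep(item) {
			out = append(out, item)
		}
	}
	return out
}

// Reduce folds items into one value, starting from init.
func Reduce[T, A any](fn func(acc A, item T) A, items []T, init A) A {
	acc := init
	for _, item := range items {
		acc = fn(acc, item)
	}
	return acc
}

func main() {
	targets := []Target{
		{"namespace": "default", "app": "web"},
		{"namespace": "kube-system", "app": "dns"},
		{"namespace": "default"},
	}

	// Both conditions must hold: namespace is "default" AND app is set.
	kept := Filter(func(t Target) bool {
		return t["namespace"] == "default" && t["app"] != ""
	}, targets)
	fmt.Println(len(kept)) // prints: 1

	// Count targets per namespace.
	perNamespace := Reduce(func(acc map[string]int, t Target) map[string]int {
		acc[t["namespace"]]++
		return acc
	}, targets, map[string]int{})
	fmt.Println(perNamespace) // prints: map[default:2 kube-system:1]
}
```

The A-and-B filter is a single boolean expression here, whereas with relabel rules it typically needs a keep/drop rule per condition or a combined regex over joined labels.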

Merging objects

For map to be useful for mutating targets, we need a way to merge two objects together. I see two ways to do this: defining the + operator for objects, or adding an object_merge function.

In both cases, the result of merging two objects would be a new object which combines the keys and values of both objects; a key defined on the second object takes precedence over the first.

Merging objects is particularly useful for taking a target and adding a new label:

target + {
  existing_label = "new_value",
  new_label      = "new_value", 
}
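The precedence rule is easy to pin down with a small Go sketch, assuming objects are flat string maps: Merge copies the left-hand object, then overwrites it with the right-hand one, so existing_label ends up with the new value and neither input is modified. Merge and Object are illustrative names only, and this is a shallow merge.

```go
package main

import "fmt"

// Object models an Alloy object with string values.
type Object map[string]string

// Merge returns a new object containing the keys of both a and b. When a
// key exists in both, the value from b (the right-hand side) wins.
// Neither input is modified.
func Merge(a, b Object) Object {
	out := make(Object, len(a)+len(b))
	for k, v := range a {
		out[k] = v
	}
	for k, v := range b {
		out[k] = v
	}
	return out
}

func main() {
	target := Object{"existing_label": "old_value", "job": "api"}
	merged := Merge(target, Object{
		"existing_label": "new_value",
		"new_label":      "new_value",
	})
	fmt.Println(merged) // prints: map[existing_label:new_value job:api new_label:new_value]
	fmt.Println(target) // prints: map[existing_label:old_value job:api]
}
```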

I suggest we define the + operator for objects as merging objects is likely going to be a common operation.

Prototype

A working prototype of this proposal is available at rfratto/agent on the river-map-reduce-filter branch.

Concerns

Function closures and data transformation functions add a new layer of complexity to Alloy and bring it further away from its HCL-inspired roots. This raises the learning curve for users who want to extract the language's full potential.

I believe this cost can be offset by introducing a way for users to use configs as components (called "modules"), where modules can be shared with other users via the internet (like GitHub). Modules would allow the complexity of advanced expressions to be abstracted away from beginners. Modules are a concept we've been discussing internally for a while, and a design doc should be available in the coming months.

Alternatives considered

As an alternative to creating function closures, components could embed an Alloy syntax interpreter and evaluate expressions. For example:

prometheus.relabel "default" {
  rule { 
     expression = object + { new_label = "new_value" } 
     action     = "map"
  }
}

However, this would be much less powerful, since it would limit using expressions to mutate values only to components which supported the concept. It would also be more work for components to manually evaluate expressions. For these reasons, I discarded this idea in favor of closures.

mattdurham commented 1 year ago

Do we want to avoid using the word map, since it's already used as a datatype?

rfratto commented 1 year ago

Do we want to avoid using the word map, since it's already used as a datatype?

We should definitely avoid using the word map twice, but I'm not sure if we should change the name of the function, since it might be used a lot and I don't want to make it too annoying to type. We might be able to find a new name for the datatype, though.

We can also namespace the function, if we really wanted: utils.map(...) (but this goes back to being annoying to type)

mattdurham commented 1 year ago

Agreed on preferring to rename the map datatype. Maybe dictionary would be a better name?

Do we have a strong need for this at the moment, or is it more hand in hand with something like dynamic components?

rfratto commented 1 year ago

Do we have a strong need for this at the moment?

No, I'd say this is just a quality-of-life thing, since anything it's capable of could be supported by custom components. It can probably wait until we're done with feature parity, or until we have time to give to it.

rfratto commented 1 year ago

I think the biggest benefit this provides is reducing the learning curve needed to transform labels, since I find relabel rules to be particularly challenging. But again, it serves mainly as a QOL change since it's not really introducing any net-new functionality that we don't have now.

polyrain commented 1 year ago

Speaking as a third party, the addition of filter, map, and reduce would be a massive boon for bridging the gap from 'I have collected all these metrics series, and now I am interested in doing something useful with them based on criteria more complicated than the examples in the Grafana Guide'.

This quickly arises when one starts to use the Grafana Agent in conjunction with Grafana Cloud, where the number of unique time series is extremely relevant from a cost perspective. It is true that if one is familiar with Prometheus relabelling rules (or is willing to invest some time to come to grips with them), this hurdle is unlikely to be an issue. But for an organisation or small team that adds the agent to collect telemetry from their Kubernetes cluster or similar, the need to quickly roll out more advanced filtering of the metric series collected by the Agent, before they're written, is paramount.

Leaning on a more established syntax (all three of these functions are used extensively in JavaScript, and of course in functional languages) allows much faster onboarding, iteration, and collaboration, because it becomes far more transparent which relabelling the Agent is performing and why (unless someone is just really good at reading regex!).

While there isn't a strong need for this sugar to be added to the syntax right now, I think that the proposal is exactly in-line with what Flow is all about; reducing the learning curve, and making it easier than ever for people to get involved and be productive with their monitoring from a wider array of backgrounds than those familiar with prom or borgmon :-)

mattdurham commented 1 year ago

Fantastic write-up, @polyrain, much appreciated.

IMO the above is the best approach, but is there a way to do it with a component?

discovery.kubernetes "pods" {
  role = "pod"
}

function.map "transform_label" {
  input = discovery.kubernetes.pods.targets
  expression = (
    // Try to identify a service name to eventually form the job label. We'll
    // prefer the first of the below labels, in descending order.
    if target["__service__"] != "" then
        // noop
    else if target["__meta_kubernetes_pod_label_k8s_app"] != "" then
        target["__service__"] = target["__meta_kubernetes_pod_label_k8s_app"]
    else if target["__meta_kubernetes_pod_label_app"] != "" then
        target["__service__"] = target["__meta_kubernetes_pod_label_app"]
    else if target["__meta_kubernetes_pod_label_name"] != "" then
        target["__service__"] = target["__meta_kubernetes_pod_label_name"]
    else if target["__helm_name__"] != "" then
        target["__service__"] = target["__helm_name__"]
    else if target["__meta_kubernetes_pod_controller_name"] != "" then
        target["__service__"] = target["__meta_kubernetes_pod_controller_name"]
    else if target["__meta_kubernetes_pod_name"] != "" then
        target["__service__"] = target["__meta_kubernetes_pod_name"]
    end
    return target
  )
}

prometheus.scrape "default" {
  // Map over discovered targets and inject a new label. 
  targets    = function.map.transform_label.value
  forward_to = [...]
}

IMO one advantage is that it would allow easier contributions of function-like components, since components are likely easier to write than adding support at the language level. Possibly easier testing, too?

rfratto commented 1 year ago

See the alternatives considered section, where I discuss exactly that and give an argument for why language-level functionality is preferable :)

As an alternative to creating function closures, components could embed a River interpreter and evaluate expressions. For example:

prometheus.relabel "default" {
 rule { 
    expression = object + { new_label = "new_value" } 
    action     = "map"
 }
}

However, this would be much less powerful, since it would limit using expressions to mutate values only to components which supported the concept. It would also be more work for components to manually evaluate expressions. For these reasons, I discarded this idea in favor of closures.

tpaschalis commented 1 year ago

IMHO, on the language level, this will be a major step to making River generally useful, and is in-line with Flow's philosophy of giving tools to power users so they can use components in novel ways, so that's an early +1 for me.

This raises the learning curve of the language to be able to extract its full potential.

As long as we don't have components that require users to learn about and use the new feature, I don't see an issue.

Finally, as is tradition with replies to proposals, here's a bit of nitpicking around syntax even though it's too early and I haven't even checked out the prototype branch 😅

Would we think about 'scoping' these functions? Since curly braces already have another meaning within River, we could use the end keyword as Elixir does, which is quite readable.

mattdurham commented 1 year ago

I think it's slightly different from your alternative example. Normal components would not need to support the concept; only one component (and possibly other function components) would. In the above case, the function.map component knows how to evaluate the closure, and the scraper requires no changes, since the value is an array of targets, as it currently expects.

rfratto commented 1 year ago

We reviewed this in the community call. The consensus is trending towards yes, but we need more time to investigate the impact and whether this has any hidden costs compared to writing new components (or extending existing components) to cover use cases that map/reduce/filter would've covered.

thampiotr commented 1 year ago

My main concern is the following: while the functionality proposed here and in the prototype does look manageable, in the long term this can become very complex and expensive to maintain.

In order to fully realise the idea of River expressions capable of processing arbitrary data and not just simple conditionals adding labels to targets, we may end up building a lot of features that will result in something similar to other existing languages, but it will be slightly different and with its own quirks - resulting in steep learning curve and high maintenance cost. And getting all the features may be a bumpy road where we may need to go through some deprecation/breaking changes cycles when we don't get things right.

Here are some questions to illustrate what I mean (I don't expect elaborate answers, these are provided to illustrate what we may end up adding in the long run):

  1. There's a proposal to add support for merging objects with + or object_merge. How can I delete fields from an object?
  2. Similarly, is there support for merging a nested property inside a closure? E.g., what if I have {foo: {bar: baz}} and I want to make it {foo: {bar: baz, boo: yah}}? Since we call it merging, it sounds like we should support deeper levels too.
  3. Can I work with arrays inside objects? e.g. if I want to append to an existing array inside {foo: [bar, baz]}
  4. Can I have a regular expression with matching groups that I can use to extract data from a field?
  5. Can I refer to stuff from outside the closure? e.g. local.file.content? Do we handle circular dependencies if I create one?
  6. Can I call another map/filter/reduce from inside the closure, with another nested closure? E.g., I process an array of arrays.
  7. Can I work with other types like numbers? For example, say I want to change the unit from seconds to milliseconds: can I convert {duration: 1} into {duration: 1000} using closures?
  8. If I have the same closure used in many places, maybe I would like to give it a name and refer to it by name... local my_function = {...};.
  9. Can I use closures to, e.g., map ['hostname1', 'hostname2', ...] into multiple component declarations, e.g. MySQL exporters?
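Question 2 comes down to whether merging is shallow or deep. The Go sketch below contrasts the two on the {foo: {bar: baz}} example, assuming nested objects are represented as map[string]any; both helpers are hypothetical and not part of the prototype.

```go
package main

import "fmt"

// shallowMerge copies a, then overwrites with b: a nested object on the
// right replaces the one on the left wholesale.
func shallowMerge(a, b map[string]any) map[string]any {
	out := make(map[string]any, len(a)+len(b))
	for k, v := range a {
		out[k] = v
	}
	for k, v := range b {
		out[k] = v
	}
	return out
}

// deepMerge recurses when both sides hold an object under the same key;
// otherwise the right-hand value wins. Nested objects that are not merged
// are shared with the inputs rather than copied.
func deepMerge(a, b map[string]any) map[string]any {
	out := shallowMerge(a, nil)
	for k, bv := range b {
		am, aIsObj := out[k].(map[string]any)
		bm, bIsObj := bv.(map[string]any)
		if aIsObj && bIsObj {
			out[k] = deepMerge(am, bm)
		} else {
			out[k] = bv
		}
	}
	return out
}

func main() {
	base := map[string]any{"foo": map[string]any{"bar": "baz"}}
	patch := map[string]any{"foo": map[string]any{"boo": "yah"}}

	fmt.Println(shallowMerge(base, patch)) // prints: map[foo:map[boo:yah]]
	fmt.Println(deepMerge(base, patch))    // prints: map[foo:map[bar:baz boo:yah]]
}
```

A + operator defined as a shallow merge would give the first result, so supporting the second would need either a different operator semantics or a separate function.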

It's a list that could grow further. I think before DIY, we should evaluate some off-the-shelf options. For example, Redis didn't create their own scripting language, but used Lua instead. I can see some inspiration in the proposal from Jsonnet, so perhaps go-jsonnet could be considered too.

rfratto commented 1 year ago

@thampiotr What part of the complexity are you worried about in particular? Are you worried about the complexity for the user that all of those use cases will be possible? Or are you worried about how complicated it will be to write functions which cover all of those use cases?

resulting in steep learning curve

I'm not sure I understand this concern in the context of the overall message; if these use cases are valid, but you're interested in exploring off-the-shelf options for supporting them, then the learning curve is still steep for the user regardless of how we expect them to interact with it.

Are we talking about a different learning curve?

It's a list that could grow further. I think before DIY, we should evaluate some off-the-shelf options. For example, Redis didn't create their own scripting language, but used Lua instead. I can see some inspiration in the proposal from Jsonnet, so perhaps go-jsonnet could be considered too.

I think we would need very strong justification for why we would expect the user to mix two languages for doing specific things (e.g., River for structuring components, but Lua for transforming data). "We didn't want to maintain extra code, so we made you use two separate things" doesn't seem like good justification to me.

I would prefer providing a consistent user experience for Flow. Should we be discussing whether it's desirable to move off of River altogether, or a way to support all of the use cases you mentioned above without extending River?

thampiotr commented 1 year ago

What part of the complexity are you worried about in particular?

The complexity that I think is worth avoiding is having to custom-build, document and maintain a rather large set of features if there is (potentially) something else that we could use. And by a "large set of features" I refer to what it may become over time as we extend it - not the initial few functions proposed here.

It would be a non-trivial amount of effort to build this, so IMO it's good to understand the other options and why we make the decision we make. Documenting our decision against other options will also help answer any future questions we may get from users and contributors.

I think we would need very strong justification for why we would expect the user to mix two languages for doing specific things

I agree that having two languages in one file is not great, except for some very short scripts. Maybe we could have, e.g., a process.lua component for transforming generic data, where you could put your script in a separate file. You would get common IDE support, docs, and the other goodness that comes with using something off-the-shelf, and your script could be tested.

"We didn't want to maintain extra code, so we made you use two separate things"

Just wanted to note that lower maintenance burden means more time to build useful features for our users, so it's not only about benefiting the maintainers, but also about providing more value to the users.

I would prefer providing a consistent user experience for Flow.

I do enjoy using River and I think it's not hard to learn as a configuration syntax because it's quite simple right now. So on one hand I'd like to keep it this way and allow for more advanced scripting somewhere else, but on the other hand I see the argument that it would "flow" nicer if I can write scripts in River.

Let's say that we have both: River closures and process.lua (or any embedded scripting language). If I had something very simple, I would be happy to write it in a River closure, but if I have some more complex processing rules, I would enjoy the ability to put them in a file, edit in an IDE and unit-test them.

thampiotr commented 1 year ago

Maybe we can keep the native River scripting feature set somewhat lean and when/if we start receiving feature requests that would make it too complex, we can instead provide a process.lua or other embedded scripting language component.

Also worth noting that a scripting component would only be able to process simple data passed by value, while native River can work with other River concepts.