vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Add static type checking to remap transform output #7328

Open samdphillips opened 3 years ago

samdphillips commented 3 years ago

Hi!

Current Vector Version

vector 0.13.1 (v0.13.1 x86_64-unknown-linux-gnu 2021-04-29)

Use-cases

A user can write a transformation that type checks internally, yet the root object it outputs may still not have the expected shape.

Attempted Solutions

A user could write dynamic checks into the script. This has two downsides:

  1. Checks must be performed for every event/metric processed.
  2. Depending on the level of detail, the checks could be more complex than the actual transformation.
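For comparison, the dynamic-check workaround might look like the sketch below. It reuses the transform and field names from the proposal further down purely for illustration and relies on VRL's `assert!` and type-predicate functions (`is_integer`, `is_string`, `is_timestamp`, `is_float`) as found in current VRL releases; these may not all be available in 0.13.1. Every assertion is evaluated for every event, and a failed assertion raises a runtime error for that event instead of being caught when the configuration is loaded.

```toml
[transforms.well_formed]
  type = "remap"
  inputs = ["source_data"]
  source = '''
  # do the transform ...

  # Dynamic shape checks: these run at runtime, once per event.
  assert!(is_integer(.customer_id), "customer_id must be an integer")
  assert!(is_string(.action.method), "action.method must be a string")
  assert!(is_string(.action.target), "action.target must be a string")
  assert!(is_timestamp(.start_time), "start_time must be a timestamp")
  assert!(is_timestamp(.end_time), "end_time must be a timestamp")
  assert!(is_float(.duration), "duration must be a float")
  '''
```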

Proposal

It would be nice to have an optional mechanism to specify the shape/schema/type of the root object after a remap transform has run, and for that specification to be statically checked. A mechanism like this would let the user express what they expect the output of a transform to look like, and let Vector/VRL prove statically that the script produces that shape.

Possible syntax, which I just made up, so I'm sure you can come up with something better:

```toml
[transforms.well_formed]
  type = "remap"
  inputs = ["source_data"]
  source = '''
  # do the transform ...
  '''
  output_type = '''
  {
    "customer_id": int,
    "action": {
      "method": string,
      "target": string
    },
    "start_time": timestamp,
    "end_time": timestamp,
    "duration": float
  }
  '''
```

binarylogic commented 3 years ago

Thanks @samdphillips, excellent suggestion. A couple of comments:

  1. I'm curious whether we could solve this with our unit tests. We are planning to improve that area of Vector, and it is designed to assert the output of components.
  2. This also ties into our plans to implement broader schema support: knowing the shape of data as soon as it enters Vector and using that information to type check VRL programs.
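As a rough sketch of the unit-test route in point 1, assuming Vector's unit-test configuration format (`[[tests]]` with `inputs`, `outputs`, and `type = "vrl"` conditions), a test asserting the output shape of the `well_formed` transform could look like the following. The input message is a hypothetical sample. Unlike the proposed static check, such a test only verifies the shape for the events it is given, not for every possible input.

```toml
[[tests]]
  name = "well_formed emits the expected shape"

  # Hypothetical sample event fed directly into the transform under test.
  [[tests.inputs]]
    insert_at = "well_formed"
    type = "log"
    [tests.inputs.log_fields]
      message = "hypothetical sample input"

  [[tests.outputs]]
    extract_from = "well_formed"

    # Shape assertions evaluated against the transform's output event.
    [[tests.outputs.conditions]]
      type = "vrl"
      source = '''
      assert!(is_integer(.customer_id), "customer_id must be an integer")
      assert!(is_string(.action.method), "action.method must be a string")
      assert!(is_string(.action.target), "action.target must be a string")
      assert!(is_timestamp(.start_time), "start_time must be a timestamp")
      assert!(is_timestamp(.end_time), "end_time must be a timestamp")
      assert!(is_float(.duration), "duration must be a float")
      '''
```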