vectordotdev / vector

Vector unit tests do not take log namespacing (and other global settings?) into account #21342

Open byronwolfman opened 4 days ago

byronwolfman commented 4 days ago

Problem

I am quoting this excellent comment from @pront which describes how log namespacing affects the type of . for transform components, and the implications on the object() function:

  • when you disable log namespacing, the . type is always an object and compiler knows that

    • so the first VRL statement becomes infallible: original = object(.)
  • when you enable log namespacing the . type can be anything

    • so the first VRL statement becomes fallible and needs error handling: original = object!(.)
  • finally the Vector Unit test doesn't seem to change based on the log_namespace setting and it defaults to first behavior (infallible)

With the example configuration below, vector test passes all unit tests, but vector validate fails on the object() call, because validate knows that log namespacing is enabled and that . can therefore be of any type.

Alternatively, you can make the following change to the config:

-      original = object(.)
+      original = object!(.)

Now vector validate succeeds, because the fallible form is required when log namespacing is enabled, but vector test fails: it is unaware that log namespacing is enabled and assumes that . is already an object.

Configuration

data_dir: /tmp
schema:
  log_namespace: true

sources:
  demo:
    type: demo_logs
    format: json

transforms:
  some_transform:
    type: remap
    inputs:
      - demo
    source: |
      original = object(.)

      # This doesn't really do anything
      # But it suppresses warnings for unused vars
      if original != null {
        . = original
      }

sinks:
  console:
    type: console
    inputs:
      - some_transform
    encoding:
      codec: json

tests:
  - name: test some_transform
    inputs:
      - insert_at: some_transform
        type: vrl
        source: |
          .message = "some message"
    outputs:
      - extract_from: some_transform
        conditions:
          - .message == "some message"

Version

vector 0.41.1 (x86_64-apple-darwin 745babd 2024-09-11 14:55:36.802851761)

Additional Context

With object() as per original config:

Partial vector validate:

√ Loaded ["vector.yaml"]

Component errors
----------------
x Transform "some_transform":
error[E103]: unhandled fallible assignment
  ┌─ :1:12
  │
1 │ original = object(.)
  │ ---------- ^^^^^^^^^ this expression is fallible because at least one argument's type cannot be verified to be valid
  │ │
  │ or change this to an infallible assignment:
  │ original, err = object(.)
  │

Partial vector test:

Running tests
test test some_transform ... passed

With object!():

Partial vector validate:

√ Loaded ["vector.yaml"]
√ Component configuration
√ Health check "console"
-------------------------
                Validated

Partial vector test:

Running tests
2024-09-23T17:43:08.981142Z ERROR vector::unit_test: Failed to execute tests:
Failed to build test 'test some_transform':
  Transform "some_transform":
  error[E620]: can't abort infallible function
    ┌─ :1:12
    │
  1 │ original = object!(.)
    │            ^^^^^^- remove this abort-instruction
    │            │
    │            this function can't fail
    │
    = see documentation about error handling at https://errors.vrl.dev/#handling
    = see language documentation at https://vrl.dev
    = try your code in the VRL REPL, learn more at https://vrl.dev/examples

s-at-ik commented 1 day ago

I stumbled upon this exact behaviour yesterday and was really puzzled by it. Glad to have got an explanation.

Note that you can work around this bug by assigning a key to the . variable first. In the given example, it would look like this:

transforms:
  some_transform:
    type: remap
    inputs:
      - demo
    source: |
      .foo = "bar"
      original = object(.)

This will satisfy both validate and test, but feels more like a hack than a real solution.
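
For completeness, here is a sketch of that workaround applied to the full transform from the original report, together with its unit test. The extra .foo == "bar" condition is my own addition and is not part of the original test. It is only there to make the side effect visible: every event passing through the transform now carries the placeholder foo key. The explanation in the comment is an assumption based on the discussion above, not confirmed behaviour.

```yaml
transforms:
  some_transform:
    type: remap
    inputs:
      - demo
    source: |
      # Workaround: write a key to `.` first.
      # Assumption: this lets the compiler treat `.` as an object in
      # both `vector validate` and `vector test`, so object(.) is
      # infallible either way. `foo` is a placeholder key.
      .foo = "bar"
      original = object(.)

      # Same no-op as in the original report, kept to avoid
      # unused-variable warnings.
      if original != null {
        . = original
      }

tests:
  - name: test some_transform
    inputs:
      - insert_at: some_transform
        type: vrl
        source: |
          .message = "some message"
    outputs:
      - extract_from: some_transform
        conditions:
          - .message == "some message"
          # Added for illustration: the workaround key ends up in the output.
          - .foo == "bar"
```

The remaining sections (data_dir, schema, sources, sinks) stay as in the original configuration. If the extra key is unwanted downstream, it would have to be removed again later in the pipeline, which is part of why this still feels like a hack.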