vectordotdev / vrl

Vector Remap Language
Mozilla Public License 2.0
134 stars 64 forks source link

fields extracted using parse_regex imply incorrect type definitions when parse_regex fails #116

Open jerome-kleinen-kbc-be opened 3 years ago

jerome-kleinen-kbc-be commented 3 years ago

Vector Version

vector 0.13.1 (v0.13.1 x86_64-unknown-linux-gnu 2021-04-29)

Vector Configuration File

[sources.generator]
  type = "generator"
  format = "syslog"

[transforms.remap]
  type = "remap"
  inputs = ["generator"]
  source = '''
  . |= parse_regex(.message, r'^<\d+>(?P<a>\S+') ?? {}
  .a = downcase(string(.a) ?? "")
  '''

[sinks.console]
  type = "console"
  inputs = ["remap"]
  encoding = "json"

Debug Output

error[E651]: unnecessary error coalescing operation
  ┌─ :2:17
  │
2 │   .a = downcase(string(.a) ?? "")
  │                 ^^^^^^^^^^ -- -- this expression never resolves
  │                 │          │
  │                 │          remove this error coalescing operation
  │                 this expression can't fail
  │
  = see language documentation at https://vrl.dev

error[E110]: invalid argument type
  ┌─ :2:17
  │
2 │   .a = downcase(string(.a) ?? "")
  │                 ^^^^^^^^^^^^^^^^
  │                 │
  │                 this expression resolves to the exact type "boolean"
  │                 but the parameter "value" expects the exact type "string"

Expected Behavior

Config should work as expected.

Actual Behavior

Config returns an error when using vector validate or running the config.

Example Data

Generated data.

Additional Context

Discussed this with @JeanMertz and he figured out that vector is making a wrong compile time assumption of the fields extracted using regex capture groups when parse_regex errors are suppressed using the " ?? {} " notation used above. A potential workaround is:

[sources.generator]
  type = "generator"
  format = "syslog"

[transforms.remap]
  type = "remap"
  inputs = ["generator"]
  source = '''
  . |= parse_regex(.message, r'^<\d+>(?P<a>\S+') ?? {"a": null}
  .a = downcase(string(.a) ?? "")
  '''

[sinks.console]
  type = "console"
  inputs = ["remap"]
  encoding = "json"

References

Discussed on Discord in the support channel on 11/05/2021 around 12:00 CET.

ansel1 commented 3 years ago

I think something similar happens with parse_json.

[sources.generator]
  type = "generator"
  format = "syslog"

[transforms.remap]
  type = "remap"
  inputs = ["generator"]
  source = '''
  structured, err = parse_json(.message)
  if err == null {
    . = merge(structured, .)
  }
  '''

[sinks.console]
  type = "console"
  inputs = ["remap"]
  encoding = "json"

results in

    ┌─ :10:9
    │
 10 │     . = merge(structured, .)
    │     --- ^^^^^^^^^^^^^^^^^^^^
    │     │   │
    │     │   this expression is fallible
    │     │   update the expression to be infallible
    │     or change this to an infallible assignment:
    │     ., err = merge(structured, .)
    │
    = see documentation about error handling at https://errors.vrl.dev/#handling
    = learn more about error code 103 at https://errors.vrl.dev/103
    = see language documentation at https://vrl.dev

workaround is

. = merge(structured, .) ?? .
spencergilbert commented 3 years ago

I think something similar happens with parse_json.

[sources.generator]
  type = "generator"
  format = "syslog"

[transforms.remap]
  type = "remap"
  inputs = ["generator"]
  source = '''
  structured, err = parse_json(.message)
  if err == null {
    . = merge(structured, .)
  }
  '''

[sinks.console]
  type = "console"
  inputs = ["remap"]
  encoding = "json"

results in

    ┌─ :10:9
    │
 10 │     . = merge(structured, .)
    │     --- ^^^^^^^^^^^^^^^^^^^^
    │     │   │
    │     │   this expression is fallible
    │     │   update the expression to be infallible
    │     or change this to an infallible assignment:
    │     ., err = merge(structured, .)
    │
    = see documentation about error handling at https://errors.vrl.dev/#handling
    = learn more about error code 103 at https://errors.vrl.dev/103
    = see language documentation at https://vrl.dev

workaround is

. = merge(structured, .) ?? .

Actually this isn't quite accurate - parse_json doesn't actually guarantee an object returned when the parsing succeeds. For example:

$ parse_json!("true")
true

$ parse_json!("1")
1

I'd suggest the following for your workaround:

. = merge(object!(structured), .)
jszwedko commented 3 years ago

@StephenWakely @JeanMertz I'm curious if you have thoughts on this one. Would it be an easy thing to fix?

StephenWakely commented 3 years ago

The typedef here:

https://github.com/vectordotdev/vector/blob/master/lib/vrl/compiler/src/expression/op.rs#L127

  Err if rhs_def.is_infallible() => lhs_def.merge(rhs_def).infallible(),

needs to do more than merge. It needs to take all the fields returned in lhs_def and append | null to the definition of those fields and then merge in rhs_def.

I don't anticipate it to be too hard, but it could be a little fiddly.

jszwedko commented 3 years ago

Ah, yeah, I see, thanks @StephenWakely ! I added it to the 0.19.0 milestone. I'll try to keep an eye on it, but if you get some downtime and feel like picking this up that'd be appreciated 😄