vectordotdev / vrl

Vector Remap Language
Mozilla Public License 2.0
115 stars 52 forks source link

Support for type widening #911

Open DylanRJohnston opened 1 week ago

DylanRJohnston commented 1 week ago

Hey all, I'm currently working on a transpiler from another data processing language to VRL. Unfortunately due to a NDA I can't go into more specifics but a common problem I've encountered is VRL's strict error handling based on type information that is not possible to know at transpilation time.

For example taking a common operator from most other languages

static_query += expr

the equivalent VRL is

static_query = static_query + expr

however this often fails to compile due to strict error handling around types, unless both static_query and expr are known to be integers, this expression can fail, and requires error handling

static_query, err = static_query + expr

However, you can't just use this in the transpiler as often there are times when the types are well known enough for this expression to not fail, which then causes the compiler to fail as VRL does not allow superfluous error handling. Using type assertions does not work for the same reason.

A workaround I've taken to using is using dead code paths to trick the compiler into widening the type and therefore allowing the superfluous type assertion e.g.

if false {
  static_query = null
}
static_query = int!(static_query) + expr

However expr can also sometimes have a known type, and sometimes not, if expr is another static query into the event, it doesn't have a known type, if expr is a literal, or a reference to a variable with a known type it does. Which requires in the most general case something like.

static_query = int!({
    tmp = static_query
    if false {
      tmp = null
    }
    tmp
  }) + int!({
    tmp = expr
    if false {
      tmp = null
    }
    tmp
  })

As you can see this becomes very noisy very quickly. I was wondering how you'd feel about the introduction of a type assertion that widens types, e.g. any() similar to int() or string(), it would be infailable as it's always safe to map any type to any and would help greatly in these cases where the transpiler is needing to output code without enough type information. e.g.

static_query = int!(any(static_query)) + int!(any(expr))

What do you think?

DylanRJohnston commented 1 week ago

An alternative proposal would be to allow type assertions to always be failable and then have the compiler turn them into a noop if the the type is known at compile time, allowing the expression to become

static_query = int!(static_query) + int!(expr)
jszwedko commented 2 days ago

Thanks for this issue @DylanRJohnston ! I was going suggest what you have in https://github.com/vectordotdev/vrl/issues/911#issuecomment-2190435880 as an option too. That is: we have the type assertion functions always be fallible.

I'm curious what you think @pront

pront commented 2 days ago

+1 on the approach described in https://github.com/vectordotdev/vrl/issues/911#issuecomment-2190435880

DylanRJohnston commented 2 days ago

I believe that making them always failable is a breaking change, as any code with infailable assertions would now produce an error. So they'd need to be some kind of weird middle ground. You can simulate this right now using the dead code path trick. I guess you could argue that the infallible assertions shouldn't be there in the first place?

Playground Link

x = 3
. = int(x)

Playground Link

x = 3
if false { x = null }
. = int(3)
jszwedko commented 1 day ago

Yeah, my thought is that it'd be weird to have infallible type assertions in the first-place 🤔

I wonder if we could make this an opt-in feature to validate how much of a breaking change it would be for users. The flow would be something like: