influxdata / flux

Flux is a lightweight scripting language for querying databases (like InfluxDB) and working with data. It's part of InfluxDB 1.7 and 2.0, but can be run independently of those.
https://influxdata.com
MIT License

Lambda function instantiation #3347

Closed jpacik closed 11 months ago

jpacik commented 3 years ago

Compile takes a semantic function expression and returns something that can be evaluated for a specific table at runtime https://github.com/influxdata/flux/blob/aa796e58df5c83cfbcd774e607b50f9f59b1cf88/compiler/compiler.go#L10

This is how map and filter evaluate arbitrary lambdas over a table stream. With https://github.com/influxdata/flux/issues/3337 it is possible for lambda functions to have polymorphic literals, the types of which we will not know until runtime. This means that we will need to re-implement the monomorphization procedure on the Go side for lambda expressions before calling Compile.

Background

After monomorphization on the Rust side, there still might be polymorphic literal types left in the semantic graph. So we need to define the same polymorphic literal node on the Go side.

We'll need to handle this new node type in both the interpreter and compiler packages. Note however that because of monomorphization on the Rust side, the interpreter should never encounter a polymorphic literal node. They should only survive in the body of lambda functions passed to map, filter, etc.

Though these types might be present in the function expressions of map and filter, we don't want to explicitly evaluate them. This is because evaluating polymorphic values at runtime is not equivalent to monomorphizing polymorphic types and evaluating the resulting expression. We want to preserve the same semantics between the interpreter and compiler packages.

Therefore we need to perform the same monomorphization process that we do on the Rust side, before we evaluate these lambdas over a table. This will involve substituting the types in the lambda expression with the concrete types of the incoming table and replacing any polymorphic literals with their concrete equivalents.
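As a rough sketch of the literal-replacement half of that process, the Go code below walks a (hypothetical, heavily simplified) expression node and rewrites a polymorphic literal into a concrete one once the table's column type is known. The node and function names are illustrative only and are not the real flux semantic-graph types:

```go
package main

import "fmt"

// Expr is a stand-in for a semantic-graph expression node.
type Expr interface{ exprNode() }

type IntLit struct{ Val int64 }
type FloatLit struct{ Val float64 }

// PolyLiteral stands in for the new node type discussed above: a
// literal whose concrete type is not known until runtime.
type PolyLiteral struct{ Raw string }

func (IntLit) exprNode()      {}
func (FloatLit) exprNode()    {}
func (PolyLiteral) exprNode() {}

// monomorphizeLiteral replaces a polymorphic literal with a concrete
// literal once the concrete type from the incoming table is known.
// Non-polymorphic expressions are returned unchanged.
func monomorphizeLiteral(e Expr, concrete string) Expr {
	p, ok := e.(PolyLiteral)
	if !ok {
		return e // already concrete
	}
	switch concrete {
	case "int":
		var v int64
		fmt.Sscanf(p.Raw, "%d", &v)
		return IntLit{Val: v}
	case "float":
		var v float64
		fmt.Sscanf(p.Raw, "%g", &v)
		return FloatLit{Val: v}
	}
	return e
}
```

In the real implementation this rewrite would run over the whole lambda body alongside the type substitution, not on a single node.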

How do we know if we need to monomorphize?

We have to look at the type signature of the function expression to see if it's a candidate for monomorphization. If we see a type variable that is constrained as NumericDefaultInt, then we know we have to instantiate specific versions. Unfortunately, we don't know the constraints associated with the type variables of a function expression (or of any expression, for that matter). Constraints are only associated with polytypes and variable assignments. That can't remain the case if we want to be able to perform monomorphization at runtime.
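If expressions did carry their constraints, the candidate check itself would be trivial. The sketch below assumes a hypothetical Signature view that attaches kind constraints to an expression's type variables; nothing like it exists today, which is precisely the gap described above:

```go
package main

// Kind names a constraint on a type variable. NumericDefaultInt is
// the constraint discussed above; the representation is illustrative.
type Kind string

const NumericDefaultInt Kind = "NumericDefaultInt"

// Signature is a hypothetical view of a function expression's type:
// its type variables and the kind constraints attached to each.
type Signature struct {
	Constraints map[int][]Kind // type-variable number -> kinds
}

// needsMonomorphization reports whether any type variable in the
// signature is constrained as NumericDefaultInt, i.e. whether the
// lambda is a candidate for runtime instantiation.
func needsMonomorphization(sig Signature) bool {
	for _, kinds := range sig.Constraints {
		for _, k := range kinds {
			if k == NumericDefaultInt {
				return true
			}
		}
	}
	return false
}
```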

Can't we just replace any unresolved polymorphic literals?

Unfortunately that's not sufficient. For example, the following lambda can be monomorphized without needing to know any type constraints, because we know the literal 1 has to be interpreted as an integer if r._value is an integer:

|> map(fn: (r) => ({r with _value: r._value + 1}))

However, the following lambda needs to know whether x is NumericDefaultInt in order to determine whether x should be replaced with x_int or x_flt:

x = 1
...
|> map(fn: (r) => ({r with _value: r._value + x}))

This part will require a separate design issue.

Approach

If we have a candidate lambda function, then given an incoming table we need to instantiate a new version of the lambda for the table's specific column types. We will then need to cache that particular instantiation so that it can be reused for any other incoming tables with the same types.
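The caching step above could look roughly like the following, assuming the table's concrete type can be encoded as a comparable key (a string here, as a simplification of a real monotype key) and that compilation is represented by an opaque callback. All names are illustrative, not the actual compiler package API:

```go
package main

import "sync"

// compiledFn stands in for the result of compiler.Compile; the
// concrete type is illustrative only.
type compiledFn func(row map[string]interface{}) interface{}

// instantiationCache caches one compiled instantiation per concrete
// input type, keyed by an encoding of the table's column types.
type instantiationCache struct {
	mu    sync.Mutex
	cache map[string]compiledFn
}

func newCache() *instantiationCache {
	return &instantiationCache{cache: make(map[string]compiledFn)}
}

// get returns the cached instantiation for a table type, compiling
// and storing it on first use so later tables with the same types
// reuse the same compiled function.
func (c *instantiationCache) get(tableType string, compile func() compiledFn) compiledFn {
	c.mu.Lock()
	defer c.mu.Unlock()
	if fn, ok := c.cache[tableType]; ok {
		return fn
	}
	fn := compile()
	c.cache[tableType] = fn
	return fn
}
```

The mutex is there because map/filter may process tables concurrently; a sync.Map or per-transformation cache would also work.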

This might be a good opportunity to define native Go structures for the different flux monotypes. Currently we wrap flatbuffers, which are difficult to modify when applying substitutions. Since we don't use flatbuffers for any other part of the semantic graph, it might be a good idea to remove this complexity.
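To illustrate why native structs make substitution easy, here is a minimal sketch of what such a representation might look like. The type names and shapes are assumptions for illustration, not a proposed final design:

```go
package main

// MonoType is a minimal native representation of a flux monotype,
// sketched as an alternative to the flatbuffer wrappers.
type MonoType interface{ String() string }

type Basic struct{ Name string } // e.g. "int", "float", "string"
type Var struct{ Num int }       // a type variable such as t0

func (b Basic) String() string { return b.Name }
func (v Var) String() string   { return "t" + string(rune('0'+v.Num)) }

// Fun is a function type from parameter types to a return type.
type Fun struct {
	Args []MonoType
	Ret  MonoType
}

func (f Fun) String() string {
	s := "("
	for i, a := range f.Args {
		if i > 0 {
			s += ", "
		}
		s += a.String()
	}
	return s + ") => " + f.Ret.String()
}

// substitute applies a substitution from type-variable numbers to
// concrete types; over native structs this is a plain recursive
// rewrite, with no flatbuffer mutation involved.
func substitute(t MonoType, sub map[int]MonoType) MonoType {
	switch t := t.(type) {
	case Var:
		if r, ok := sub[t.Num]; ok {
			return r
		}
		return t
	case Fun:
		args := make([]MonoType, len(t.Args))
		for i, a := range t.Args {
			args[i] = substitute(a, sub)
		}
		return Fun{Args: args, Ret: substitute(t.Ret, sub)}
	default:
		return t
	}
}
```

Instantiating a lambda for a table then amounts to building the substitution from the table's column types and applying it to the lambda's type before compiling.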

DOD

The real challenge is going to be determining the type constraints for a function expression. This will need either a spike or a design issue detailing the approach. It could represent a non-trivial amount of work if it's determined that we need to re-define our monotype representation.

github-actions[bot] commented 11 months ago

This issue has had no recent activity and will be closed soon.