kaskada-ai / kaskada

Modern, open-source event-processing
https://kaskada.io/
Apache License 2.0

bug: Figure out plan for supporting Datums / new Arrow APIs #783

Open bjchambers opened 1 year ago

bjchambers commented 1 year ago

Description

Arrow is introducing a Datum type which makes it easier to support arrays and scalar values in a variety of methods. They are gradually implementing new kernels based on the Datum type and deprecating the old methods. We should figure out what this means for our expression evaluators.

bjchambers commented 1 year ago

At a high level, this is nice -- it makes it easier to work with Literals.

We currently intend to distinguish between add and add_scalar instructions. This allows us to look at the compiled plan and see if we have any literal instructions remaining, which (inefficiently) cause the conversion of a scalar value to an array. During plan generation, we would try to replace add(x, 1) with add_scalar(x, 1).

Generally, we think this is still likely the best path, both for the simplicity of executing the plans and for supporting instructions like field_ref, which need to statically know the type of the field name.
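To make the add / add_scalar distinction concrete, here is a minimal sketch of the plan-generation rewrite and the "any literals left?" check. The `Expr` enum, `simplify`, and `count_literals` are hypothetical names invented for illustration; they are not Kaskada's actual plan representation, and scalars are restricted to `i64` to keep the example short.

```rust
/// Hypothetical node of a compiled plan (illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    /// Reference to an input column by name.
    Column(String),
    /// Literal instruction: at runtime the scalar would be expanded to an array.
    Literal(i64),
    /// Array + array addition.
    Add(Box<Expr>, Box<Expr>),
    /// Array + scalar addition; the scalar is stored inline in the instruction.
    AddScalar(Box<Expr>, i64),
}

/// Rewrite `add(x, literal)` and `add(literal, x)` into `add_scalar(x, literal)`.
pub fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::Add(lhs, rhs) => match (simplify(*lhs), simplify(*rhs)) {
            // Addition is commutative, so the literal may be on either side.
            (x, Expr::Literal(n)) => Expr::AddScalar(Box::new(x), n),
            (Expr::Literal(n), x) => Expr::AddScalar(Box::new(x), n),
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        Expr::AddScalar(inner, n) => Expr::AddScalar(Box::new(simplify(*inner)), n),
        other => other,
    }
}

/// Count the literal instructions that remain in a plan.
pub fn count_literals(expr: &Expr) -> usize {
    match expr {
        Expr::Column(_) => 0,
        Expr::Literal(_) => 1,
        Expr::Add(l, r) => count_literals(l) + count_literals(r),
        Expr::AddScalar(inner, _) => count_literals(inner),
    }
}
```

With this shape, a non-zero `count_literals` after `simplify` flags the places where a scalar would still be converted to an array at execution time.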

Alternative:

  1. Use const in instruction signatures to indicate this argument must be const (e.g., field_ref)
  2. Allow arguments to be Some(index) or None (indicating a literal).

This could open the possibility of having a None in the arguments that doesn't have a corresponding literal value, as well as making it less clear where literals may occur.
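A rough sketch of the second option and the failure mode it introduces. Everything here is an assumption made for illustration: the `Instruction` struct, the idea that literal values live in a separate list consumed in order by the `None` arguments, and the `resolve_args` helper. None of it is taken from Kaskada's code.

```rust
/// Hypothetical instruction where each argument is `Some(column index)`
/// or `None`, meaning "this argument is a literal".
pub struct Instruction {
    pub name: &'static str,
    pub args: Vec<Option<usize>>,
    /// Assumed layout: literal values, consumed in order by the `None` arguments.
    pub literals: Vec<i64>,
}

/// An argument after pairing each `None` with its literal value.
#[derive(Debug, PartialEq)]
pub enum Resolved {
    Column(usize),
    Literal(i64),
}

/// Pair every `None` argument with the next literal value.
/// Returns an error if a `None` has no corresponding literal.
pub fn resolve_args(inst: &Instruction) -> Result<Vec<Resolved>, String> {
    let mut literals = inst.literals.iter();
    inst.args
        .iter()
        .enumerate()
        .map(|(pos, arg)| match arg {
            Some(index) => Ok(Resolved::Column(*index)),
            None => literals
                .next()
                .map(|value| Resolved::Literal(*value))
                .ok_or_else(|| {
                    format!(
                        "argument {} of {} is a literal but has no value",
                        pos, inst.name
                    )
                }),
        })
        .collect()
}
```

In this layout the mismatch can only be caught by a runtime check like `resolve_args`, whereas separate add and add_scalar instructions rule it out by construction.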