kaskada-ai / kaskada

Modern, open-source event-processing
https://kaskada.io/
Apache License 2.0

Wrapper function/type for creating Scalars #787

Open bjchambers opened 11 months ago

bjchambers commented 11 months ago
          Could add a helper for creating the static somewhere, seen it in a few instances now

_Originally posted by @jordanrfrazier in https://github.com/kaskada-ai/kaskada/pull/784#discussion_r1342918969_

bjchambers commented 11 months ago

Arrow recently added Scalar, which simplifies many of the APIs that handle scalar values. We could potentially use this to replace our ScalarValue enum.

See https://github.com/apache/arrow-rs/pull/4793/files for some notes:

  1. We can use Int64Array::new_scalar(...) to simplify creating a scalar of a specific type.
  2. We can create a Scalar<ArrayRef> for using an array as a scalar.

This should let us create something like:

struct ScalarValue(Scalar<ArrayRef>);

// 1. Serde the underlying ArrayRef
// 2. Support use as a scalar anywhere

bjchambers commented 11 months ago

@jordanrfrazier FYI -- I think we should be able to completely replace our ScalarValue enum with Scalar<ArrayRef>. I think it's probably worth doing that for how we serialize scalars in the physical plan.

Thoughts on:

A) Trying to change the ScalarValue to use that representation (potentially messy, since we have that protobuf in many places).
B) Introducing a parallel Scalar??? to represent that?
C) Just using arrow_array::Scalar<arrow_array::ArrayRef> in those places?

I think with C we'd need to do the custom serialization like we did for structs with an ArrayRef, while with A or B we could encapsulate that in our serialization of the wrapper type we create.
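
To make the encapsulation trade-off concrete, here is a minimal std-only sketch of what A or B could look like. It is not Kaskada's or Arrow's actual code: ArrayRef and Scalar below are simplified stand-ins for the arrow_array types (an i64-only column instead of Arc<dyn Array>), and the to_bytes/from_bytes methods and the 8-byte little-endian encoding are hypothetical placeholders for whatever protobuf or IPC encoding is actually used.

```rust
use std::sync::Arc;

// Stand-in for arrow's ArrayRef (assumption: i64-only to keep the sketch std-only).
type ArrayRef = Arc<Vec<i64>>;

/// Stand-in for arrow_array::Scalar<ArrayRef>: an array known to hold exactly one row.
#[derive(Debug, Clone, PartialEq)]
struct Scalar(ArrayRef);

impl Scalar {
    /// Rejects arrays that do not contain exactly one row.
    fn new(array: ArrayRef) -> Result<Self, String> {
        if array.len() == 1 {
            Ok(Scalar(array))
        } else {
            Err(format!("expected exactly 1 row, got {}", array.len()))
        }
    }

    /// Returns the single value held by the scalar.
    fn value(&self) -> i64 {
        self.0[0]
    }
}

/// Hypothetical wrapper (options A/B): the encoding lives here, not at call sites.
#[derive(Debug, Clone, PartialEq)]
struct ScalarValue(Scalar);

impl ScalarValue {
    /// Builds a scalar from a single i64 by wrapping a one-row array.
    fn from_i64(value: i64) -> Self {
        ScalarValue(Scalar(Arc::new(vec![value])))
    }

    /// Placeholder encoding: the value as 8 little-endian bytes.
    fn to_bytes(&self) -> Vec<u8> {
        self.0.value().to_le_bytes().to_vec()
    }

    /// Inverse of `to_bytes`; fails if the input is not exactly 8 bytes.
    fn from_bytes(bytes: &[u8]) -> Result<Self, String> {
        if bytes.len() != 8 {
            return Err(format!("expected 8 bytes, got {}", bytes.len()));
        }
        let mut raw = [0u8; 8];
        raw.copy_from_slice(bytes);
        Ok(Self::from_i64(i64::from_le_bytes(raw)))
    }
}
```

With a wrapper like this, plan nodes would only call ScalarValue::to_bytes and ScalarValue::from_bytes. With C, each place that stores a bare Scalar<ArrayRef> would have to invoke the custom encoding itself, as was done for structs holding an ArrayRef.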