spiraldb / vortex

An extensible, state-of-the-art columnar file format
https://vortex.dev
Apache License 2.0

Rename SomeArray -> SomeArrayData #1491

Open gatesn opened 4 days ago

gatesn commented 4 days ago

For encoded arrays, we should use the name <Encoding>ArrayData. That frees up names like dyn BoolArray to refer to the dtype-specific things that we currently call "variants".
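
A minimal sketch of how that naming split might look, assuming one trait per dtype and one struct per encoding. Every name here (PlainBoolArrayData, ConstantBoolArrayData, true_count, true_fraction) is hypothetical and chosen for illustration; none of it is taken from the Vortex codebase.

```rust
// Hypothetical names throughout; none of this is Vortex's actual API.

/// Dtype-specific behavior: what the thread currently calls a "variant".
pub trait BoolArray {
    fn len(&self) -> usize;
    fn true_count(&self) -> usize;
}

/// Encoding-specific data, following the proposed <Encoding>ArrayData naming.
pub struct PlainBoolArrayData {
    pub values: Vec<bool>,
}

/// A second hypothetical encoding: one value repeated `len` times.
pub struct ConstantBoolArrayData {
    pub value: bool,
    pub len: usize,
}

impl BoolArray for PlainBoolArrayData {
    fn len(&self) -> usize {
        self.values.len()
    }

    fn true_count(&self) -> usize {
        self.values.iter().filter(|&&v| v).count()
    }
}

impl BoolArray for ConstantBoolArrayData {
    fn len(&self) -> usize {
        self.len
    }

    fn true_count(&self) -> usize {
        if self.value { self.len } else { 0 }
    }
}

/// Callers that only care about the dtype take `&dyn BoolArray`
/// and never see which encoding sits behind it.
pub fn true_fraction(array: &dyn BoolArray) -> f64 {
    if array.len() == 0 {
        return 0.0;
    }
    array.true_count() as f64 / array.len() as f64
}
```

With this split, the struct name says how the data is stored and the trait name says what logical type it holds.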

a10y commented 4 days ago

Another possibly crazy option is to keep XYZArray for the encoding-specific arrays and then use DynBoolArray, DynPrimitiveArray etc. for the dtype-typed arrays.

Or even crazier: make Rust types for each of our DTypes, e.g.

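// Nullability and PType are assumed to be the existing Vortex types.
pub struct BoolType {
    nullability: Nullability,
}

pub struct PrimitiveType {
    nullability: Nullability,
    ptype: PType,
}

// One variant per logical type, each wrapping its own struct.
pub enum DType {
    Bool(BoolType),
    Primitive(PrimitiveType),
    // etc. ...
}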

Then you have a real type that you can use to represent each of the logical types directly. E.g. you could even have a struct Array<LogicalType> and then impl Array<BoolType>, which would use the vtable provided by the encoded array that was used to construct the Array.
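
A rough sketch of what Array<LogicalType> could look like, assuming each logical type names the vtable trait an encoding must supply. The LogicalType trait, BoolVTable, ConstantBool and all method names are hypothetical and exist only to make the idea concrete; Nullability is redefined locally so the sketch stands alone.

```rust
// Hypothetical sketch; these are not Vortex's actual types or traits.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Nullability {
    NonNullable,
    Nullable,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct BoolType {
    pub nullability: Nullability,
}

/// Operations an encoding must provide to back a bool-typed array.
pub trait BoolVTable {
    fn len(&self) -> usize;
    fn get(&self, index: usize) -> Option<bool>;
}

/// Ties each logical type to the vtable it expects from an encoding.
pub trait LogicalType {
    type VTable: ?Sized;
}

impl LogicalType for BoolType {
    type VTable = dyn BoolVTable;
}

/// An array typed by its logical dtype rather than by its encoding.
pub struct Array<T: LogicalType> {
    dtype: T,
    vtable: Box<T::VTable>,
}

/// Dtype-specific methods live on the concrete instantiation.
impl Array<BoolType> {
    pub fn new(dtype: BoolType, encoded: Box<dyn BoolVTable>) -> Self {
        Array { dtype, vtable: encoded }
    }

    pub fn dtype(&self) -> BoolType {
        self.dtype
    }

    pub fn len(&self) -> usize {
        self.vtable.len()
    }

    pub fn true_count(&self) -> usize {
        (0..self.len())
            .filter(|&i| self.vtable.get(i) == Some(true))
            .count()
    }
}

/// A hypothetical encoding: one value repeated `len` times.
pub struct ConstantBool {
    pub value: bool,
    pub len: usize,
}

impl BoolVTable for ConstantBool {
    fn len(&self) -> usize {
        self.len
    }

    fn get(&self, index: usize) -> Option<bool> {
        if index < self.len { Some(self.value) } else { None }
    }
}
```

The encoding disappears behind the boxed vtable, so code holding an Array<BoolType> only sees the logical type.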

gatesn commented 3 days ago

Would be neat to figure out a way to do this. One problem we had with arrays is that you can't write impl Array<X> when Array is defined in vortex-array but X is defined in your encoding crate.

This is less of a problem for logical DTypes, since they're hard-coded in vortex-array, so it could work well. But it wouldn't necessarily extend to concrete extension types (only abstract ones, e.g. Array<ExtDType>).
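
To illustrate the restriction: Rust only accepts an inherent impl in the crate that defines the type (error E0116 otherwise). The sketch below uses two modules as stand-ins for two crates, so the rejected impl is shown as a comment. The extension-trait workaround is one common Rust pattern and is not something proposed in this thread; TimestampType, TimestampArrayExt and describe are hypothetical names.

```rust
// Modules stand in for crates here; in one file the compiler cannot
// actually show the cross-crate error, so it appears only as a comment.

/// Stand-in for the vortex-array crate (hypothetical, simplified Array).
pub mod vortex_array {
    pub struct Array<T> {
        pub dtype: T,
        pub len: usize,
    }
}

/// Stand-in for a downstream crate defining a concrete extension type.
pub mod my_extension {
    use super::vortex_array::Array;

    pub struct TimestampType {
        pub name: &'static str,
    }

    // In a genuinely separate crate, this inherent impl is rejected (E0116),
    // because Array is defined elsewhere:
    //
    // impl Array<TimestampType> {
    //     pub fn describe(&self) -> String { ... }
    // }

    /// Workaround: a trait local to this crate can be implemented for
    /// the foreign type Array<TimestampType>.
    pub trait TimestampArrayExt {
        fn describe(&self) -> String;
    }

    impl TimestampArrayExt for Array<TimestampType> {
        fn describe(&self) -> String {
            format!("{}[{}]", self.dtype.name, self.len)
        }
    }
}
```

The cost of the workaround is that callers must import the trait before the methods become visible.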

Will certainly have a think / play around and see if something nice falls out.