uwdata / arquero

Query processing and transformation of array-backed data tables.
https://idl.uw.edu/arquero
BSD 3-Clause "New" or "Revised" License

Support explicit column types. #33

Open ericemc3 opened 3 years ago

ericemc3 commented 3 years ago

First, congratulations on this impressive work! As an R user and a D3 fan, I consider it a huge step forward for live and sexy data visualisation and dataflows!

It looks like Arquero, when loading from CSV for instance, is able to infer column types (Date, Numeric, String...), as we can see in the view() display (columns are right- or left-aligned) or by testing value types with typeof. Could that information be exposed in the table object, allowing, for instance, a test for numeric columns only?
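To make the request concrete, here is a minimal sketch of per-column type inference over a column-oriented object ({ name: valuesArray }), the same layout that aq.table() accepts. It does not use any Arquero API: inferColumnTypes and numericColumns are hypothetical helpers. Missing values (null/undefined) are skipped, and a column whose present values disagree is reported as 'mixed'.

```javascript
// Hypothetical helper: infer a coarse type for each column of a
// column-oriented data object ({ name: valuesArray }), skipping null/undefined.
function inferColumnTypes(columns) {
  const types = {};
  for (const [name, values] of Object.entries(columns)) {
    let type = null;
    for (const v of values) {
      if (v == null) continue; // skip missing values
      const t = v instanceof Date ? 'date' : typeof v;
      if (type === null) type = t;
      else if (type !== t) { type = 'mixed'; break; } // conflicting types
    }
    types[name] = type ?? 'unknown'; // column with no present values
  }
  return types;
}

// Hypothetical helper: names of the columns whose inferred type is 'number'.
function numericColumns(columns) {
  const types = inferColumnTypes(columns);
  return Object.keys(types).filter(name => types[name] === 'number');
}

const data = {
  region: ['north', 'south', 'north'],
  date: [new Date(2020, 0, 1), new Date(2020, 0, 2), null],
  sales: [10, 20.5, 30],
  units: [1, 2, null]
};

// inferColumnTypes(data) returns
//   { region: 'string', date: 'date', sales: 'number', units: 'number' }
// numericColumns(data) returns [ 'sales', 'units' ]
console.log(inferColumnTypes(data));
console.log(numericColumns(data));
```

A list like numericColumns(data) is the kind of information a selection helper such as the proposed aq.isNumeric() would need to resolve.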

Building on this, a cool feature could be, for instance: select(1, aq.isNumeric()) or groupby(v1).rollup(*...here sum all numeric variables, keeping the same names...*)

I am used to this convenient R/dplyr syntax: summarise_if(is.numeric, sum) or summarise(across(where(is.numeric), sum))
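For comparison, the dplyr pattern above can be approximated in plain JavaScript. This is a sketch, not Arquero's API: sumNumericByGroup and isNumericColumn are hypothetical helpers working on the same { name: valuesArray } layout. It groups rows by one key column and sums every other numeric column under its original name. Missing values are skipped, which corresponds to sum(na.rm = TRUE) in R rather than plain sum.

```javascript
// Hypothetical helper: true if the column has at least one present value
// and every non-missing value is a number.
function isNumericColumn(values) {
  const present = values.filter(v => v != null);
  return present.length > 0 && present.every(v => typeof v === 'number');
}

// Group rows by `key` and sum every numeric column, keeping column names.
// Roughly summarise(across(where(is.numeric), ~ sum(.x, na.rm = TRUE))).
function sumNumericByGroup(columns, key) {
  const numeric = Object.keys(columns)
    .filter(name => name !== key && isNumericColumn(columns[name]));
  const groups = new Map(); // group value -> output row { key, ...sums }
  columns[key].forEach((k, row) => {
    if (!groups.has(k)) {
      const init = { [key]: k };
      for (const name of numeric) init[name] = 0;
      groups.set(k, init);
    }
    const acc = groups.get(k);
    for (const name of numeric) {
      const v = columns[name][row];
      if (v != null) acc[name] += v; // skip missing values
    }
  });
  return [...groups.values()]; // groups in order of first appearance
}

const data = {
  region: ['north', 'south', 'north'],
  sales: [10, 20, 30],
  units: [1, 2, null],
  note: ['a', 'b', 'c'] // non-numeric, so it is dropped from the output
};

// Returns:
// [ { region: 'north', sales: 40, units: 1 },
//   { region: 'south', sales: 20, units: 2 } ]
console.log(sumNumericByGroup(data, 'region'));
```

The column filter is the part that depends on type information; with column types exposed on the table, that step would not need to scan values itself.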

jheer commented 3 years ago

Arquero is largely type-agnostic by design, so it may take a while for column-type-specific features to develop. That said, more fine-grained type inference is necessary for binary serialization (e.g., to Apache Arrow columns as in #31), so that should help push this forward.
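To illustrate why serialization needs finer types than typeof provides: JavaScript has a single number type, while Arrow distinguishes integer and floating-point columns. The sketch below picks an Arrow-style storage type name for a column. inferStorageType is a hypothetical helper, and the returned strings are only labels chosen to resemble Arrow type names, not objects from the apache-arrow library. A fuller version might also consider Int64, narrower integer widths, or timestamp units.

```javascript
// Hypothetical: choose an Arrow-style storage type label for a JS column.
// Labels are illustrative strings, not apache-arrow API objects.
function inferStorageType(values) {
  const present = values.filter(v => v != null); // ignore missing values
  if (present.length === 0) return 'Null';
  if (present.every(v => typeof v === 'boolean')) return 'Bool';
  if (present.every(v => v instanceof Date)) return 'Date';
  if (present.every(v => typeof v === 'string')) return 'Utf8';
  if (present.every(v => typeof v === 'number')) {
    const INT32_MIN = -(2 ** 31);
    const INT32_MAX = 2 ** 31 - 1;
    // Integers that fit in 32 bits can use an integer column;
    // anything else falls back to 64-bit floating point.
    const fitsInt32 = present.every(
      v => Number.isInteger(v) && v >= INT32_MIN && v <= INT32_MAX
    );
    return fitsInt32 ? 'Int32' : 'Float64';
  }
  return 'Mixed'; // would need a fallback, e.g. a union or string encoding
}

console.log(inferStorageType([1, 2, null])); // → Int32
console.log(inferStorageType([1, 2.5]));     // → Float64
console.log(inferStorageType([2 ** 40]));    // → Float64 (too large for Int32)
console.log(inferStorageType(['a', null]));  // → Utf8
console.log(inferStorageType([1, 'a']));     // → Mixed
```

The 'Mixed' case is where a type-agnostic JS array store has no obvious answer, which is the difficulty raised later in this thread.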

jheer commented 3 years ago

I'm marking this as an enhancement / feature request, though the exact form it might take is not yet clear. See also #2.

bmschmidt commented 3 years ago

It might make sense to offer this feature specifically when Arrow columns/vectors are the backend data store, because you could then piggyback on the Arrow types rather than having to work out logic for mixed column types inside JS arrays.