rafaqz / DimensionalData.jl

Named dimensions and indexing for julia arrays and other data
https://rafaqz.github.io/DimensionalData.jl/stable/
MIT License
262 stars 38 forks source link

Breaking: better DimStack indexing, again #689

Closed rafaqz closed 2 months ago

rafaqz commented 3 months ago

This PR adds type parameters K,T,N to stacks for better dispatch and type stabiity in both Array-like and NamedTuple-lke behaviors.

It it incliudes a bunch of performance fixes and more complete tests.

Pacakges extending AbstractDimStack{L} will now need to extend AbstractDimStack{K,T,N,L}. There are new helper methods stacktype and data_eltype to help getting these paremeters. data_eltype can be extended for alternate stack data objects besides NamedTuple.

There is also a new standard behavour: indexing with dimensional objects like a vector of dimension tuples will do a mergedims, because you can index a subset of the dimensions. But indexing with colon or Vector of Int/CartesianIndex will return a raw vector of NamedTuple, not a mergedims DimStack - so there is a way to avoid the overhead of buiding the merged lookups.

@sethaxen this will require some changes to Arviz, maybe you want to review. It would be good to make sure I've covered everything and don't need to make any more breaking changes.

sethaxen commented 2 months ago

@rafaqz I unfortunately don't have the bandwidth for a full review, but I tested against ArviZ, and with only adding the extra type parameters to InferenceObjects.Dataset, everything works.

The indexing changes aren't a problem. I find all positional indexing of an AbstractDimStack to be nonintuitive, so in our documented examples, we always use keyword-based indexing, which seems to still behave the same with this PR.

rafaqz commented 2 months ago

No worries, my main concern was that it doesn't break your code.

Absolutely, most of the positional indexing now actually calls dimensional indexing underneeath, because its the only thing that makes sense. But some other things like a single colon or range are easier to understand e.g. it makes a vector of namedtuple which is a valid table.