rafaqz / DimensionalData.jl

Named dimensions and indexing for julia arrays and other data
https://rafaqz.github.io/DimensionalData.jl/stable/
MIT License
274 stars 39 forks source link

Extending Base.stack for DimArrays #645

Open brendanjohnharris opened 7 months ago

brendanjohnharris commented 7 months ago

Adds methods for Base.stack, and related non-exported functions from Base, that are compatible with DimArrays.

Syntax follows Base: stacking DimArrays along a given axis dims creates a new dimension. However, existing dimension data is preserved, and the new dimension becomes an AnonDim.

Optionally, a Dimension dim can be provided as the first argument to stack, in which case the new dimension is assigned as dim rather than AnonDim.

codecov-commenter commented 7 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 83.99%. Comparing base (9846bfe) to head (884ab68). Report is 41 commits behind head on main.

:exclamation: Current head 884ab68 differs from pull request most recent head 2c7b8b9

Please upload reports for the commit 2c7b8b9 to get more accurate results.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #645 +/- ## ========================================== + Coverage 83.83% 83.99% +0.15% ========================================== Files 45 45 Lines 4102 4136 +34 ========================================== + Hits 3439 3474 +35 + Misses 663 662 -1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

brendanjohnharris commented 7 months ago

Thanks for the suggestions! Went through and cleaned up the PR.

The Base.stack(dim, X; dims) method lets you supply a Dimension to assign to the newly created array dimension; not strictly necessary, but more convenient than setting or rebuilding the resulting array to overwrite the new AnonDim (is an AnonDim a suitable way to treat a newly created dimension, as I've done here?).

Let me know if you have more thoughts on this, happy to work on it further :)

rafaqz commented 7 months ago

We try not to change base method signatures besides allowing dims to be named. To keep the mental model very simple.

I would just put that new Dimension in dims like cat does. If its a constructed dimension it replaces the existing one. Otherwise Type/Symbol etc just choose them. Just check d isa Dimension

You will also need to format(newdims, A) afterwards.

brendanjohnharris commented 7 months ago

Base.stack behaves differently than Base.cat in that it always creates a new dimension (i.e. Base.stack([A, B,...]; dims=2) inserts a dimension at position 2, increasing ndims(A) by 1); in general this new dimension is unrelated to the existing array dimensions, so we need to specify both position (dims) and a Dimension (dim).

I see your point about keeping new signatures to a minimum, though. How about allowing the keyword argument dims to be assigned as an 'Integer=>Dimension' pair, with the following behavior:

  1. If dims isa Integer: The new dimension is an AnonDim at position dims
  2. If dims isa Dimension: The new dimension is a DImension dims at position ndims(A)+1
  3. If dims isa Pair{Integer, Dimension}: The new dimension is a Dimension last(dims) at position first(dims)

We could also, instead, have that the new dimension is always an AnonDim, leaving it up to the user to set that dimension at a later time; in that case, however, there is ambiguity if there are any existing AnonDims in the input arrays.

rafaqz commented 7 months ago

Using a Pair sounds like a good solution (as do points 1 and 2 as simpler cases). For 2 I guess putting it as the last dimension would be the majority use-case? (I rarely use stack)

Always AnonDim will require using set all the time with AnonDim => somedim. We may as well just put that pair in stack and skip the step.