rafaqz / DimensionalData.jl

Named dimensions and indexing for julia arrays and other data
https://rafaqz.github.io/DimensionalData.jl/stable/
MIT License
281 stars 41 forks source link

overhaul Coord and CoordLookupArray #429

Closed rafaqz closed 1 year ago

rafaqz commented 1 year ago

This PR fixes a number of bugs and implements missing features for coords.

@sethaxen maybe you would like to review this

sethaxen commented 1 year ago

Here are some errors I got when using this branch:

julia> da = DimArray(randn(2, 10, 4), (Dim{:var}(1:2), Dim{:draw}(1:10), Dim{:chain}(1:4)));

julia> coord = Dim{:sample}(Dimensions.CoordLookupArray(map(Tuple, vec(CartesianIndices(DimensionalData.val.(lookup(dims(da, (:draw, :chain))))))), dims(da, (:draw, :chain))));

julia> dims_new = (otherdims(da, (:draw, :chain))..., coord);

julia> da_new = DimArray(reshape(parent(da), size(da, 1), :), dims_new)
2×40 DimArray{Float64,2} with dimensions: 
  Dim{:var} Sampled{Int64} 1:2 ForwardOrdered Regular Points,
  Dim{:sample} CoordLookupArray{Tuple{Int64, Int64}} Tuple{Int64, Int64}[(1, 1), (2, 1), …, (9, 4), (10, 4)] 
  Dim{:draw} Sampled{Int64} 1:10 ForwardOrdered Regular Points,
  Dim{:chain} Sampled{Int64} 1:4 ForwardOrdered Regular Points
      (1, 1)     (2, 1)     (3, 1)    (4, 1)    (5, 1)    (6, 1)    (7, 1)   …    (4, 4)    (5, 4)    (6, 4)     (7, 4)     (8, 4)     (9, 4)     (10, 4)
 1  -0.222782  -0.172336  -1.40358  -0.18321   1.98682   0.564605  0.431032      0.748198  0.36207  -0.396447  -0.188011   0.281491  -0.354082  -2.13654
 2   0.634461   0.247453  -1.7855    1.38085  -0.365216  1.2833    0.593319     -0.30652   1.25441  -1.29112    0.0900128  0.243426  -1.3136     1.10789

:draw and :chain still look like dimensions of the array.

Then I found some issues with the namedtuple indexing:

julia> da_new[var=1, sample=(draw=At(1), chain=At(1))]
-0.22278235391694481

julia> da_new[var=1, sample=(draw=1, chain=1)]
ERROR: MethodError: no method matching _matches(::Int64, ::Int64)
Closest candidates are:
  _matches(::Tuple, ::Any) at ~/.julia/packages/DimensionalData/ZtCKP/src/Dimensions/coord.jl:60
  _matches(::DimensionalData.Dimensions.Dimension, ::Any) at ~/.julia/packages/DimensionalData/ZtCKP/src/Dimensions/coord.jl:61
  _matches(::Between, ::Any) at ~/.julia/packages/DimensionalData/ZtCKP/src/Dimensions/coord.jl:62
  ...
Stacktrace:
  [1] map
    @ ./tuple.jl:247 [inlined]
  [2] _matches(sel::Tuple{Int64, Int64}, x::Tuple{Int64, Int64})
    @ DimensionalData.Dimensions ~/.julia/packages/DimensionalData/ZtCKP/src/Dimensions/coord.jl:60
  [3] (::DimensionalData.Dimensions.var"#74#77"{Tuple{Int64, Int64}})(x::Tuple{Int64, Int64})
    @ DimensionalData.Dimensions ./none:0
  [4] iterate
    @ ./generator.jl:47 [inlined]
  [5] collect
    @ ./array.jl:787 [inlined]
  [6] selectindices
    @ ~/.julia/packages/DimensionalData/ZtCKP/src/Dimensions/coord.jl:31 [inlined]
  [7] selectindices
    @ ~/.julia/packages/DimensionalData/ZtCKP/src/Dimensions/coord.jl:18 [inlined]
  [8] selectindices
    @ ~/.julia/packages/DimensionalData/ZtCKP/src/Dimensions/coord.jl:22 [inlined]
  [9] _dims2indices
    @ ~/.julia/packages/DimensionalData/ZtCKP/src/Dimensions/indexing.jl:113 [inlined]
 [10] macro expansion
    @ ~/.julia/packages/DimensionalData/ZtCKP/src/Dimensions/indexing.jl:55 [inlined]
 [11] _dims2indices
    @ ~/.julia/packages/DimensionalData/ZtCKP/src/Dimensions/indexing.jl:55 [inlined]
 [12] dims2indices
    @ ~/.julia/packages/DimensionalData/ZtCKP/src/Dimensions/indexing.jl:50 [inlined]
 [13] dims2indices
    @ ~/.julia/packages/DimensionalData/ZtCKP/src/Dimensions/indexing.jl:34 [inlined]
 [14] #getindex#39
    @ ~/.julia/packages/DimensionalData/ZtCKP/src/array/indexing.jl:36 [inlined]
 [15] top-level scope
    @ REPL[31]:1

julia> da_new[var=1, sample=(draw=At(1:2), chain=At(1:2))]
ERROR: (1:2, 1:2) not found in coord lookup
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] _coord_not_found_error(sel::Tuple{At{UnitRange{Int64}, Nothing, Nothing}, At{UnitRange{Int64}, Nothing, Nothing}})
    @ DimensionalData.Dimensions ~/.julia/packages/DimensionalData/ZtCKP/src/Dimensions/coord.jl:54
  [3] selectindices(lookup::DimensionalData.Dimensions.CoordLookupArray{Tuple{Int64, Int64}, Vector{Tuple{Int64, Int64}}, Tuple{Dim{:draw, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}, Dim{:chain, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}}, DimensionalData.Dimensions.LookupArrays.NoMetadata}, sel::Tuple{At{UnitRange{Int64}, Nothing, Nothing}, At{UnitRange{Int64}, Nothing, Nothing}})
    @ DimensionalData.Dimensions ~/.julia/packages/DimensionalData/ZtCKP/src/Dimensions/coord.jl:28
  [4] selectindices(lookup::DimensionalData.Dimensions.CoordLookupArray{Tuple{Int64, Int64}, Vector{Tuple{Int64, Int64}}, Tuple{Dim{:draw, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}, Dim{:chain, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}}, DimensionalData.Dimensions.LookupArrays.NoMetadata}, sel::Tuple{Dim{:draw, At{UnitRange{Int64}, Nothing, Nothing}}, Dim{:chain, At{UnitRange{Int64}, Nothing, Nothing}}})
    @ DimensionalData.Dimensions ~/.julia/packages/DimensionalData/ZtCKP/src/Dimensions/coord.jl:18
  [5] selectindices(lookup::DimensionalData.Dimensions.CoordLookupArray{Tuple{Int64, Int64}, Vector{Tuple{Int64, Int64}}, Tuple{Dim{:draw, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}, Dim{:chain, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}}, DimensionalData.Dimensions.LookupArrays.NoMetadata}, sel::NamedTuple{(:draw, :chain), Tuple{At{UnitRange{Int64}, Nothing, Nothing}, At{UnitRange{Int64}, Nothing, Nothing}}})
    @ DimensionalData.Dimensions ~/.julia/packages/DimensionalData/ZtCKP/src/Dimensions/coord.jl:22
  [6] _dims2indices
    @ ~/.julia/packages/DimensionalData/ZtCKP/src/Dimensions/indexing.jl:113 [inlined]
  [7] macro expansion
    @ ~/.julia/packages/DimensionalData/ZtCKP/src/Dimensions/indexing.jl:55 [inlined]
  [8] _dims2indices(lookups::Tuple{DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}, DimensionalData.Dimensions.CoordLookupArray{Tuple{Int64, Int64}, Vector{Tuple{Int64, Int64}}, Tuple{Dim{:draw, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}, Dim{:chain, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}}, DimensionalData.Dimensions.LookupArrays.NoMetadata}}, dims::Tuple{Dim{:var, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}, Dim{:sample, DimensionalData.Dimensions.CoordLookupArray{Tuple{Int64, Int64}, Vector{Tuple{Int64, Int64}}, Tuple{Dim{:draw, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}, Dim{:chain, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}}, DimensionalData.Dimensions.LookupArrays.NoMetadata}}}, I::Tuple{Dim{:var, Int64}, Dim{:sample, NamedTuple{(:draw, :chain), Tuple{At{UnitRange{Int64}, Nothing, Nothing}, At{UnitRange{Int64}, Nothing, Nothing}}}}})
    @ DimensionalData.Dimensions ~/.julia/packages/DimensionalData/ZtCKP/src/Dimensions/indexing.jl:55
  [9] dims2indices
    @ ~/.julia/packages/DimensionalData/ZtCKP/src/Dimensions/indexing.jl:50 [inlined]
 [10] dims2indices
    @ ~/.julia/packages/DimensionalData/ZtCKP/src/Dimensions/indexing.jl:34 [inlined]
 [11] #getindex#39
    @ ~/.julia/packages/DimensionalData/ZtCKP/src/array/indexing.jl:36 [inlined]
 [12] top-level scope
    @ REPL[32]:1

Also, if CoordLookupArray is meant to be used directly by the user like I am, perhaps it should be documented?

rafaqz commented 1 year ago

Also, if CoordLookupArray is meant to be used directly by the user like I am, perhaps it should be documented?

For sure. It could also have a shorter name, like Coords

rafaqz commented 1 year ago

Should this actually work though? It's not clear to me what it means. Do you intend for 1 to work like At(1) ? In the rest of the package we avoid that because 1 is used for direct indexing and ambiguous if you have a Real typed lookup. Here we don't have that problem, so it's possible to allow it.

da_new[var=1, sample=(draw=1, chain=1)]

:draw and :chain still look like dimensions of the array.

will fix

And:

da_new[var=1, sample=(draw=At(1:2), chain=At(1:2))]

This should use .., as At will try to return a scalar:

da_new[var=1, sample=(draw=1..2, chain=1..2)]
sethaxen commented 1 year ago

Should this actually work though? It's not clear to me what it means. Do you intend for 1 to work like At(1) ?

For da I can provide draw=1 with no problems. Actually, I don't know whether that just forwards the index to the parent array or if it gets the first index in the dimension and wraps that with At. If the former is what DimensionalData does, then I wouldn't expect draw=1 passed to sample would work. But if the latter is what DimensionalData does, then I would expect it to work when using CoordLookupArray as well.

rafaqz commented 1 year ago

I think what you want is sample=1. If you use sample=(draw=1, chain=2) the answer is ambiguous, so that can't work with draw=1

I'll make a better error clarifying this though.

(There is no auto wrapping with At, that was confusing when we had it. using an Int is just regular julia indexing, but for the named dimension- so your first option)

sethaxen commented 1 year ago

Thanks, that's clears things up! I'll do a proper review this evening.

rafaqz commented 1 year ago

The latter one really only makes sense if one can guarantee that the coords are essentially a vec(collect(Iterators.product(map(val, dims)))).

One interesting thing is what is causing your show layout bug - your CoordLookupArray dims field still wrap their own lookup arrays. If we have those we can unmergedims at any stage from whatever is left using the original lookups, and fill the gaps with missing.

I like mergedims and unmergdims as function names too.

rafaqz commented 1 year ago

Also show should hopefully be fixed now.

sethaxen commented 1 year ago

One interesting thing is what is causing your show layout bug - your CoordLookupArray dims field still wrap their own lookup arrays. If we have those we can unmergedims at any stage from whatever is left using the original lookups, and fill the gaps with missing.

Ah, yes, that would be interesting. Would we be able to infer whether Missing needs to be added to the eltype or not from the types?

I like mergedims and unmergdims as function names too.

An alternative that might be easier to remember if the CoordsLookupArray is named Coords or something like that would be dims2coords and coords2dims. For me the name coords isn't super intuitive though, but I don't have a better suggestion.

One other idea. What if CoordLookupArray table columns were vectors of NamedTuples? e.g. for the :chain, :draw example, the rows would be [(draw=1, chain=1), (draw=2, chain=1), ..., ((draw=9, chain=4), (draw=10, chain=4)]. This has pros and cons. The main con is verbosity. e.g. if you're working with XYZ points, the meaning of the tuples of indices is clear. But in other cases, like draw and chain, if just tuples of integers appear in a plot or a DataFrame, it's not at all obvious what those mean. If one doesn't like the verbosity, they can just map values over the column to get the tuple.

rafaqz commented 1 year ago

Yeah Coord has quite a spatial meaning (where X/Y would be the most common use of this - that was the original intention). If you have a better name I'm not attached to Coord at all.

A NamedTuple is better, yes, thats an easy change.

We would have to assume Union{T,Missing} for type stability I guess.

sethaxen commented 1 year ago

Yeah Coord has quite a spatial meaning (where X/Y would be the most common use of this - that was the original intention). If you have a better name I'm not attached to Coord at all.

What about something like MergedLookup? This would semantically link it to mergedims, which would construct a Dim whose lookup is a MergedLookup.

We would have to assume Union{T,Missing} for type stability I guess.

Make sense. An argument nomissing or function dropmissing could be used to remove the Missing from the eltype when the function knows no indices are missing.

codecov-commenter commented 1 year ago

Codecov Report

Merging #429 (a8c50ed) into main (f04ea3e) will decrease coverage by 0.58%. The diff coverage is 84.31%.

@@            Coverage Diff             @@
##             main     #429      +/-   ##
==========================================
- Coverage   89.59%   89.01%   -0.59%     
==========================================
  Files          35       36       +1     
  Lines        2480     2612     +132     
==========================================
+ Hits         2222     2325     +103     
- Misses        258      287      +29     
Impacted Files Coverage Δ
src/LookupArrays/selector.jl 86.65% <ø> (-0.49%) :arrow_down:
src/Dimensions/merged.jl 83.67% <83.67%> (ø)
src/LookupArrays/show.jl 84.05% <100.00%> (+0.23%) :arrow_up:
src/Dimensions/coord.jl 0.00% <0.00%> (-86.67%) :arrow_down:
src/Dimensions/indexing.jl 82.95% <0.00%> (+1.41%) :arrow_up:
src/Dimensions/show.jl 95.71% <0.00%> (+2.85%) :arrow_up:
src/Dimensions/primitives.jl 96.24% <0.00%> (+3.08%) :arrow_up:
src/array/indexing.jl 88.37% <0.00%> (+9.06%) :arrow_up:

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

rafaqz commented 1 year ago

@sethaxen I think we can merge this now.

Do you want to PR your mergedims/unmergedims method after that?