rafaqz / DimensionalData.jl

Named dimensions and indexing for julia arrays and other data
https://rafaqz.github.io/DimensionalData.jl/stable/
MIT License
269 stars 37 forks source link

Ordering not updated after indexing #563

Closed bjarthur closed 9 months ago

bjarthur commented 9 months ago

if i index an ordered dimension with indices that are unordered, the returned array is unordered but the type says it is ordered. subsequent indexes into this latter array then fail:

julia> A = DimArray(rand(4), Y(1:4))
4-element DimArray{Float64,1} with dimensions: 
  Y Sampled{Int64} 1:4 ForwardOrdered Regular Points
 1  0.119793
 2  0.771181
 3  0.445828
 4  0.959283

julia> Asubset = A[Y(At([2,1]))]
2-element DimArray{Float64,1} with dimensions: 
  Y Sampled{Int64} Int64[2, 1] ForwardOrdered Irregular Points    ### says ForwardOrdered here, but it's not!
 2  0.771181
 1  0.119793

julia> Asubset[Y(At(2))]
ERROR: ArgumentError: 2 not found in [2, 1]
Stacktrace:
  [1] _selvalnotfound(lookup::DimensionalData.Dimensions.LookupArrays.Sampled{Int64, Vector{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Irregular{Tuple{Int64, Int64}}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}, selval::Int64)
    @ DimensionalData.Dimensions.LookupArrays ~/.julia/dev/DimensionalData/src/LookupArrays/selector.jl:181
  [2] _selnotfound_or_nothing(err::DimensionalData.Dimensions.LookupArrays._True, lookup::DimensionalData.Dimensions.LookupArrays.Sampled{Int64, Vector{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Irregular{Tuple{Int64, Int64}}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}, selval::Int64)
    @ DimensionalData.Dimensions.LookupArrays ~/.julia/dev/DimensionalData/src/LookupArrays/selector.jl:179
  [3] #at#23
    @ ~/.julia/dev/DimensionalData/src/LookupArrays/selector.jl:162 [inlined]
  [4] at
    @ ~/.julia/dev/DimensionalData/src/LookupArrays/selector.jl:140 [inlined]
  [5] #at#21
    @ ~/.julia/dev/DimensionalData/src/LookupArrays/selector.jl:124 [inlined]
  [6] at
    @ ~/.julia/dev/DimensionalData/src/LookupArrays/selector.jl:123 [inlined]
  [7] #selectindices#17
    @ ~/.julia/dev/DimensionalData/src/LookupArrays/selector.jl:105 [inlined]
  [8] selectindices
    @ ~/.julia/dev/DimensionalData/src/LookupArrays/selector.jl:105 [inlined]
  [9] _dims2indices
    @ ~/.julia/dev/DimensionalData/src/Dimensions/indexing.jl:114 [inlined]
 [10] macro expansion
    @ ~/.julia/dev/DimensionalData/src/Dimensions/indexing.jl:56 [inlined]
 [11] _dims2indices
    @ ~/.julia/dev/DimensionalData/src/Dimensions/indexing.jl:56 [inlined]
 [12] dims2indices
    @ ~/.julia/dev/DimensionalData/src/Dimensions/indexing.jl:51 [inlined]
 [13] dims2indices
    @ ~/.julia/dev/DimensionalData/src/Dimensions/indexing.jl:34 [inlined]
 [14] #getindex#58
    @ ~/.julia/dev/DimensionalData/src/array/indexing.jl:47 [inlined]
 [15] getindex(A::DimArray{Float64, 1, Tuple{Y{DimensionalData.Dimensions.LookupArrays.Sampled{Int64, Vector{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Irregular{Tuple{Int64, Int64}}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}}, Tuple{}, Vector{Float64}, DimensionalData.NoName, DimensionalData.Dimensions.LookupArrays.NoMetadata}, args::Y{At{Int64, Nothing, Nothing}})
    @ DimensionalData ~/.julia/dev/DimensionalData/src/array/indexing.jl:47
 [16] top-level scope
    @ REPL[4]:1

julia> Asubset = A[Y(At([1,2]))]      # this works because the indices are ordered
2-element DimArray{Float64,1} with dimensions: 
  Y Sampled{Int64} Int64[1, 2] ForwardOrdered Irregular Points
 1  0.119793
 2  0.771181

julia> Asubset[Y(At(2))]
0.7711807519265748
rafaqz commented 9 months ago

The docs do specify that this is undefined behavior...

The problem is fixing it has a performance impact as it makes all indexing type unstable.

One option is to stop using types to specify lookup order, and have bool flags instead. But that also has a performance cost.

Basically there is no way to handle this corner case without a cost to all the common use cases. So Im not sure what to do about it.

Probably manually updating the lookup order yourself with set(A, X=>Unordered()) (where X is the dimension) is the best option.

bjarthur commented 9 months ago

ok, thanks, sorry i missed it in the docs. will close.

rafaqz commented 9 months ago

If you have any ideas and opinions on fixing it it is up for debate, but its not an easy thing to do.