rafaqz / DimensionalData.jl

Named dimensions and indexing for julia arrays and other data
https://rafaqz.github.io/DimensionalData.jl/stable/
MIT License
283 stars 42 forks source link

`DimGroupByArray` doesn't iterate its `eltype` #875

Open tiemvanderdeure opened 20 hours ago

tiemvanderdeure commented 20 hours ago

Just stumbled across this and wanted to make sure that it is intentional?

using DimensionalData
da = rand(X(1:10), Y(1:10))
grps = groupby(da, X => isodd)
first(grps) isa eltype(grps) # false

this also breaks the AbstractArray interface, e.g.

julia> collect(grps)
ERROR: MethodError: Cannot `convert` an object of type 
  DimArray{Float64,2,Tuple{X{Sampled{Int64, SubArray{Int64, 1, UnitRange{Int64}, Tuple{Vector{Int64}}, false}, ForwardOrdered, Irregular{Tuple{Nothing, Nothing}}, Points, NoMetadata}},Y{Sampled{Int64, UnitRange{Int64}, ForwardOrdered, Regular{Int64}, Points, NoMetadata}}},Tuple,SubArray{Float64,2,Array{Float64,2},Tuple{Vector{Int64},Base.Slice{Base.OneTo{Int64}}},false},DimensionalData.NoName,NoMetadata} to an object of type 
  DimArray{Float64,1,Tuple{Y{Sampled{Int64, UnitRange{Int64}, ForwardOrdered, Regular{Int64}, Points, NoMetadata}}},Tuple{X{Sampled{Int64, UnitRange{Int64}, ForwardOrdered, Regular{Int64}, Points, NoMetadata}}},SubArray{Float64,1,Array{Float64,2},Tuple{Int64,Base.Slice{Base.OneTo{Int64}}},true},DimensionalData.NoName,NoMetadata}
The function `convert` exists, but no method is defined for this combination of argument types.

Closest candidates are:
  convert(::Type{T}, ::T) where T
   @ Base Base.jl:126
  convert(::Type{T}, ::LinearAlgebra.Factorization) where T<:AbstractArray
   @ LinearAlgebra ~/.julia/juliaup/julia-1.11.0+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/factorization.jl:104
  convert(::Type{T}, ::T) where T<:AbstractArray
   @ Base abstractarray.jl:16
  ...

Stacktrace:
 [1] setindex!(A::Vector{…}, x::DimMatrix{…}, i::Int64)
   @ Base ./array.jl:976
 [2] copyto_unaliased!
   @ ./abstractarray.jl:1087 [inlined]
 [3] copyto!
   @ ./abstractarray.jl:1061 [inlined]
 [4] _collect_indices
   @ ./array.jl:723 [inlined]
 [5] collect
   @ ./array.jl:707 [inlined]
 [6] collect(A::DimensionalData.DimGroupByArray{…})
   @ DimensionalData ~/.julia/packages/DimensionalData/iHSqC/src/array/array.jl:267
 [7] top-level scope
   @ REPL[2]:1
Some type information was truncated. Use `show(err)` to see complete types.
rafaqz commented 20 hours ago

Ugh that's a bug, I think its but tricky to calculate without allocating

tiemvanderdeure commented 20 hours ago

Is it? Isn't it just the same type of any view over the dimensions we group by?

first(grps) isa typeof(view(da, X = [1])) # true
rafaqz commented 19 hours ago

View may be different to getindex 😭

Guess we need to manually hack it to always be views of the lookups

So manual slicedims calls or whatever. And some tricky getindex/view dispatch and ambiguity handling

rafaqz commented 5 hours ago

Or maybe we can just always view even on getindex. That can just be what a DimGroupByArray is.

tiemvanderdeure commented 5 hours ago

I don't know if I totally follow you. Doesn't a DimGroupByArray always contains only views of the original array?

tiemvanderdeure commented 5 hours ago

I tracked it down to these lines: https://github.com/rafaqz/DimensionalData.jl/blob/a5585f5470b59ce97652b335a05f6a5d57f0548f/src/dimindices.jl#L307-L311

Where T becomes the eltype of the DimSlices. By taking the first of the axes you always lose the dimension, but the indices here are vectors of vectors we don't in reality. If I change it to

    inds = map(newdims) do idx
        rebuild(idx, first(idx))
    end

then it works. But that's probably not the actual fix.

rafaqz commented 1 hour ago

Ah right that might be the fix actually, I clearly don't remember how it works 😅