rafaqz / DimensionalData.jl

Named dimensions and indexing for Julia arrays and other data
https://rafaqz.github.io/DimensionalData.jl/stable/
MIT License

Table construction fails when using a custom dimension #459

Closed: sethaxen closed this 6 months ago

sethaxen commented 1 year ago

If we use a Dim{k} where k is a Symbol that matches the name of one of the special dimensions created with @dim (e.g. X or Ti), then an error is raised when we use the Tables interface:

julia> using DimensionalData, Tables

julia> da = DimArray(randn(5, 2), (Dim{:X}(1:5), Dim{:Y}(1:2)))
5×2 DimArray{Float64,2} with dimensions: 
  Dim{:X} Sampled{Int64} 1:5 ForwardOrdered Regular Points,
  Dim{:Y} Sampled{Int64} 1:2 ForwardOrdered Regular Points
     1          2
 1   1.2183    -0.567417
 2   0.405508  -0.120042
 3  -0.748847  -0.0844179
 4   0.298297  -0.87451
 5  -1.85563    0.100783

julia> Tables.columntable(da)
ERROR: ArgumentError: Some dims were not found in object
Stacktrace:
 [1] _errorextradims()
   @ DimensionalData.Dimensions ~/.julia/packages/DimensionalData/6v7CY/src/Dimensions/primitives.jl:649
 [2] dimnum
   @ ~/.julia/packages/DimensionalData/6v7CY/src/Dimensions/primitives.jl:201 [inlined]
 [3] getcolumn
   @ ~/.julia/packages/DimensionalData/6v7CY/src/tables.jl:196 [inlined]
 [4] getcolumn
   @ ~/.julia/packages/DimensionalData/6v7CY/src/tables.jl:200 [inlined]
 [5] macro expansion
   @ ~/.julia/packages/Tables/T7rHm/src/namedtuples.jl:0 [inlined]
 [6] _columntable(sch::Tables.Schema{(:X, :Y, Symbol("")), Tuple{Int64, Int64, Float64}}, cols::DimTable{(:X, :Y, Symbol("")), DimStack{NamedTuple{(Symbol(""),), Tuple{Matrix{Float64}}}, Tuple{Dim{:X, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}, Dim{:Y, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}}, Tuple{}, NamedTuple{(Symbol(""),), Tuple{Tuple{Dim{:X, Colon}, Dim{:Y, Colon}}}}, DimensionalData.Dimensions.LookupArrays.NoMetadata, NamedTuple{(Symbol(""),), Tuple{DimensionalData.Dimensions.LookupArrays.NoMetadata}}}, Tuple{DimensionalData.DimColumn{Int64, Dim{:X, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}}, DimensionalData.DimColumn{Int64, Dim{:Y, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}}}, NamedTuple{(Symbol(""),), Tuple{DimensionalData.DimArrayColumn{Float64, DimArray{Float64, 2, Tuple{Dim{:X, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, 
DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}, Dim{:Y, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}}, Tuple{}, Matrix{Float64}, Symbol, DimensionalData.Dimensions.LookupArrays.NoMetadata}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64}}}})
   @ Tables ~/.julia/packages/Tables/T7rHm/src/namedtuples.jl:158
 [7] columntable(sch::Tables.Schema{(:X, :Y, Symbol("")), Tuple{Int64, Int64, Float64}}, cols::DimTable{(:X, :Y, Symbol("")), DimStack{NamedTuple{(Symbol(""),), Tuple{Matrix{Float64}}}, Tuple{Dim{:X, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}, Dim{:Y, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}}, Tuple{}, NamedTuple{(Symbol(""),), Tuple{Tuple{Dim{:X, Colon}, Dim{:Y, Colon}}}}, DimensionalData.Dimensions.LookupArrays.NoMetadata, NamedTuple{(Symbol(""),), Tuple{DimensionalData.Dimensions.LookupArrays.NoMetadata}}}, Tuple{DimensionalData.DimColumn{Int64, Dim{:X, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}}, DimensionalData.DimColumn{Int64, Dim{:Y, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}}}, NamedTuple{(Symbol(""),), Tuple{DimensionalData.DimArrayColumn{Float64, DimArray{Float64, 2, Tuple{Dim{:X, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, 
DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}, Dim{:Y, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}}, Tuple{}, Matrix{Float64}, Symbol, DimensionalData.Dimensions.LookupArrays.NoMetadata}, Tuple{Int64, Int64}, Tuple{Int64, Int64}, Int64}}}})
   @ Tables ~/.julia/packages/Tables/T7rHm/src/namedtuples.jl:170
 [8] columntable(itr::DimArray{Float64, 2, Tuple{Dim{:X, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}, Dim{:Y, DimensionalData.Dimensions.LookupArrays.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Int64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}}, Tuple{}, Matrix{Float64}, DimensionalData.NoName, DimensionalData.Dimensions.LookupArrays.NoMetadata})
   @ Tables ~/.julia/packages/Tables/T7rHm/src/namedtuples.jl:185
 [9] top-level scope
   @ REPL[6]:1

julia> da2 = DimArray(randn(5, 2), (Dim{:x}(1:5), Dim{:y}(1:2)));  # lowercase is fine

julia> Tables.columntable(da2)
(x = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5], y = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2], var"" = [1.0560166901688783, 2.2394390707844547, -0.14113224570277316, -0.414630329366848, -0.0390640189596709, 1.237460475936745, -0.758062810295786, -1.1155720060272452, -0.8984658474106461, -1.056171740058078])

rafaqz commented 1 year ago

Yeah we should just make them equivalent. The problem is that converting to a Symbol loses the distinction between X and Dim{:X}.

sethaxen commented 1 year ago

Yeah we should just make them equivalent

Make what equivalent?

rafaqz commented 1 year ago

E.g. X and Dim{:X} could be compared as equivalent.

Because of key2dims, conversion to tables, etc., they are equivalent in effect, but they are not compared as equivalent in e.g. sortdims.
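
A minimal stand-alone sketch of that mismatch, using stand-in types rather than the package's own definitions (XLike, KeyedDim, key, and sametype are hypothetical names for illustration only). Both kinds of dimension collapse to the same Symbol, but a comparison by type still treats them as different:

```julia
# Stand-ins for the package's X and Dim{:X}; hypothetical, for illustration only.
struct XLike end            # plays the role of a dimension created with @dim
struct KeyedDim{S} end      # plays the role of Dim{S}

# Converting to a table key: both collapse to the same Symbol.
key(::XLike) = :X
key(::KeyedDim{S}) where {S} = S

# Comparing by type keeps them distinct, so a type-based lookup does not match.
sametype(a, b) = typeof(a) === typeof(b)

key(XLike()) == key(KeyedDim{:X}())      # true
sametype(XLike(), KeyedDim{:X}())        # false
```

Once only the Symbol is left, as with table column names, there is no way to tell which of the two types it came from.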

sethaxen commented 1 year ago

If we end up making them equivalent, what's the benefit of X being its own type rather than just const X = Dim{:X}? My assumption was that one of the purposes of X is to have a reserved name that is differentiated from Dim{:X}.

rafaqz commented 1 year ago

There are a few things.

Plotting uses the types to decide plot axes here and in Rasters.jl. So both Ti and X are IndependentDim and go on the x axis. We could switch that to using traits defined on Dim{:X} and Dim{:Ti}. But plotting things the right way is a big part of what Rasters.jl does so this has to work somehow or other.

You can also manually define a dimension with @dim MyXDim XDim and it will also plot on the x axis. Currently you can do this and then, e.g., load a NetCDF file with unusual dimension names in Rasters.jl, and those dimensions will get your dim's behaviour if the names match. But we could do this with a trait as well, or some other mechanism.

isxdim(::Dim{:X}) = true
isxdim(::Dim{:x}) = true
isydim(::Dim{:Y}) = true
isydim(::Dim{:y}) = true

const X = Dim{:X}
const Y = Dim{:Y}

Users could just do:

DimensionalData.isydim(::Dim{:MyYDim}) = true
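
A self-contained sketch of how this trait idea could fit together, under stated assumptions: the Dim struct here is a stand-in and not the package's definition, and the false fallbacks and the plotaxis helper are hypothetical additions for illustration.

```julia
# Stand-in for DimensionalData's Dim; not the package's actual definition.
struct Dim{S,T}
    val::T
end
Dim{S}(val::T) where {S,T} = Dim{S,T}(val)

# Traits default to false, then opt in per dimension name.
isxdim(::Dim) = false
isydim(::Dim) = false
isxdim(::Dim{:X}) = true
isxdim(::Dim{:x}) = true
isydim(::Dim{:Y}) = true
isydim(::Dim{:y}) = true

# A user opts a custom name into y-axis behaviour.
isydim(::Dim{:MyYDim}) = true

# Hypothetical helper: choose a plot axis from the traits instead of from the type hierarchy.
plotaxis(d::Dim) = isxdim(d) ? :x : isydim(d) ? :y : :other
```

With these definitions, plotaxis(Dim{:MyYDim}(1:2)) picks the y axis, while a name with no trait defined falls through to :other.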

For package devs, I'm not sure who gets to define behaviors for these dims, e.g. in a package you may want to define plotting. In Rasters.jl that happens for Band, with the illusion of there being no type piracy, but because Band is essentially the same as Dim{:Band} it's still kind of piracy for some applications.

rafaqz commented 1 year ago

I wrote dimension behavior before I realized table keys are all symbols and dims should be columns, and that a[customdim=4] syntax is often preferable to a[Dim{:customdim}(4)], and that these dims would be widely used. I still basically never use Dim.

Probably would have done things differently if those were in place from the start.