Open ParadaCarleton opened 2 years ago
Yeah, this has been a contentious thing for a while. It initially required At
- I chose it because its the shortest thing that could mean that to minimise how annoying this is... but then I switched it so what you want actually worked up until a few months ago... and then went back to requiring At
! because automatic At
is confusing. But you're right, it's mostly numbers that are weird without At
when using DimArray
.
Symbol
can also be weird with DimStack
- a single Symbol
getindex
is essentially NamedTuple
indexing to get DimArray
layers by name. But At(:symbol)
is indexing the values of all the layers at the same time - which is a very different thing. And its nice if DimArray
and DimStack
have the same syntax. Of course with a Dimension
wrapper its obvious what the Symbol
means even for DimStack
.
So to answer your question, no it's not totally necesary, but the semantics starts to feel a bit crowded without At
wrappers.
I have thought about adding a trait somewhere for automatic At
so people can choose that if they need it.
Symbol
can also be weird withDimStack
- a singleSymbol
getindex
is essentiallyNamedTuple
indexing to getDimArray
layers by name. ButAt(:symbol)
is indexing the values of all the layers at the same time - which is a very different thing. And its nice ifDimArray
andDimStack
have the same syntax. Of course with aDimension
wrapper its obvious what theSymbol
means even forDimStack
.
I think that makes some sense, but it definitely causes problems for anyone trying to transition their codebase from AxisArrays. That being said, isn't not having to use At()
a way of keeping the interface more consistent between DimStack
and DimArray
? That way it's possible to index with a symbol regardless of whether DimArray
or DimStack
is being used. (Even though it's not using a dimension struct, I would naturally tend to think of the names for DimStack
as being something like an extra dimension which can be indexed along.)
Alternatively, given that DimStack
is probably going to see a lot less use than DimArray
, the syntax for DimStack
could be changed, and then symbols could be treated the same as symbols for DimArray
; perhaps AtStack()
? Or perhaps the dimension name could be used; indexing a DimStack
as x[:dim_array]
(not providing a dimension name) treats x
as a named tuple, while x[person=:name]
indexes all the layers at the same time regardless of whether you're using a DimStack
or DimArray
.
Yeah, this has been a contentious thing for a while. It initially required
At
- I chose it because its the shortest thing that could mean that to minimise how annoying this is... but then I switched it so what you want actually worked up until a few months ago... and then went back to requiringAt
! because automaticAt
is confusing. But you're right, it's mostly numbers that are weird withoutAt
when usingDimArray
.
Symbol
can also be weird withDimStack
- a singleSymbol
getindex
is essentiallyNamedTuple
indexing to getDimArray
layers by name. ButAt(:symbol)
is indexing the values of all the layers at the same time - which is a very different thing. And its nice ifDimArray
andDimStack
have the same syntax. Of course with aDimension
wrapper its obvious what theSymbol
means even forDimStack
.So to answer your question, no it's not totally necesary, but the semantics starts to feel a bit crowded without
At
wrappers.I have thought about adding a trait somewhere for automatic
At
so people can choose that if they need it.
It's worth noting that if symbol keys are automatically wrapped in At
, switching from AxisKeys becomes a lot more viable for other packages -- for instance, we're considering making this swap in Turing, but having to change all the indexing-related syntax would be a definite hassle.
I'm open to working on a change if this will be blocking to use cases elsewhere. But we have to be careful with the syntax as we're trying to do more things than AxisArrays.jl does.
DimStack
is actually pretty central to the reason for writing this package, mostly through RasterStack <: AbstractDimStack
in Rasters.jl. It can represent a netcdf and similar structures with multiple layers with mixed number of dims in a form that is easy to use and fast to index. And it's a table...
The specific syntax clash is this edge case, which is unlikely to happen very often, but needs handling:
julia> using DimensionalData
julia> st = DimStack((a=rand(10), b=falses(10)), (X(Symbol.('a':'j')),))
DimStack with dimensions:
X Categorical Symbol[a, b, …, i, j] ForwardOrdered
and 2 layers:
:a Float64 dims: X (10)
:b Bool dims: X (10)
julia> st[:a]
10-element DimArray{Float64,1} a with dimensions:
X Categorical Symbol[a, b, …, i, j] ForwardOrdered
0.764328
0.810431
0.37886
0.689809
0.368838
0.284454
0.665395
0.326088
0.762186
0.596156
julia> st[At(:a)]
(a = 0.7643275682479944, b = false)
I may have a solution to this. We can use getproperty
to access DimStack
layers, instead of getindex
, like:
julia> st.a
10-element DimArray{Float64,1} a with dimensions:
X Categorical Symbol[a, b, …, i, j] ForwardOrdered
0.764328
0.810431
0.37886
0.689809
0.368838
0.284454
0.665395
0.326088
0.762186
0.596156
julia> st[:a]
(a = 0.7643275682479944, b = false)
That might even be nice to separate out the syntax a little more anyway - so that getindex
is always array-like, rather than dict-like.
I may have a solution to this. We can use
getproperty
to accessDimStack
layers, instead ofgetindex
, like:julia> st.a 10-element DimArray{Float64,1} a with dimensions: X Categorical Symbol[a, b, …, i, j] ForwardOrdered 0.764328 0.810431 0.37886 0.689809 0.368838 0.284454 0.665395 0.326088 0.762186 0.596156 julia> st[:a] (a = 0.7643275682479944, b = false)
That might even be nice to separate out the syntax a little more anyway - so that
getindex
is always array-like, rather than dict-like.
Definitely like that solution! Sounds like a good idea to me.
This is going to break a LOT of scripts built on Rasters.jl.
I will have to check in with people and see what they think, so it might take a while.
This is going to break a LOT of scripts built on Rasters.jl.
I will have to check in with people and see what they think, so it might take a while.
Let me know what people think when you've talked to them about this.
I think this is a good idea, though; in general, transitioning from AxisArrays to DimensionalData should be as smooth as possible, maybe even automatic (as in, we could write a script that takes all code written with AxisArrays and converts it into DimensionalData code).
Would it make sense to infer, when a string or symbol is passed to
getindex
, that the user probably wantsAt(string)
? Adding this would probably make it easier to transition a bunch of AxisArrays code to use DimensionalData.jl.