rafaqz / DimensionalData.jl

Named dimensions and indexing for julia arrays and other data
https://rafaqz.github.io/DimensionalData.jl/stable/
MIT License

Contains for categorical Dimensions should call Base.contains on the Strings #532

felixcremer closed this issue 11 months ago

felixcremer commented 1 year ago

I am surprised that the Contains selector for Categorical values calls At rather than the contains function. My use case: I have a long Variable dimension with many named variables, some of which are similar or follow a certain naming pattern, and I want to select a group of variables, as in the shortened example below. I would expect the Contains(value) selector to behave like Where(contains(value)). I can open a PR with these changes, but I might need some help making the Vector case work.

julia> arr = DimArray(rand(10,10,4), (X(1:10), Y(1:10), Dim{:Variable}(["root_moisture", "soil_moisture", "air_temperature", "something"])))
10×10×4 DimArray{Float64,3} with dimensions: 
  X Sampled{Int64} 1:10 ForwardOrdered Regular Points,
  Y Sampled{Int64} 1:10 ForwardOrdered Regular Points,
  Dim{:Variable} Categorical{String} String["root_moisture", "soil_moisture", "air_temperature", "something"] Unordered
[:, :, 1]
     1          2          3         4         5         6         7           8          9         10
  1  0.0451999  0.721679   0.472552  0.172361  0.838639  0.748815  0.00979697  0.0228791  0.312279   0.254207
  ⋮                                            ⋮                                                     ⋮
 10  0.0962899  0.0916193  0.856692  0.725752  0.530497  0.891864  0.307378    0.40408    0.429365   0.0391044
[and 3 more slices...]
julia> arr[Variable=Where(contains("moisture"))]
10×10×2 DimArray{Float64,3} with dimensions: 
  X Sampled{Int64} 1:10 ForwardOrdered Regular Points,
  Y Sampled{Int64} 1:10 ForwardOrdered Regular Points,
  Dim{:Variable} Categorical{String} String["root_moisture", "soil_moisture"] Unordered
[:, :, 1]
     1          2          3         4         5         6         7           8          9         10
  1  0.0451999  0.721679   0.472552  0.172361  0.838639  0.748815  0.00979697  0.0228791  0.312279   0.254207
  ⋮                                            ⋮                                                     ⋮
 10  0.0962899  0.0916193  0.856692  0.725752  0.530497  0.891864  0.307378    0.40408    0.429365   0.0391044
[and 1 more slices...]
julia> arr[Variable=Contains("moisture")] # This fails
ERROR: ArgumentError: moisture not found in ["root_moisture", "soil_moisture", "air_temperature", "something"]
Stacktrace:

rafaqz commented 1 year ago

Contains means an interval that contains a point. Having it run contains on strings could be confusing. Just using Where is probably better.
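A minimal sketch of how the Where route could also cover the Vector case raised earlier in the thread. It uses only Base Julia so it runs without the package. The helper name matches_any and the pattern list are hypothetical and are not part of DimensionalData.jl.

```julia
# Variable names taken from the example in this thread.
variables = ["root_moisture", "soil_moisture", "air_temperature", "something"]

# Hypothetical helper (not a DimensionalData.jl function): builds a predicate
# that is true when a name contains at least one of the given patterns.
matches_any(patterns) = name -> any(p -> contains(name, p), patterns)

# Single pattern: contains("moisture") is Base's one-argument (curried) form.
moisture = filter(contains("moisture"), variables)

# Several patterns at once (the "Vector case").
selected = filter(matches_any(["moisture", "temp"]), variables)

println(moisture)
println(selected)
```

With the array from the example, the same predicate would presumably be passed as arr[Variable=Where(matches_any(["moisture", "temp"]))], on the assumption that Where accepts any function of the lookup value, as it does for contains("moisture").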

felixcremer commented 1 year ago

Then the docstring should state that Where is the selector to use for this case, and Contains should throw an informative error message rather than silently falling back to At.

I think the overlap with Intervals is not so bad: Categorical and Intervals doing different things is expected, and Contains has a clear meaning for Strings.
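To make the two meanings under discussion concrete, here is a small Base-only illustration. It is not how DimensionalData.jl implements its selectors. The interval starts, the step size, and the query point are made-up values for the sketch.

```julia
# Interval meaning: find the half-open interval [start, start + step) holding a point.
starts = 10:10:100      # hypothetical interval start values
step_size = 10          # hypothetical interval width
point = 37
idx = findfirst(s -> s <= point < s + step_size, starts)
interval_start = starts[idx]

# String meaning: substring containment versus exact equality.
name = "soil_moisture"
substring_match = contains(name, "moisture")   # what the issue asks Contains to do
exact_match = name == "moisture"               # an exact match, comparable to At

println((idx, interval_start, substring_match, exact_match))
```

The point 37 falls in the interval starting at 30, while "moisture" matches "soil_moisture" only as a substring, not exactly. The exact-match behaviour is why the Contains("moisture") call in the example fails.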

rafaqz commented 1 year ago

Let's make the docstring clearer.