rafaqz / DimensionalData.jl

Named dimensions and indexing for julia arrays and other data
https://rafaqz.github.io/DimensionalData.jl/stable/
MIT License

Accessing the dimension combinations #683

Closed filchristou closed 3 months ago

filchristou commented 3 months ago

I would like to show you how I am iterating over a DimArray along specific dimensions. Could you tell me if that's the planned way to do it? I have two approaches and I don't like either of them.

# initialize
using DimensionalData
using Random

nt = (:ena=>["alpha", "bita"], :dio=>["gamma", "delta"], :tria=>["epsilon", "zita", "ita"])
da = DimArray(rand(MersenneTwister(0), 2,2,3), nt)

# 1st way: iterate all dimensions using `nt`
withnt = [(k,v) for (k,vv) in nt for v in vv]

# 2nd way: iterate all dimensions with introspection
withda = [(typeof(d).parameters[1],v) for d in dims(da) for v in d]

# passes
@assert withnt == withda

@show withda
println()

# example why doing this
results = [(k,v,sum(da[Dim{k}(At(v))])) for (k,v) in withda]

outputs

withda = [(:ena, "alpha"), (:ena, "bita"), (:dio, "gamma"), (:dio, "delta"), (:tria, "epsilon"), (:tria, "zita"), (:tria, "ita")]

7-element Vector{Tuple{Symbol, String, Float64}}:
 (:ena, "alpha", 2.25703502389387)
 (:ena, "bita", 2.8719368864693973)
 (:dio, "gamma", 3.551405508439701)
 (:dio, "delta", 1.5775664019235667)
 (:tria, "epsilon", 2.0758986905037986)
 (:tria, "zita", 0.5929275888107981)
 (:tria, "ita", 2.460145631048671)

I don't like the first way, because I always need to carry around an extra data structure, even though that information is contained in the DimArray.

I don't like the second way, because it looks hacky. Doing typeof(::Dim).parameters[1] doesn't look like the planned API.

In the end result I want to know which dimension each value belongs to. Possibly even skip some entries, e.g.

filteredresult = [(k,v,sum(da[Dim{k}(At(v))])) for (k,v) in withda if length(v) == 5]

Is there a friendlier way to do this?

lazarusA commented 3 months ago

I'm not sure I understand 100%, but I would expect 12 cases (2 x 2 x 3)? Maybe the following helps:

using DimensionalData
nt = (:ena=>["alpha", "bita"], :dio=>["gamma", "delta"], :tria=>["epsilon", "zita", "ita"])
da = DimArray(rand(2,2,3), nt)

d_selec = DimSelectors(da);
for s in d_selec
    dims_vars_to_val = Pair.(name(s), getproperty.(val(s), :val)) => da[s]
    println(dims_vars_to_val)
end

(:ena => "alpha", :dio => "gamma", :tria => "epsilon") => 0.5396815200472261
(:ena => "bita", :dio => "gamma", :tria => "epsilon") => 0.06628104014306124
(:ena => "alpha", :dio => "delta", :tria => "epsilon") => 0.7060400463656702
(:ena => "bita", :dio => "delta", :tria => "epsilon") => 0.11927774983045725
(:ena => "alpha", :dio => "gamma", :tria => "zita") => 0.594558318353699
(:ena => "bita", :dio => "gamma", :tria => "zita") => 0.6228426138010711
(:ena => "alpha", :dio => "delta", :tria => "zita") => 0.25028247152664806
(:ena => "bita", :dio => "delta", :tria => "zita") => 0.8150549830768342
(:ena => "alpha", :dio => "gamma", :tria => "ita") => 0.7616173397185737
(:ena => "bita", :dio => "gamma", :tria => "ita") => 0.17792980505387812
(:ena => "alpha", :dio => "delta", :tria => "ita") => 0.38368566327225906
(:ena => "bita", :dio => "delta", :tria => "ita") => 0.3908464344631072

filchristou commented 3 months ago

I specifically want to index by a single dimension at a time, to study the accumulated influence of each dimension. How would you produce the same results as the ones I posted using your approach?

I came up with the following, but again it looks a bit complex:


julia> withselector = vcat([unique(getindex.(DimSelectors(da), i)) for i in 1:length(dims(da))]...) #3rd way
7-element Vector{Dim{S, At{String, Nothing, Nothing}} where S}:
 ena At(alpha, nothing, nothing)
 ena At(bita, nothing, nothing)
 dio At(gamma, nothing, nothing)
 dio At(delta, nothing, nothing)
 tria At(epsilon, nothing, nothing)
 tria At(zita, nothing, nothing)
 tria At(ita, nothing, nothing)

julia> [(name(s), val(s).val, sum(da[s])) for s in withselector]
7-element Vector{Tuple{Symbol, String, Float64}}:
 (:ena, "alpha", 2.25703502389387)
 (:ena, "bita", 2.8719368864693973)
 (:dio, "gamma", 3.551405508439701)
 (:dio, "delta", 1.5775664019235667)
 (:tria, "epsilon", 2.0758986905037986)
 (:tria, "zita", 0.5929275888107981)
 (:tria, "ita", 2.460145631048671)

rafaqz commented 3 months ago

I can't really understand what you are trying to do.

To avoid an X/Y problem can you explain without reference to DimensionalData.jl what the input data is and what the expected outcome should be?

nt is redundant: a dimension already is exactly that, a wrapper type around the lookup vector. You could write d => lookup(d) if that were useful in this case (it's probably not).
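As a rough sketch of that point (not part of the original reply), the name-to-values pairs can be read back from the array itself. It reuses nt and da from the MWE without the fixed random seed; recovered is a name introduced here for illustration, and collect is only used to turn each lookup back into a plain Vector for the comparison.

```julia
using DimensionalData

# Same dimension names and lookup values as in the MWE above.
nt = (:ena => ["alpha", "bita"], :dio => ["gamma", "delta"], :tria => ["epsilon", "zita", "ita"])
da = DimArray(rand(2, 2, 3), nt)

# Rebuild the name => values pairs from the array alone.
# `recovered` is a hypothetical name used only for this illustration.
recovered = map(d -> name(d) => collect(lookup(d)), dims(da))

# Holds if the dimensions carry everything that `nt` did.
@assert recovered == nt
```

If the assertion holds, nothing in nt is missing from dims(da), so there should be no need to carry nt around separately.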

I would demonstrate more but vv is not actually defined in your MWE so I don't know what it is.

filchristou commented 3 months ago

Everything is defined in the MWE. vv is the loop variable bound to each value vector while iterating over nt.

> nt is redundant - that's what a dimension already

Happy to hear that you agree. So if we agree that the 1st way isn't the suggested one, that leaves me with either the 2nd way or with the 3rd way from two posts before.

The use case is that I have da and I want to produce results. For that I have 3 ways now:

1. This one uses the external nt (a tuple of pairs), which, as you also said, shouldn't be needed.

withnt = [(k,v) for (k,vv) in nt for v in vv]

2. The second way

withda = [(typeof(d).parameters[1],v) for d in dims(da) for v in d]

3. And the third way with adaptation from Lazarus' answer

withselector = vcat([unique(getindex.(DimSelectors(da), i)) for i in 1:length(dims(da))]...)

My gripe is that the first is the simplest but relies on carrying around redundant information, while the 2nd and 3rd are complicated.

rafaqz commented 3 months ago

julia> results = [(name(d), l, sum(da[rebuild(d, At(l))])) for d in dims(da) for l in d]
7-element Vector{Tuple{Symbol, String, Float64}}:
 (:ena, "alpha", 2.25703502389387)
 (:ena, "bita", 2.8719368864693973)
 (:dio, "gamma", 3.551405508439701)
 (:dio, "delta", 1.5775664019235667)
 (:tria, "epsilon", 2.0758986905037986)
 (:tria, "zita", 0.5929275888107981)
 (:tria, "ita", 2.460145631048671)

Nothing redundant required.

DimSelectors pretty much does this, but for every combination of dims, whereas you want each individual dim separately.

rebuild is the key method here: it wraps At in the same dimension type as d generically, without needing to know what d is.
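A minimal sketch of why that matters, assuming a small made-up array A (not from the thread) that mixes a built-in dimension type, X, with a custom named one, Dim{:kind}. The same comprehension handles both, because rebuild(d, At(l)) returns a dimension of the same type as d holding the selector, and name(d) returns the name as a Symbol in both cases. Here lookup(d) is used to iterate the values; iterating d directly, as in the comprehension above, should be equivalent.

```julia
using DimensionalData

# Hypothetical 2x3 example array: rows are X = "a", "b"; columns are kind = 10, 20, 30.
A = DimArray([1.0 3.0 5.0; 2.0 4.0 6.0], (X(["a", "b"]), Dim{:kind}([10, 20, 30])))

# For each dimension, wrap each lookup value in an At selector of that same
# dimension type, index with it, and sum over the remaining dimension.
marginals = [(name(d), l, sum(A[rebuild(d, At(l))])) for d in dims(A) for l in lookup(d)]

for (dimname, value, total) in marginals
    println(dimname, " = ", value, " sums to ", total)
end
```

With these values, the X = "a" entry is the sum of the first row (9.0) and the kind = 10 entry is the sum of the first column (3.0).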

rafaqz commented 3 months ago

I forgot you can use DimSelectors on the dimensions directly too, but it's more verbose to get the name and val out, as they're wrapped and inside a Tuple:

julia> results = [(name(s[1]), val(val(s[1])), sum(da[s])) for ds in DimSelectors.(dims(da)) for s in ds]
7-element Vector{Tuple{Symbol, String, Float64}}:
 (:ena, "alpha", 2.25703502389387)
 (:ena, "bita", 2.8719368864693973)
 (:dio, "gamma", 3.551405508439701)
 (:dio, "delta", 1.5775664019235667)
 (:tria, "epsilon", 2.0758986905037986)
 (:tria, "zita", 0.5929275888107981)
 (:tria, "ita", 2.460145631048671)

rafaqz commented 3 months ago

> Could you tell me if that's the planned way to do it?

To answer this, it was not something planned... I have never done this so it took a while to understand what you were going for. The intermediate withnt/withda step was the X/Y problem part I couldn't understand.

Anyway, I think my first option is about as clean as this can get.

rafaqz commented 3 months ago

Another way using map and broadcasts instead of the nested comprehension:

julia> map(dims(da)) do d
           tuple.((name(d),), d, sum.(getindex.((da,), DimSelectors(d))))
       end |> Iterators.flatten |> collect
7-element Vector{Tuple{Symbol, String, Float64}}:
 (:ena, "alpha", 2.25703502389387)
 (:ena, "bita", 2.8719368864693973)
 (:dio, "gamma", 3.551405508439701)
 (:dio, "delta", 1.5775664019235667)
 (:tria, "epsilon", 2.0758986905037986)
 (:tria, "zita", 0.5929275888107981)
 (:tria, "ita", 2.460145631048671)

filchristou commented 3 months ago

Got it. Okay, thanks a lot! I'm closing this issue now.