rafaqz / DimensionalData.jl

Named dimensions and indexing for julia arrays and other data
https://rafaqz.github.io/DimensionalData.jl/stable/
MIT License
262 stars 38 forks source link

DimensionMismatch with `cat` #672

Open bjarthur opened 3 months ago

bjarthur commented 3 months ago

sorry about another cat issue, but shouldn't it work even if the dimensions are transposed in the underlying array?

julia> using DimensionalData

julia> A = rand(X(2), Y(3))
╭─────────────────────────╮
│ 2×3 DimArray{Float64,2} │
├─────────────────── dims ┤
  ↓ X, → Y
└─────────────────────────┘
 0.638864  0.541596  0.625792
 0.885387  0.24923   0.979457

julia> B = rand(Y(3), X(4))
╭─────────────────────────╮
│ 3×4 DimArray{Float64,2} │
├─────────────────── dims ┤
  ↓ Y, → X
└─────────────────────────┘
 0.384553  0.65434   0.108298  0.531854
 0.909667  0.847332  0.319774  0.492833
 0.653972  0.868346  0.567731  0.408845

julia> cat(A, B, dims=:X)
ERROR: DimensionMismatch: mismatch in dimension 2 (expected 3 got 4)
Stacktrace:
  [1] _cs
    @ ./abstractarray.jl:1785 [inlined]
  [2] _cshp
    @ ./abstractarray.jl:1775 [inlined]
  [3] _cshp(ndim::Int64, dims::Tuple{Bool}, shape::Tuple{Int64, Int64}, nshape::Tuple{Int64, Int64})
    @ Base ./abstractarray.jl:1782
  [4] _cat_size_shape
    @ ./abstractarray.jl:1761 [inlined]
  [5] cat_size_shape
    @ ./abstractarray.jl:1759 [inlined]
  [6] _cat_t
    @ ./abstractarray.jl:1800 [inlined]
  [7] _cat(dims::Tuple{Int64}, X1::Matrix{Float64}, X::Matrix{Float64})
    @ SparseArrays ~/.julia/juliaup/julia-1.10.2+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/SparseArrays/src/sparsevector.jl:1227
  [8] cat
    @ ./abstractarray.jl:1993 [inlined]
  [9] _cat(catdims::Tuple{…}, A1::DimMatrix{…}, As::DimMatrix{…})
    @ DimensionalData ~/.julia/packages/DimensionalData/vXseP/src/array/methods.jl:335
 [10] _cat
    @ ~/.julia/packages/DimensionalData/vXseP/src/array/methods.jl:269 [inlined]
 [11] #cat#196
    @ ~/.julia/packages/DimensionalData/vXseP/src/array/methods.jl:266 [inlined]
 [12] top-level scope
    @ REPL[5]:1
Some type information was truncated. Use `show(err)` to see complete types.

julia> cat(A, B', dims=:X)
╭─────────────────────────╮
│ 6×3 DimArray{Float64,2} │
├─────────────────── dims ┤
  ↓ X, → Y
└─────────────────────────┘
 0.638864  0.541596  0.625792
 0.885387  0.24923   0.979457
 0.384553  0.909667  0.653972
 0.65434   0.847332  0.868346
 0.108298  0.319774  0.567731
 0.531854  0.492833  0.408845

julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 12 × Apple M2 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
  JULIA_PROJECT = @.
  JULIA_EDITOR = vi

(jl_ZgjCA7) pkg> st
Status `/private/var/folders/s5/8d629n5d7nsf37f60_91wzr40000gq/T/jl_ZgjCA7/Project.toml`
  [0703355e] DimensionalData v0.26.3
rafaqz commented 3 months ago

That would really be moving away from base julia behavior.

We actually never automatically permute or do anything fancy like that on Base methods so we don't accidentally let through bugs in generic contexts. Like a permuted square DimMatrix would work in cat like its parent Array but the answer will be different because of the auto permutation.

For broadcasts we have our own broadcast_dims that does permute and even reshapes missing dimensions. Someone could write a PR for something similar for cat.

rafaqz commented 3 months ago

Can you just permute all your arrays to a particular dim order first?

bjarthur commented 3 months ago

i can just permute and that is what i've just done to fix a bug in my code. but, in my mind one of the nice things about DimArrays is you don't (usually at least i'm just now learning b/c of cat) have to pay attention to the order of the underlying dims. so for example in the code above i can reverse the indexing and it doesn't matter: A[Y(1),X(2)] == A[X(2),Y(1)], where as in base julia the analogous equality, A[1,2] == A[2,1], would be false.

rafaqz commented 3 months ago

I guess in this case using the named dimension could move us away from the base behaviour as youre already using dims=:X and that will fail on a Array. If you use dims=1 we cant permute.

But what should it do if you use hcat ?

sethaxen commented 3 months ago

I agree it makes sense to keep base behavior here. It seems like a macro that overrides base behavior and uses dimensions would be useful. I guess a generalization of the current macro for broadcasting.

bjarthur commented 3 months ago

i like the idea that with dims=:X it does the permute for you and with dims=1 it throws an error if the sizes of the DimArrays don't match. for hcat, i'd suggest that it should behave like cat(..., dims=1) and throw an error.

i've got 2400 lines of code in which i assumed when i wrote it that the order of named dims never mattered. are there any other cases like cat for which it actually does?

felixcremer commented 3 months ago

According to the docs in Base hcat is equivalent to cat(A...; dims=2) and vcat to cat(A...; dims=1) and therefore I would expect this to error.

rafaqz commented 3 months ago

@bjarthur basically the dimension order doesnt matter for indexing if you use dimensions.

Where else besides cat do we want auto permutation if you name the dimension? I guess stack .

~And like all functiond that call reduce over a named dimension with multiple DimArrays ?~ oops thats only one array

And I guess the first argument gets to be the one that doesnt permute?

What if a regular Array is included in cat with two DimArrays that need permuting?

felixcremer commented 3 months ago

Where else besides cat do we want auto permutation if you name the dimension? I guess stack .

Yes stack seems to be a valid one. I just went through the Base Array function docs and looked for functions that accepted a dims keyword. There was no other function that came to mind where the auto permutation seems relevant. https://docs.julialang.org/en/v1/base/arrays/

What if a regular Array is included in cat with two DimArrays that need permuting?

I think, that should error, because it is not clear what is the intended behaviour. But do you think that this is a real use case? Normally I am having a list of DimArrays that I want to concatenate together. And as a fix for that one can wrap the normal Array into a DimArray but then the user can exactly specify which dimension has which name.

rafaqz commented 3 months ago

Yes you're right @felixcremer it seem its only cat and stack where it makes sense to do this. Currently its an error so we can add it at any time and I'm ok with auto permutation as long as the syntax errors on a regular array.

@sethaxen in the long term a macro for broadcast would be nice I was thinking @d A * B + 7 could wrap all of its arguments in a function. That would wrap just AbstractDimArray and AbstractDimStack in a wrapper type to indicate auto permuted/reshaped broadcasts are ok, and then replace itself with @. to create regular broadcasts. Then we can basically do what broadcast_dims does behind the scenes during broadcasting dispatch.

Its also no more characters than regular @. macro broadcasts, which is nice.