jipolanco / PencilArrays.jl

Distributed Julia arrays using the MPI protocol
https://jipolanco.github.io/PencilArrays.jl/dev/
MIT License
60 stars 8 forks source link

Constraints on setting the grid size and ranks #49

Closed tomchor closed 2 years ago

tomchor commented 2 years ago

I've been trying to run this code, which uses PencilArrays in the background to do the decomposition.

Depending on the value I put for the grid sizes Nx, Ny, Nz and the number of ranks in each direction Rx, Ry, Rz I get an error. For example I noticed that when Nx*Rx == Ny*Ry == Nz*Rz the code runs successfully. But if that condition isn't satisfied (for example if I set Nx=Ny=Nz=8, Rx=Rz=1 and Ry=2) I get an error like this:

ERROR: ERROR: LoadError: LoadError: DimensionMismatch(DimensionMismatch("arrays could not be broadcast to a common size; got a dimension wi"arrays could not be broadcast to a common size; got a dimension with lengths 8 andth lengths 8 and 4") 4")
Stacktrace:
  [1] 
Stacktrace:
  [1] _bcs1_bcs1
    @ ./
    @ ./broadcast.jl:broadcast.jl:501 [inlined]
501 [inlined]
  [2]   [2] _bcs(_bcs(shape::shape::Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}, newshape::Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}, newshape::Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}})
    @ Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}})
    @ Base.Broadcast ./broadcast.jl:Base.Broadcast ./broadcast.jl:495
  [3]495
  [3] broadcast_shape
    @  broadcast_shape
    @ ././broadcast.jl:broadcast.jl:489489 [inlined]
 [inlined]
  [4]  [4] combine_axes combine_axes
    @ 
    @ ././broadcast.jl:broadcast.jl:484 [inlined]484 [inlined]

  [5]   [5] _axes_axes
    @ ./
    @ ./broadcast.jl:209broadcast.jl:209 [inlined]
  [6] [inlined]
  [6] axes
    @  axes
    @ ././broadcast.jl:207broadcast.jl:207 [inlined]
 [inlined]
  [7]   [7] _unwrap_pa(bc::_unwrap_pa(bc::Base.Broadcast.Broadcasted{PencilArrays.PencilArrayStyle{3}, Nothing, typeof(/), Tuple{Base.Broadcast.Broadcasted{PencilArrays.PencilArrayStyle{3}, Nothing, typeof(-), Tuple{PencilArrays.PencilArrayBroadcastable{ComplexF64, 3, PencilArrays.PencilArray{ComplexF64, 3, Base.ReshapedArray{ComplexF64, 3, SubArray{ComplexF64, 1, Vector{ComplexF64}, Tuple{Base.OneTo{Int64}}, true}, Tuple{}}, 3, 0, PencilArrays.Pencils.Pencil{3, 2, StaticPermutations.Permutation{(2, 1, 3), 3}, Vector{UInt8}}}}}}, Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{3}, Nothing, typeof(+), Tuple{Array{Float64, 3}, Array{Float64, 3}, Array{Float64, 3}}}}})
    @ PencilArraysBase.Broadcast.Broadcasted{PencilArrays.PencilArrayStyle{3}, Nothing, typeof(/), Tuple{Base.Broadcast.Broadcasted{PencilArrays.PencilArrayStyle{3}, Nothing, typeof(-), Tuple{PencilArrays.PencilArrayBroadcastable{ComplexF64, 3, PencilArrays.PencilArray{ComplexF64, 3, Base.ReshapedArray{ComplexF64, 3, SubArray{ComplexF64, 1, Vector{ComplexF64}, Tuple{Base.OneTo{Int64}}, true}, Tuple{}}, 3, 0, PencilArrays.Pencils.Pencil{3, 2, StaticPermutations.Permutation{(2, 1, 3), 3}, Vector{UInt8}}}}}}, Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{3}, Nothing, typeof(+), Tuple{Array{Float64, 3}, Array{Float64, 3}, Array{Float64, 3}}}}})
    @ PencilArrays ~/.julia/packages/PencilArrays/Jy1Lp/src/broadcast.jl:51
 ~/.julia/packages/PencilArrays/Jy1Lp/src/broadcast.jl:51
  [8]   [8] materialize!(u::materialize!(u::PencilArrays.PencilArray{ComplexF64, 3, Base.ReshapedArray{ComplexF64, 3, SubArray{ComplexF64, 1, Vector{ComplexF64}, Tuple{Base.OneTo{Int64}}, true}, Tuple{}}, 3, 0, PencilArrays.Pencils.Pencil{3, 2, StaticPermutations.Permutation{(2, 1, 3), 3}, Vector{UInt8}}}, bc_in::PencilArrays.PencilArray{ComplexF64, 3, Base.ReshapedArray{ComplexF64, 3, SubArray{ComplexF64, 1, Vector{ComplexF64}, Tuple{Base.OneTo{Int64}}, true}, Tuple{}}, 3, 0, PencilArrays.Pencils.Pencil{3, 2, StaticPermutations.Permutation{(2, 1, 3), 3}, Vector{UInt8}}}, bc_in::Base.Broadcast.Broadcasted{PencilArrays.PencilArrayStyle{3}, Nothing, typeof(/), Tuple{Base.Broadcast.Broadcasted{PencilArrays.PencilArrayStyle{3}, Nothing, typeof(-), Tuple{PencilArrays.PencilArrayBroadcastable{ComplexF64, 3, PencilArrays.PencilArray{ComplexF64, 3, Base.ReshapedArray{ComplexF64, 3, SubArray{ComplexF64, 1, Vector{ComplexF64}, Tuple{Base.OneTo{Int64}}, true}, Tuple{}}, 3, 0, PencilArrays.Pencils.Pencil{3, 2, StaticPermutations.Permutation{(2, 1, 3), 3}, Vector{UInt8}}}}}}, Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{3}, Nothing, typeof(+), Tuple{Array{Float64, 3}, Array{Float64, 3}, Array{Float64, 3}}}}})
    @ PencilArrays ~/.julia/packages/PencilArrays/Jy1Lp/src/broadcast.jl:33Base.Broadcast.Broadcasted
{PencilArrays.PencilArrayStyle{3}, Nothing, typeof(/), Tuple{Base.Broadcast.Broadcasted{PencilArrays.PencilArrayStyle{3}, Nothing, typeof(-), Tuple{PencilArrays.PencilArrayBroadcastable{ComplexF64, 3, PencilArrays.PencilArray{ComplexF64, 3, Base.ReshapedArray{ComplexF64, 3, SubArray{ComplexF64, 1, Vector{ComplexF64}, Tuple{Base.OneTo{Int64}}, true}, Tuple{}}, 3, 0, PencilArrays.Pencils.Pencil{3, 2, StaticPermutations.Permutation{(2, 1, 3), 3}, Vector{UInt8}}}}}}, Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{3}, Nothing, typeof(+), Tuple{Array{Float64, 3}, Array{Float64, 3}, Array{Float64, 3}}}}})  [9]
    @ PencilArrays  solve!(x::~/.julia/packages/PencilArrays/Jy1Lp/src/broadcast.jl:33
  [9] solve!(x::Field{Center, Center, Center, Nothing, RectilinearGrid{Float64, Periodic, Periodic, Periodic, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, MultiArch{CPU, Int64, Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, Oceananigans.Distributed.RankConnectivity{Nothing, Nothing, Int64, Int64, Nothing, Nothing}, MPI.Comm}}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float64, 3, Array{Float64, 3}}, Float64, FieldBoundaryConditions{BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{HaloCommunication, Oceananigans.Distributed.HaloCommunicationRanks{Int64, Int64}}, BoundaryCondition{HaloCommunication, Oceananigans.Distributed.HaloCommunicationRanks{Int64, Int64}}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing}, solver::Field{Center, Center, Center, Nothing, RectilinearGrid{Float64, Periodic, Periodic, Periodic, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, MultiArch{CPU, Int64, Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, Oceananigans.Distributed.RankConnectivity{Nothing, Nothing, Int64, Int64, Nothing, Nothing}, MPI.Comm}}, Tuple{Colon, Colon, Colon}, OffsetArrays.OffsetArray{Float64, 3, Array{Float64, 3}}, Float64, FieldBoundaryConditions{BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{HaloCommunication, Oceananigans.Distributed.HaloCommunicationRanks{Int64, Int64}}, BoundaryCondition{HaloCommunication, Oceananigans.Distributed.HaloCommunicationRanks{Int64, Int64}}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Periodic, Nothing}, BoundaryCondition{Oceananigans.BoundaryConditions.Flux, Nothing}}, Nothing}, solver::DistributedFFTBasedPoissonSolver{PencilFFTs.PencilFFTPlan{ComplexF64, 3, true, 3, 2, 0, PencilFFTs.GlobalFFTParams{Float64, 3, true, Tuple{PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!}}, Tuple{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, PencilArrays.Pencils.Pencil{3, 2, StaticPermutations.NoPermutation, Vector{UInt8}}, PencilArrays.Pencils.Pencil{3, 2, StaticPermutations.NoPermutation, Vector{UInt8}}, PencilFFTs.Transforms.FFT!, FFTW.cFFTWPlan{ComplexF64, -1, true, 3, Int64}, FFTW.cFFTWPlan{ComplexF64, 1, true, 3, Int64}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, PencilArrays.Pencils.Pencil{3, 2, StaticPermutations.Permutation{(2, 1, 3), 3}, Vector{UInt8}}, PencilArrays.Pencils.Pencil{3, 2, StaticPermutations.Permutation{(2, 1, 3), 3}, Vector{UInt8}}, PencilFFTs.Transforms.FFT!, FFTW.cFFTWPlan{ComplexF64, -1, true, 3, Int64}, FFTW.cFFTWPlan{ComplexF64, 1, true, 3, Int64}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, PencilArrays.Pencils.Pencil{3, 2, StaticPermutations.Permutation{(3, 2, 1), 3}, Vector{UInt8}}, PencilArrays.Pencils.Pencil{3, 2, StaticPermutations.Permutation{(3, 2, 1), 3}, Vector{UInt8}}, PencilFFTs.Transforms.FFT!, FFTW.cFFTWPlan{ComplexF64, -1, true, 3, Int64}, FFTW.cFFTWPlan{ComplexF64, 1, true, 3, Int64}}}, PencilArrays.Transpositions.PointToPoint}, RectilinearGrid{Float64, Periodic, Periodic, Periodic, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, CPU}, RectilinearGrid{Float64, Periodic, Periodic, Periodic, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, MultiArch{CPU, Int64, Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, Oceananigans.Distributed.RankConnectivity{Nothing, Nothing, Int64, Int64, Nothing, Nothing}, MPI.Comm}}, NamedTuple{(:λx, :λy, :λz), Tuple{Array{Float64, 3}, Array{Float64, 3}, Array{Float64, 3}}}, PencilArrays.ManyPencilArray{ComplexF64, 3, 3, Tuple{PencilArrays.PencilArray{ComplexF64, 3, Base.ReshapedArray{ComplexF64, 3, SubArray{ComplexF64, 1, Vector{ComplexF64}, Tuple{Base.OneTo{Int64}}, true}, Tuple{}}, 3, 0, PencilArrays.Pencils.Pencil{3, 2, StaticPermutations.NoPermutation, Vector{UInt8}}}, PencilArrays.PencilArray{ComplexF64, 3, Base.ReshapedArray{ComplexF64, 3, SubArray{ComplexF64, 1, Vector{ComplexF64}, Tuple{Base.OneTo{Int64}}, true}, Tuple{}}, 3, 0, PencilArrays.Pencils.Pencil{3, 2, StaticPermutations.Permutation{(2, 1, 3), 3}, Vector{UInt8}}}, PencilArrays.PencilArray{ComplexF64, 3, Base.ReshapedArray{ComplexF64, 3, SubArray{ComplexF64, 1, Vector{ComplexF64}, Tuple{Base.OneTo{Int64}}, true}, Tuple{}}, 3, 0, PencilArrays.Pencils.Pencil{3, 2, StaticPermutations.Permutation{(3, 2, 1), 3}, Vector{UInt8}}}}, Vector{ComplexF64}}})
    @ Oceananigans.Distributed ~/.julia/packages/Oceananigans/nqpMM/src/Distributed/distributed_fft_based_poisson_solver.jl:65
 [10] solve_for_pressure!(pressure::DistributedFFTBasedPoissonSolver{PencilFFTs.PencilFFTPlan{ComplexF64, 3, true, 3, 2, 0, PencilFFTs.GlobalFFTParams{Float64, 3, true, Tuple{PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!}}, Tuple{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, PencilArrays.Pencils.Pencil{3, 2, StaticPermutations.NoPermutation, Vector{UInt8}}, PencilArrays.Pencils.Pencil{3, 2, StaticPermutations.NoPermutation, Vector{UInt8}}, PencilFFTs.Transforms.FFT!, FFTW.cFFTWPlan{ComplexF64, -1, true, 3, Int64}, FFTW.cFFTWPlan{ComplexF64, 1, true, 3, Int64}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, PencilArrays.Pencils.Pencil{3, 2, StaticPermutations.Permutation{(2, 1, 3), 3}, Vector{UInt8}}, PencilArrays.Pencils.Pencil{3, 2, StaticPermutations.Permutation{(2, 1, 3), 3}, Vector{UInt8}}, PencilFFTs.Transforms.FFT!, FFTW.cFFTWPlan{ComplexF64, -1, true, 3, Int64}, FFTW.cFFTWPlan{ComplexF64, 1, true, 3, Int64}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, PencilArrays.Pencils.Pencil{3, 2, StaticPermutations.Permutation{(3, 2, 1), 3}, Vector{UInt8}}, PencilArrays.Pencils.Pencil{3, 2, StaticPermutations.Permutation{(3, 2, 1), 3}, Vector{UInt8}}, PencilFFTs.Transforms.FFT!, FFTW.cFFTWPlan{ComplexF64, -1, true, 3, Int64}, FFTW.cFFTWPlan{ComplexF64, 1, true, 3, Int64}}}, PencilArrays.Transpositions.PointToPoint}, RectilinearGrid{Float64, Periodic, Periodic, Periodic, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, CPU}, RectilinearGrid{Float64, Periodic, Periodic, Periodic, Float64, Float64, Float64, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, MultiArch{CPU, Int64, Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64, Int64}, Oceananigans.Distributed.RankConnectivity{Nothing, Nothing, Int64, Int64, Nothing, Nothing}, MPI.Comm}}, NamedTuple{(:λx, :λy, :λz), Tuple{Array{Float64, 3}, Array{Float64, 3}, Array{Float64, 3}}}, PencilArrays.ManyPencilArray{ComplexF64, 3, 3, Tuple{PencilArrays.PencilArray{ComplexF64, 3, Base.ReshapedArray{ComplexF64, 3, SubArray{ComplexF64, 1, Vector{ComplexF64}, Tuple{Base.OneTo{Int64}}, true}, Tuple{}}, 3, 0, PencilArrays.Pencils.Pencil{3, 2, StaticPermutations.NoPermutation, Vector{UInt8}}}, PencilArrays.PencilArray{ComplexF64, 3, Base.ReshapedArray{ComplexF64, 3, SubArray{ComplexF64, 1, Vector{ComplexF64}, Tuple{Base.OneTo{Int64}}, true}, Tuple{}}, 3, 0, PencilArrays.Pencils.Pencil{3, 2, StaticPermutations.Permutation{(2, 1, 3), 3}, Vector{UInt8}}}, PencilArrays.PencilArray{ComplexF64, 3, Base.ReshapedArray{ComplexF64, 3, SubArray{ComplexF64, 1, Vector{ComplexF64}, Tuple{Base.OneTo{Int64}}, true}, Tuple{}}, 3, 0, PencilArrays.Pencils.Pencil{3, 2, StaticPermutations.Permutation{(3, 2, 1), 3}, Vector{UInt8}}}}, Vector{ComplexF64}}})
    @ Oceananigans.Distributed ~/.julia/packages/Oceananigans/nqpMM/src/Distributed/distributed_fft_based_poisson_solver.jl:65

... the error continues.

So, my question is: is that a basic constraint when setting up the pencils?

A broader point is that I wasn't able to find anything in the docs stating constraints when setting up the pencils. Is that information there? If not, maybe it's worth including a small section discussing it?

Thanks!

jipolanco commented 2 years ago

Hi, right now I can't look at this in detail, but the issue seems to come from a broadcasting operation involving arrays with different memory layouts (as implied by the different Permutation{...} and NoPermutation in the error message). Such permutations are implicitly applied when performing transforms using PencilFFTs.jl (but they can be explicitly disabled when creating a plan).

My guess is that you're broadcasting a Fourier-transformed 3D PencilArray with a regular 3D Array. In that case, the issue is related to #42, see https://github.com/jipolanco/PencilArrays.jl/issues/42#issuecomment-1029766858 for details. Unfortunately this case is currently not well-supported in PencilArrays.

To fix this, the broadcasting implementation should be generalised to be able to combine arrays with different memory layouts. Currently, broadcasting assumes that all arrays have the same layout, and thus broadcasting is performed contiguously in memory. A possible fix would allow a slower path in which not all arrays are iterated contiguously (but this would require scalar indexing, not ideal on GPUs).

In any case, I don't think this is an issue with constraints on the number of processes or grid points (there are no such constraints that I'm aware of, other than obvious ones such as grid points >= ranks). I would guess that, in the cases where your code doesn't fail, it's just a coincidence (e.g. the local number of grid points is the same along all directions), but I would still expect the results to be wrong, since the memory layouts are different.

tomchor commented 2 years ago

Closing this as it was solved on our end!