This PR lifts the restriction that the decomp_dims argument to Pencil must be a list of sorted dimensions. For instance, decomp_dims = (1, 3) was allowed while (3, 1) was not.
In practice, this enables transpositions that weren't possible before. This is because, for two decompositions to be compatible for transpositions, their decomp_dims must differ by exactly one element (this is an algorithmic detail which basically simplifies the implementation and allows to minimise inter-process communications). So, decomp_dims = (2, 3) (data is local in x) has always been compatible with decomp_dims = (1, 3) (data is local in y). This PR enables the decomp_dims = (2, 1) (local in z) configuration which wasn't previously possible.
For example:
using MPI
using PencilArrays
using LinearAlgebra: transpose!
using Random
MPI.Init()
comm = MPI.COMM_WORLD
dims_global = (12, 14, 9)
# Create a 2D decomposition in which data is local in the `x` direction.
pen_x = Pencil(dims_global, (2, 3), comm)
ux = PencilArray{Float64}(undef, pen_x)
randn!(ux)
# Create another 2D decomposition in which data is local in `y`.
# (This has always been possible.)
pen_y = Pencil(pen_x; decomp_dims = (1, 3))
uy = PencilArray{Float64}(undef, pen_y)
transpose!(uy, ux) # transpose between the two configurations
# Create a 2D decomposition in which data is local in `z`.
# To transpose from an `x`-local to a `z`-local decomposition, it was
# previously required to go through the intermediate `y`-local configuration.
# With this PR we can skip that step and do the following:
pen_z = Pencil(pen_x; decomp_dims = (2, 1)) # the order of elements in decomp_dims is important!
uz = PencilArray{Float64}(undef, pen_z)
transpose!(uz, ux)
Note that it should be possible to automate the selection of a "good" order of decomp_dims leading to compatible decompositions, but this is not done for now.
This PR lifts the restriction that the
decomp_dims
argument toPencil
must be a list of sorted dimensions. For instance,decomp_dims = (1, 3)
was allowed while(3, 1)
was not.In practice, this enables transpositions that weren't possible before. This is because, for two decompositions to be compatible for transpositions, their
decomp_dims
must differ by exactly one element (this is an algorithmic detail which basically simplifies the implementation and allows to minimise inter-process communications). So,decomp_dims = (2, 3)
(data is local inx
) has always been compatible withdecomp_dims = (1, 3)
(data is local iny
). This PR enables thedecomp_dims = (2, 1)
(local inz
) configuration which wasn't previously possible.For example:
Note that it should be possible to automate the selection of a "good" order of
decomp_dims
leading to compatible decompositions, but this is not done for now.