Open mrhardman opened 7 months ago
In order to reduce the number of operations in applying the gyroaverage matrix (which contains mostly zeros), I need to find a convenient way to restrict the summation indices in the following loop:
```julia
for irp in 1:nr
    for izp in 1:nz
        gfield_out[ivperp,iz,ir,is] += gyromatrix[izp,irp,ivperp,iz,ir,is]*field_in[izp,irp]
    end
end
```
In Python I would do this by constructing a list of tuples `(izp,irp)` corresponding to the nonzero points in the gyromatrix, and then looping over all elements in the list. The list of tuples would be different for each `(ivperp,iz,ir,is)`, requiring an array of lists of tuples. What is the Julia way of doing this operation?
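One idiomatic option in Julia is to precompute a `Vector{CartesianIndex{2}}` of the nonzero `(izp,irp)` points with `findall`, and then loop only over that vector. The sketch below works on a single `(ivperp,iz,ir,is)` slice of the gyromatrix; all names and sizes are illustrative, not the actual moment_kinetics API:

```julia
# Sketch: precompute the nonzero (izp, irp) points of one gyromatrix slice
# with findall, then sum only over those points (names are illustrative).
function apply_gyroaverage_slice(gyromatrix_slice, field_in)
    # findall returns a Vector{CartesianIndex{2}}, playing the role of
    # Python's list of (izp, irp) tuples.
    nonzero_points = findall(!iszero, gyromatrix_slice)
    gfield_out = 0.0
    for I in nonzero_points
        izp, irp = Tuple(I)
        gfield_out += gyromatrix_slice[izp, irp] * field_in[izp, irp]
    end
    return gfield_out
end

nz, nr = 4, 3
gyromatrix_slice = zeros(nz, nr)   # stands in for gyromatrix[:, :, ivperp, iz, ir, is]
gyromatrix_slice[2, 1] = 0.5
gyromatrix_slice[3, 2] = 0.25
gfield_out = apply_gyroaverage_slice(gyromatrix_slice, ones(nz, nr))
```

Storing one such vector per `(ivperp,iz,ir,is)` would give the array-of-lists structure described above.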
I would consider using a compound index (this would just require an array of size `(nlist,nvperp,nz,nr,nspecies)`) and a 1D view of the field, but note that this solution would also have to be applied to the pdf, as in the block below, and would later have to be made compatible with distributed memory. Note that the data required for the sum is not necessarily contiguous in memory if periodic boundary conditions are used. The natural formulation is element assembly, but we want to keep as much of the assembly associated with the gyroangle integration as possible in the initialisation step.
```julia
for irp in 1:nr
    for izp in 1:nz
        gpdf_out[ivpa,ivperp,iz,ir,is] += gyromatrix[izp,irp,ivperp,iz,ir,is]*pdf_in[ivpa,ivperp,izp,irp,is]
    end
end
```
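The compound-index idea mentioned above can be sketched as follows, assuming column-major Julia arrays (all names are illustrative). A 1D view of the field lets a single integer address the `(izp,irp)` point:

```julia
# Sketch: a compound (linear) index into a flattened 1D view of the field.
nz, nr = 4, 3
field_in = reshape(collect(1.0:nz*nr), nz, nr)
field_flat = vec(field_in)           # 1D view sharing memory (column-major)

# The compound index for (izp, irp) in column-major order:
izp, irp = 3, 2
icompound = izp + nz*(irp - 1)
```

A periodic domain would produce compound indices that are not contiguous, which is the memory-layout caveat noted above.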
@johnomotani Do you have any suggestions?
This question seems to be partially addressed by this thread discussing the construction of a vector of vectors of variable length: https://discourse.julialang.org/t/q-vector-of-vectors-of-variable-lengths/102817. By first looping over the `gyromatrix` to find and count the nonzero elements, we can construct an array `sizes` of shape `(nvperp,nz,nr,nspecies)` whose values are the number of nonzero `(izp,irp)` points for that part of phase space. We can convert `sizes` into an array of vectors via the constructor method `array_of_vecs = zeros.(Int64,sizes)` and then fill in the required indices `izp` or `irp` with a second loop over `gyromatrix`. Two of these arrays will be required, for simplicity. Provided that we do not have to update these indices at run time, this method should efficiently reduce the cost of applying the gyroaverage operator.
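A minimal sketch of this two-pass construction, with small illustrative sizes and names (not the actual moment_kinetics code):

```julia
# Sketch of the two-pass index construction (names and sizes illustrative).
# Pass 1: count the nonzero (izp, irp) points for each (ivperp, iz, ir, is).
function count_nonzeros(gyromatrix)
    nz, nr, nvperp, _, _, nspecies = size(gyromatrix)
    sizes = zeros(Int64, nvperp, nz, nr, nspecies)
    for is in 1:nspecies, ir in 1:nr, iz in 1:nz, ivperp in 1:nvperp
        sizes[ivperp, iz, ir, is] =
            count(!iszero, @view gyromatrix[:, :, ivperp, iz, ir, is])
    end
    return sizes
end

# Pass 2: fill per-point index vectors; zeros.(Int64, sizes) broadcasts the
# constructor, giving an array of Vector{Int64} of the required lengths.
function fill_index_vectors(gyromatrix, sizes)
    nz, nr, nvperp, _, _, nspecies = size(gyromatrix)
    izp_vecs = zeros.(Int64, sizes)
    irp_vecs = zeros.(Int64, sizes)
    for is in 1:nspecies, ir in 1:nr, iz in 1:nz, ivperp in 1:nvperp
        icounter = 0
        for irp in 1:nr, izp in 1:nz
            if !iszero(gyromatrix[izp, irp, ivperp, iz, ir, is])
                icounter += 1
                izp_vecs[ivperp, iz, ir, is][icounter] = izp
                irp_vecs[ivperp, iz, ir, is][icounter] = irp
            end
        end
    end
    return izp_vecs, irp_vecs
end

nvperp, nz, nr, nspecies = 2, 3, 3, 1
gyromatrix = zeros(nz, nr, nvperp, nz, nr, nspecies)
gyromatrix[1, 1, 1, 2, 2, 1] = 0.5
gyromatrix[3, 2, 1, 2, 2, 1] = 0.25
sizes = count_nonzeros(gyromatrix)
izp_vecs, irp_vecs = fill_index_vectors(gyromatrix, sizes)
```

The application loop then iterates `for icounter in 1:sizes[ivperp,iz,ir,is]` over the stored index pairs instead of over all `(izp,irp)`.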
The optimisation described above was carried out in commit https://github.com/mabarnes/moment_kinetics/commit/9ad86b576385ec399dc3a8c9ab8bfa4c3a2214bd (tests passed locally, tested on up to 4 cores). For the problem size considered (see the examples posted in the PR: ngrid = 5, nelement = 4 for all dimensions except vperp, which has 2 elements), a factor of 1.5 speed-up is observed on 2 cores. This speed-up might be larger for larger problem sizes and smaller rhostar.
Unfortunately this optimisation introduces another problem: the large index arrays https://github.com/mabarnes/moment_kinetics/blob/9ad86b576385ec399dc3a8c9ab8bfa4c3a2214bd/moment_kinetics/src/gyroaverages.jl#L24-L26 are not shared by the cores in the shared-memory block. This quickly leads to problems when a large number of cores are used in a shared-memory region (effectively keeping an almost distribution-function-sized object for each core, even though the objects are identical on each core). The shared-memory formalism currently has no support for the type `MPISharedArray{Array{mk_int,1},ndims}`, where `Array{mk_int,1}` is intended to be a variable-length array. This means that addressing this problem requires some thought about the shared-memory functions.
@johnomotani Is the shared-memory formalism generalisable to more abstract types (e.g. arrays of variable length)? Is there a better way to achieve my optimisation?
A less ambitious but potentially more successful method of indexing the summation over `gyromatrix` is implemented in https://github.com/mabarnes/moment_kinetics/pull/187/commits/d32f5f0cdbc68ac8b079b67b251cad70f8a15399. Now the arrays that store the indices are deliberately chosen to be too large, and only a subset of the information stored in them is actually assigned and used. This avoids having variable-length arrays, whilst keeping the potential for a speed-up by summing over only a subset of the size of `gyromatrix`.
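The fixed-size variant can be sketched as follows (illustrative names only, not the actual moment_kinetics code): the index arrays are allocated at the worst-case size `nz*nr`, and a counter records how many entries are valid for each phase-space point.

```julia
# Sketch of the fixed-size indexing variant (illustrative names only):
# index arrays are deliberately oversized to nz*nr, and nsum records how
# many of their entries are actually valid.
function build_fixed_indices(gyromatrix_slice)
    nz, nr = size(gyromatrix_slice)
    izp_index = zeros(Int64, nz*nr)   # oversized: only the first nsum entries used
    irp_index = zeros(Int64, nz*nr)
    nsum = 0
    for irp in 1:nr, izp in 1:nz
        if !iszero(gyromatrix_slice[izp, irp])
            nsum += 1
            izp_index[nsum] = izp
            irp_index[nsum] = irp
        end
    end
    return izp_index, irp_index, nsum
end

function apply_with_fixed_indices(gyromatrix_slice, field_in,
                                  izp_index, irp_index, nsum)
    gfield_out = 0.0
    # Sum only over the valid subset of the oversized index arrays.
    for isum in 1:nsum
        izp, irp = izp_index[isum], irp_index[isum]
        gfield_out += gyromatrix_slice[izp, irp] * field_in[izp, irp]
    end
    return gfield_out
end

nz, nr = 3, 2
gyromatrix_slice = zeros(nz, nr)
gyromatrix_slice[2, 1] = 1.0
gyromatrix_slice[1, 2] = 2.0
izp_index, irp_index, nsum = build_fixed_indices(gyromatrix_slice)
gfield_out = apply_with_fixed_indices(gyromatrix_slice, fill(2.0, nz, nr),
                                      izp_index, irp_index, nsum)
```

Because all arrays here have fixed sizes known at allocation time, they fit the existing shared-memory array machinery, at the cost of some wasted storage.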
The two uploaded file sets show the potential importance of the gyroaverages feature. Without numerical dissipation, the periodic 2D initial condition breaks up into fine structures that go unstable. This is still true with numerical dissipation, if it is not large enough. This means that any physics that would occur at Larmor scales may break the solutions, because it occurs at the grid scale of a drift-kinetic model. We have previously observed similar problems in simulations with wall boundary conditions.
- 2D-periodic-2 crashes at time = 1.15 with r_dissipation_coefficient = 0.0
- 2D-periodic-2-diss crashes at time = 1.80 with r_dissipation_coefficient = 1.0e-4
- 2D-periodic-2-diss does not crash with r_dissipation_coefficient = 1.0e-2
- 2D-periodic-2-diss does not crash with r_dissipation_coefficient = 1.0e-1
Reopening issue to track progress (and make it easy to find).
Gyroaverages may be crucial for providing a physical cut-off to the wavenumber spectrum required to be resolved in the moment_kinetics model.
Initial experiments implementing an ion gyroaverage are in the branch https://github.com/mabarnes/moment_kinetics/tree/feature-gyroaverages.
To test the feature, run https://github.com/mabarnes/moment_kinetics/blob/feature-gyroaverages/test_scripts/gyroaverage_test.jl.
Here the Lagrange polynomial representation is used to precompute integration weights as a function of z and r, in place of the integral in gyrophase, which is only a line integral in (z,r) space. The test plots the field `phi` and the gyroaverage `gphi` for several values of `vperp`. Boundary conditions are not yet dealt with, meaning that the code will definitely produce poor results near the domain boundary. The code is written assuming fully local grids in memory.

To do list: