Open BenWibking opened 10 months ago
@pakshingli I wanted to make sure you were aware of this GitHub issue. We will need to use the amrex::NeighborParticleContainer
for the sink particles for the reasons described above.
I agree that it is better not to use ORION2 implementation of sink particles that the scaling is not good for larger number of particles. The proposed approach makes sense.
PS
I agree that it is better not to use ORION2 implementation of sink particles that the scaling is not good for larger number of particles. The proposed approach makes sense.
PS
Ok, agreed. Are you planning to implement this?
Yes. I'll see how it works.
@pakshingli After thinking about this more, I think it is not worth it to do a parallel friends-of-friends group finding for particles. The tricky part is the subgroup merging, which requires communication of a complicated graph structure between neighboring grids.
For sink particle merging only, I propose that we broadcast the particles to every rank, copy the particles to CPU memory, do the friends-of-friends group finding using the existing ORION2 code, copy the results back to the GPU, then delete any linked particles and add new particles at the group center-of-mass. (In the future, this could be replaced with a distributed-memory parallel FOF group finder.)
@markkrumholz Since you wrote the original FOF code, does this plan make sense to you?
The rest of the sink particle operations (including accretion) can be handled using the AMReX particle infrastructure, since they are entirely local (as long as we grow a MultiFab such that nghost equals twice the radius of the accretion zone; this is fine since the blocking factor has to be at least 32 to run efficiently on GPUs, and it only has to be done for the finest-level MultiFab).
@markkrumholz and I discussed yesterday on the telecon an alternative to merging sink particles, which is to allow them to accrete in overlapping accretion zones, but in order of decreasing mass.
However, this still requires friends-of-friends to determine the data dependency between sink particles, since particles must accrete in order of mass within each friends-of-friends group. Otherwise, the entire accretion operation must be serialized, in which case we might as well give up and just replicate all sink particles on all ranks.
For reference, I've added to the issue description a link to the code in the BinaryOrbitCIC test problem that copies all of the particles to rank 0, and then (on rank 0 only) copies those particles to CPU memory. Then the FOF finder can be run on them.
@BenWibking: I do not think FoF is required provided that the sink accretion step is implemented in the right way and you have enough ghost zones. This is a little complicated, but here is what I have in mind:
First some definitions: let $\mathbf{Q}^{ijk,(n)}$ represent the state vector in cell $ijk$ after the hydro step but before the sink solve, and let $\Delta\mathbf{Q}^{ijk,s}$ be the change in the state quantity due to accretion onto sink particle $s$ before any limiting is applied, i.e., before potentially reducing the change to avoid things like driving the density below the floor. Finally, let $\mathbf{P}^{s,(n)}$ be the set of properties (mass, velocity, etc.) of sink particle $s$ prior to accretion, and let $\Delta \mathbf{P}^s$ be the change in those properties due to accretion; the value of $\Delta \mathbf{P}^s$ is determined by the corresponding $\Delta\mathbf{Q}^{ijk,s}$ values, so as to enforce conservation (e.g., mass gained by the sink = mass lost by all cells).
Now suppose we have a sink particle accretion algorithm with the following properties. First, for a sink particle $s$ residing in cell $ijk$, the change $\Delta\mathbf{Q}^{ijk,s}$ is guaranteed to be non-zero only over a range of cells from $i - N\mathrm{acc}$ to $i + N\mathrm{acc}$ and similarly in the $j$ and $k$ directions. Second, $\Delta\mathbf{Q}^{ijk,s}$ depends only on $P^{(s)}$ and on $\mathbf{Q}^{ijk,(n)}$ in cells from $i - N\mathrm{dep}$ to $i + N\mathrm{dep}$ and similarly in $j$ and $k$. That is, given the state of the gas prior to the sink accretion step, you can compute the accretion rate onto each sink particle just using information in a stencil $N\mathrm{dep}$ cells wide around it. For ORION2 we have $N\mathrm{acc} = N\mathrm{dep} = 2$, but we do not need to use exactly this algorithm, so I'm going to just leave these as free parameters for now, just subject to the requirement that $N\mathrm{dep} \leq N_\mathrm{acc}$, i.e., the accretion rate depends only on cells inside the accretion zone.
The reason sink accretion is not independent is due to limiting. The value $\Delta\mathbf{Q}^{ijk,s}$ is an initial estimate of the change in any given cell's properties, but it is necessary to limit $\Delta\mathbf{Q}^{ijk,s}$ to, for example, avoid generating negative densities if multiple sinks try to take mass out of the same cell; let us call the change in cell properties after limiting is applied $\Delta\mathbf{Q}^{ijk,s,\mathrm{lim}}$. Limits are local -- that is, the difference between $\Delta\mathbf{Q}^{ijk,s}$ and $\Delta\mathbf{Q}^{ijk,s,\mathrm{lim}}$ depends directly only on the properties of cell $ijk$. The tricky part is that the limiting procedure, while it is local, depends on the local state that exists in cell $ijk$ after higher-priority sinks have already accreted material; how priority is determined is not important for our purposes, as long as we have some algorithm to determine it. To put this symbolically, if we order the sinks affecting a given cell $s=1,2,3,\ldots$, with $s = 1$ representing the highest priority, then $\Delta\mathbf{Q}^{ijk,s,\mathrm{lim}}$ depends on $\mathrm{P}^s$, $\mathbf{Q}^{ijk,(n)}$, and $\Delta\mathbf{Q}^{ijk,s',\mathrm{lim}}$ for all $s' < s$. In order to enforce conservation, the final amount accreted by sink particle $s$ after limiting is applied, $\Delta \mathbf{P}^{s,\mathrm{lim}}$, also depends on all $s' < s$.
Given this situation, now let us consider what is required to do the two things we have to do for a given grid: (1) compute the final change in properties $\Delta\mathbf{Q}_{ijk}^{s,\mathrm{lim}}$ for cell real, non-ghost cell in the grid, and (2) compute the corresponding changes $\Delta \mathbf{P}^s$ for every sink particle that is contained within the non-ghost zones of the grid. We can do this via the following algorithm:
(1) First, exchange information so that each grid has access to the cell information $\mathbf{Q}^{ijk,(n)}$ out to a distance of $N\mathrm{ghost} = 2 N\mathrm{acc} + N\mathrm{dep}$ cells around it, and the list of all sink particle properties $\mathbf{P}^s$ out to a distance $N\mathrm{ghost,sink} = 2 N_\mathrm{acc}$ around it. The reason for these numbers of ghost zones and ghost particles being required will become clear in a moment.
(2) Next compute $\Delta\mathbf{Q}^{ijk,s}$ and $\Delta \mathbf{P}^s$ for $s = 1$, the highest priority sink particle on the grid. Since these properties depend on $\Delta\mathbf{Q}^{ijk,s=1}$ for cell states $\mathbf{Q}^{ijk,(n)}$ over a stencil of width $N\mathrm{dep}$, and the only sink particles we have lie at least $N\mathrm{dep}$ cells from the boundary of the ghost zones (since $N\mathrm{ghost} - N\mathrm{ghost,sink} = N_\mathrm{dep}$, we are guaranteed that all the required information is accessible.
(3) Next apply limiting to compute $\Delta\mathbf{Q}^{ijk,s,\mathrm{lim}}$ and $\Delta \mathbf{P}^{s,\mathrm{lim}}$. After limiting, apply the change $\Delta\mathbf{Q}^{ijk,s,\mathrm{lim}}$ to all real and ghost zones, and apply the change $\Delta \mathbf{P}^{s,\mathrm{lim}}$ to the sink particle if it lies within the real zones. (You can apply it to sinks that live in ghost zones as well, but this will have no effect, because these are temporary particles that will be discarded at the end.)
(4) Now note that, because we have access to all sink particles out to $N\mathrm{ghost,sink} = 2 N\mathrm{acc}$ ghost zones around the grid, we have by this procedure correctly applied changes to all the ghost zones out to a distance of $N_\mathrm{acc}$ from the grid, and these are all the zones where limiting can matter to sink particles inside the grid. That is, if $i_0$ is the lowest $i$ index in a given grid, then a sink particle in cell $i_0$ accretes from zones with indices as small as $i0 - N\mathrm{acc}$, and thus we need to know if the properties of these zones have been modified by accretion by higher-priority sinks -- but these higher priority sinks can lie at the furthest in cell $i - 2 N\mathrm{acc}$, and we have correctly computed the accretion rates onto all those sinks, because they lie within $N\mathrm{ghost,sink}$ of the grid. We can therefore proceed to sink $s = 2$, and so forth, with confidence that we are correctly accounting for limiting.
Thus the only change compared to the ORION2 algorithm where accretion zones never overlap is that the number of ghost zones required goes up from $2 N\mathrm{acc}$ to $2 N\mathrm{acc} + N\mathrm{dep}$. If we just port the ORION2 algorithm directly, we need 6 ghost zones (but only for grids that contain sinks), but we could also probably get away with using $N\mathrm{dep} = 1$, which will be slightly less accurate but only for a limited range of sink particle properties (basically only in the case where the Bondi radius of a particle is of order the cell size), in which case we woudl only need 5 ghost zones. Given that we already need 4 ghost zones for PPM hydro, this is not a big hit.
To put this symbolically, if we order the sinks affecting a given cell s=1,2,3,…, with s=1 representing the highest priority, then ΔQijk,s,lim depends on Ps, Qijk,(n), and ΔQijk,s′,lim for all s′<s. In order to enforce conservation, the final amount accreted by sink particle s after limiting is applied, ΔPs,lim, also depends on all s′<s.
How does this avoid data dependencies in the case where you have two grids, with three sink particles, but where grid 0 sees particles 0 and 1 (where 1 has priority over 0), and grid 1 (adjacent to grid 0) sees particles 2 and 1, where particle 2 has priority over particle 1?
In this case, grid 0 will compute the accretion for particle 1 and then particle 0, while Grid 1 computes the accretion for particle 2 and then particle 1. After doing particle 2's accretion, we know $\Delta P^{2,\text{lim}}$, then can compute $\Delta P^{1,\text{lim}}$. It is unclear to me what the second sentence quoted above means. Does $\Delta P^{0,\text{lim}}$ not depend on $\Delta P^{1,\text{lim}}$, but instead only on $P^{1,(n)}$?
In your example, it is indeed the case that $\Delta P^{0,\mathrm{lim}}$ does not directly depend on $\Delta P^{1,\mathrm{lim}}$, and indeed $\Delta P^{0,\mathrm{lim}}$ does not depend directly on any other $\Delta P^{s,\mathrm{lim}}$ -- it only depends on the values of the cells that contribute to $\Delta P^{s,\mathrm{lim}}$ and that are also within sink particle 0's accretion zone. So as long as we can compute the updated states of all of those cells correctly (which we can, because we guarantee that any sink particle that can affect anything within particle 0's accretion zone is stored locally), it doesn't matter if we can't compute $\Delta P^{s,\mathrm{lim}}$ correctly on processor 0.
More generally, if a sink particle lives on a given grid, we need to be able to guarantee that we can correctly update the properties of any cell that is potentially inside its accretion zone and which could be affected by accretion by an higher-priority sink. Those sinks might live on other grids, which means that we need to know about all sinks whose accretion zones potentially overlap with the accretion zones of sinks on our grid, and we need to have enough ghost zones around those sinks that we can correctly compute their accretion rates -- but the trick is that we don't need to be able to calculate those ghost sinks' accretion rates after limiting, only before limiting, because the values of the cells we care about are only sensitive to the values of cells at the start of the time step and to the surrounding sinks' accretion rates before they are limited.
Ok, I understand it now.
Also, I think it should be possible to re-use the machinery for AMR grid generation to exactly find the center-of-mass of connected Jeans-unstable regions.
Unfortunately, I think I don't have time to work more on sink particles in the immediate future, so I will leave this to someone else to implement.
Describe the proposal Building on the particle infrastructure in https://github.com/quokka-astro/quokka/issues/491, it should be possible to add the capability for these particles to accrete mass (and angular momentum) following Krumholz, McKee & Klein (2004).
The accretion zone can be handled by iterating over grids with the
AmrParticleContainer
, as long as the number of ghost cells is set to twice the accretion radius, such that the accretion can be done independently on each grid.The friends-of-friends group finding probably has to be done on a single rank for now. However, it is easy to copy all of the sink particles to rank 0 and then copy to CPU memory (see below).
Describe alternatives you've considered The main alternative would be to directly copy the implementation from ORION2, which requires all particles to be replicated on all MPI ranks. This would not scale to large numbers of particles, and would not allow us to take advantage of the AMReX particle infrastructure.
Additional context The ORION2 implementation of the accretion zones is here.
An example of how to copy all the particles in a ParticleContainer to rank 0 and then copy to CPU memory is given in the
BinaryOrbitCIC
example code: https://github.com/quokka-astro/quokka/blob/df7854af6ebacab2e86b9af1fe5b7eaa27730c0b/src/BinaryOrbitCIC/binary_orbit.cpp#L106Tracking PRs: