communicate_plasma is undocumented

kslong commented 1 week ago

Communicate_plasma (and to a lessor extent the other communicate routines) is (are) quite complex, and need good documentation to allow anyone except @Edward-RSE or @jhmatthews to modify anything in the Plasma structure, especially with regard to understanding changes to things like buffer size.

An example of this is the lack of explanation for a line like:

  const int comm_buffer_size = calculate_comm_buffer_size (1 + n_cells_max * (1 + 20 + nphot_total + nions + NXBANDS + 2 * N_PHOT_PROC),
                                                           n_cells_max * (71 + 11 * nions + nlte_levels + 2 * nphot_total + n_inner_tot +
                                                                          11 * NXBANDS + NBINS_IN_CELL_SPEC + 6 * NFLUX_ANGLES +
                                                                          N_DMO_DT_DIRECTIONS + 12 * NFORCE_DIRECTIONS));

where, to point to one specific issue, what does 71 refer to.

This is relevant to me currently, as I would like to add, some variables to track, the number of ff and bf scatters in each cell to better understand why some of the Pluto models seem to be exceedingly slow.

(As an aside, also odd from my perspective, if you look at communicate_plasma, is there are places where nscat_es is "communicated", but nscat_nres is not. )

Instructions are needed to explain what one should do to make at least "minor" modifications, like adding an integer or double or an array to the structure, either as part of the doxygen comments in the file, or in an rst file for the sphinx documentation.

Edward-RSE commented 2 days ago

I can add some inline comments to explain the code better, and I'll also update the Doxygen comments for the functions to explain how to add another variable to the communication.

To clarify what 71 is, it's the number of single doubles which are communicated for each cell. This corresponds to the number of MPI_Pack's which looks like this:

MPI_Pack (&cell->vol, 1, MPI_DOUBLE, comm_buffer, comm_buffer_size, &position, MPI_COMM_WORLD);

Then, for example, 11 * nions corresponds to communicating 11 arrays which each have nions number of elements. This is the same approach we used prior to the communication refactor, and I believe it’s the most efficient approach we can take -- but that's not to say we can't improve its clarity or documentation. Using a derived type to avoid manually counting variables to determine the buffer size is not feasible, due to the pointers in the PlasmaPtr structure meaning plamamain is not contiguous in memory.

Regarding your aside about nscat_es andnscat_nres, I think there is scope to remove, or maybe add in some cases, variables from the plasma update. If it's purely a diagnostic property, then I don't think it needs to be communicated between ranks.

kslong commented 1 day ago

That would be great if you would do that @Edward-RSE. I have been successful in adding a couple of new integers to the Plasma structure, in the dense branch. But making it easier to understand how to modify these items the next time I or someone else looks at it would be helpful.

As an aside, I wondered if there might be a way to add a check, maybe a variable at the end that had a defined value to make sure one had not done something grossly wrong.

sirocco-rt / sirocco

communicate_plasma is undocumented #1126