mpiforumbot commented 8 years ago

Originally by traff on 2008-10-10 08:27:53 -0500

Fix Scalability Issues in Graph Topology InterfaceAuthor: Torsten Hoefler, Jesper Larsson Traeff with comments from Rolf

Rabenseifner, Hubert Ritzdorf, and Rajeev Thakur

Status

Sep 8 2008 Straw votes

For a scalable description

|| Yes || Maybe || No for 2.2 || Abstain || || 7 || 20 ||0 || 3 ||

For adding info to weight of edges

|| Yes || Maybe || No for 2.2 || Abstain || || 4 || 22 || 1 || 6 ||

October 2008 Straw Vote

|| Yes || Maybe || No for 2.2 || Abstain || || 23 || 0 || 0 || 5 ||

April 09 Straw Votes

Fully flexible version vs. only adjacent version.

|| only flexible || only adjacent || both || || 9 || 2 || 8 ||

Using MPI_Info arguments for any assertions (that change the semantics) is not in the spirit of MPI, thus, we can't use Info for neither asserting that each process specifies the whole graph or all processes specified reorder or not.

Reorder collective or not?

|| collective || not collective || || 5 || 6 ||

It's close, but not collective is also more consistent with the current standard. We can change this in MPI-3 easily (while keeping downward compatibility).

Description

The graph interface in MPI-1 has some serious (and unnecessary) scalability issues that prohibit efficient usage on large-scale machines.

The new scalable interface is essentially a single function, that is complemented by two inquiry functions. The concrete proposal introduces two new constants, and entails a few other changes related to the topology functionality.

See additional proposal/discussion in Ticket #147.

History

This interface was introduced in MPI-1 designed based on some older parallel library (Alexander Supalov can provide details). However, the interface was never changed. Right now, each process has to know the whole topology during graph creation and every process can query the neighbors of every other process in the communicator with a local operation. This makes it necessary to store the whole topology at every process locally which might be a prohibitive waste of memory resources.

The proposal is to introduce new functionality for specifying topologies by distributed graphs, and deprecate the current MPI 2.1 graph topology functionality.

Proposed Solution

Page 15, line 5, add:

MPI_UNWEIGHTED

Page 195, line 27, add:

MPI::Distgraphcomm MPI::Distgraphcomm::Dup() const

Page 195, line 35, add:

MPI::Distgraphcomm MPI::Distgraphcomm::Clone() const

Page 242:15-22 - remove sentence

"Edges in the communication graph are not weighted, so that processes are either simply connected or not connected at all."

and (outdated) rationale.

Page 243, line 3: Replace

"The functions MPI_GRAPH_CREATE and MPI_CART_CREATE are used"

by

"The functions MPI_GRAPH_CREATE, MPI_DIST_GRAPH_CREATE_ADJACENT, MPI_DIST_GRAPH_CREATE and MPI_CART_CREATE are used"

Page 243, line 8, replace:

"All input arguments must have identical values on all processes of the group of comm_old. A new communicator ..."

by:

"For MPI_GRAPH_CREATE and MPI_CART_CREATE, all input arguments must have identical values on all processes of the group of comm_old. For MPI_DIST_GRAPH_CREATE_ADJACENT and MPI_DIST_GRAPH_CREATE the input communication graph is distributed across the calling processes. Therefore the processes provide different values for the arguments specifying the graph. However, all processes must give the same value for reorder and the info argument. In all cases, a new communicator ..."

Page 243, line 33 (before last sentence), add:

For distributed graphs, the functions MPI_DIST_NEIGHBORS_COUNT and MPI_DIST_NEIGHBORS can be used to extract the neighbors of the calling node.

Page 247, after line 48, add new section:

7.5.4 Distributed (Graph) Constructor

The general graph constructor assumes that each process passes the full (global) communication graph to the call. This limits the scalability of this constructor. With the distributed graph interface, the communication graph is specified in a fully distributed fashion. Each process specifies only the part of the communication graph of which it is aware. Typically, this could be the set of processes from which the process will eventually receive or get data, or the set of processes to which the process will send or put data, or some combination of such edges. Two different interfaces can be used to create a distributed graph topology. MPI_DIST_GRAPH_CREATE_ADJACENT creates a distributed graph communicator with each process specifying all of its incoming and outgoing (adjacent) edges in the logical communication graph and thus requires minimal communication during creation. MPI_DIST_GRAPH_CREATE provides full flexibility, and processes can indicate that communication will occur between other pairs of processes.

To provide better possibilities for optimization by the MPI library, the distributed graph constructors permit weighted communication edges and take an info argument that can further influence process reordering or other optimizations performed by the MPI library. For example, hints can be provided on how edge weights are to be interpreted, the quality of the reordering, and/or the time permitted for the MPI library to process the graph.

#!c
MPI_DIST_GRAPH_CREATE_ADJACENT(comm_old, indegree, sources, sourceweights, 
outdegree, destinations, destweights, info, reorder, comm_dist_graph)
 IN      comm_old        input communicator (handle)
 IN      indegree        size of sources and sourceweights arrays 
                         (non-negative integer)
 IN      sources         ranks of processes for which the calling 
                         process is a destination 
                         (array of non-negative integers)
 IN      sourceweights   weights of the edges into the calling 
                         process (array of non-negative integers)
 IN      outdegree       size of destinations and destweights arrays 
                         (non-negative integer)
 IN      destinations    ranks of processes for which the calling 
                         process is a source
                         (array of non-negative integers)
 IN      destweights     weights of the edges out of the calling 
                         process (array of non-negative integers)
 IN      info            hints on optimization and interpretation of 
                         weights (handle)
 IN      reorder         the ranks may be reordered (true) or 
                         not (false) (logical)
 OUT     comm_dist_graph communicator with distributed graph 
                        topology (handle)

int MPI_Dist_graph_create_adjacent(MPI_Comm comm_old, 
          int indegree, int sources[], int sourceweights[],
          int outdegree, int destinations[], int destweights[],
          MPI_Info info, int reorder, MPI_Comm *comm_dist_graph)

MPI_DIST_GRAPH_CREATE_ADJACENT(COMM_OLD, INDEGREE, SOURCES, SOURCEWEIGHTS, 
OUTDEGREE, DESTINATIONS, DESTWEIGHTS, INFO, REORDER, COMM_DIST_GRAPH, 
IERROR)
    INTEGER COMM_OLD, INDEGREE, SOURCES(*), SOURCEWEIGHTS(*), 
    OUTDEGREE, DESTINATIONS(*), DESTWEIGHTS(*), INFO, 
    COMM_DIST_GRAPH, IERROR
    LOGICAL REORDER

MPI::Distgraphcomm MPI::Intracomm::Dist_graph_create_adjacent(int indegree, 
                   const int sources[], const int sourceweights[],
                   int outdegree, const int destinations[],
                   const int destweights[],
                   const MPI::Info& info, bool reorder) const

MPI::Distgraphcomm MPI::Intracomm::Dist_graph_create_adjacent(int indegree, 
                   const int sources[], int outdegree, 
                   const int destinations[], const MPI::Info& info, 
                   bool reorder) const

MPI_DIST_GRAPH_CREATE_ADJACENT returns a handle to a new communicator to which the distributed graph topology information is attached.
Each process passes all information about the edges to its neighbors in the virtual distributed graph topology. The calling processes must ensure that each edge of the graph is described in the source and in the destination process with the same weights. If there are multiple edges for a given (source,dest) pair, then the sequence of the weights of these edges does not matter. The complete communication topology is the combination of all edges shown in the sources arrays of all processes in comm_old, which must be identical to the combination of all edges shown in the destinations arrays. Source and destination ranks must be process ranks of comm_old. This allows a fully distributed specification of the communication graph. Isolated processes (i.e., processes with no outgoing or incoming edges, that is, processes that have specified indegree and outdegree as zero and that thus do not occur as source or destination rank in the graph specification) are allowed.

The call creates a new communicator comm_dist_graph of distributed graph topology type to which topology information has been attached. The number of processes in comm_dist_graph is identical to the number of processes in comm_old. The call to MPI_DIST_GRAPH_CREATE_ADJACENT is collective.

Weights are specified as non-negative integers and can be used to influence the process remapping strategy and other internal MPI optimizations. For instance, approximate count arguments of later communication calls along specific edges could be used as their edge weights. Multiplicity of edges can likewise indicate more intense communication between pairs of processes. However, the exact meaning of edge weights is not specified by the MPI standard and is left to the implementation. In C or Fortran, an application can supply the special value MPI_UNWEIGHTED for both weight arrays to indicate that all edges have the same (effectively no) weight. In C++, this constant does not exist and the weight arguments may be omitted from the argument list. It is erroneous to supply MPI_UNWEIGHTED, or in C++ omit the weight arrays, for some but not all processes' sourceweights and destweights arrays in comm_old. Note that MPI_UNWEIGHTED is not a special weight value; rather it is a special value for the total array argument. In C, one would expect it to be a NULL. In Fortran, MPI_UNWEIGHTED is an object like MPI_BOTTOM (not usable for initialization or assignment). See Section 2.5.4.

The meaning of the info and reorder arguments is defined in the description of the following routine.

#!c
MPI_DIST_GRAPH_CREATE(comm_old, n, sources, degrees, 
destinations, weights, info, reorder, comm_dist_graph)
 IN  comm_old        input communicator (handle)
 IN  n               number of source nodes for which this process
                     specifies edges (non-negative integer)
 IN  sources         array containing the n source nodes for which
                     this process specifies edges 
                     (array of non-negative integers)
 IN  degrees         array specifying the number of destinations for
                     each source node in the source node array 
                     (array of non-negative integers)
 IN  destinations    destination nodes for the source nodes in the 
                     source node array (array of non-negative 
                     integers)
 IN  weights         weights for source to destination edges 
                     (array of non-negative integers)
 IN  info            hints on optimization and interpretation of 
                     weights (handle)
 IN  reorder         the process may be reordered (true) or 
                     not (false) (logical)
 OUT comm_dist_graph communicator with distributed graph topology 
                   added (handle)

int MPI_Dist_graph_create(MPI_Comm comm_old, int n, 
          int sources[], int degrees[], int destinations[],
          int weights[], MPI_Info info, int reorder, 
          MPI_Comm *comm_dist_graph)

MPI_DIST_GRAPH_CREATE(COMM_OLD, N, SOURCES, DEGREES, 
DESTINATIONS, WEIGHTS, INFO, REORDER, COMM_DIST_GRAPH, IERROR)
    INTEGER COMM_OLD, N, SOURCES(*), DEGREES(*), DESTINATIONS(*), 
    WEIGHTS(*), INFO, COMM_DIST_GRAPH, IERROR
    LOGICAL REORDER

MPI::Distgraphcomm MPI::Intracomm::Dist_graph_create(int n, 
                   const int sources[], const int degrees[],
                   const int destinations[], const int weights[], 
                   const MPI::Info& info, bool reorder) const

MPI::Distgraphcomm MPI::Intracomm::Dist_graph_create(int n, 
                   const int sources[], const int degrees[],
                   const int destinations[], const MPI::Info& info, 
                   bool reorder) const

MPI_DIST_GRAPH_CREATE returns a handle to a new communicator to which the distributed graph topology information is attached. Concretely, each process calls the constructor with a set of directed (source,destination) communication edges as described below. Every process passes an array of n source nodes in the sources array. For each source node, a non-negative number of destination nodes is specified in the degrees array. The destination nodes are stored in the corresponding consecutive segment of the destinations array. More precisely if the i-th node in sources is s, this specifies degrees[i] edges (s,d) with d of the j-th such edge stored in destinations[degrees[0]+...+degrees[i-1]+j]. The weight of this edge is stored in weights[degrees[0]+...+degrees[i-1]+j]. Both the sources and the destinations arrays may contain the same node more than once, and the order in which nodes are listed as destinations or sources is not significant. Similarly, different processes may specify edges with the same source and destination nodes. Source and destination nodes must be process ranks of comm_old. Different processes may specify different numbers of source and destination nodes, as well as different source to destination edges. This allows a fully distributed specification of the communication graph. Isolated processes (i.e., processes with no outgoing or incoming edges, that is, processes that do not occur as source or destination node in the graph specification) are allowed.

The call creates a new communicator comm_dist_graph of distributed graph topology type to which topology information has been attached. The number of processes in comm_dist_graph is identical to the number of processes in comm_old. The call to MPI_Dist_graph_create is collective.

If reorder = false, all processes will have the same rank in comm_dist_graph as in comm_old. If reorder = true then the MPI library is free to remap to other processes (of comm_old) in order to improve communication on the edges of the communication graph. The weight associated with each edge is a hint to the MPI library about the amount or intensity of communication on that edge, and may be used to compute a "best" reordering.

Weights are specified as non-negative integers and can be used to influence the process remapping strategy and other internal MPI optimizations. For instance, approximate count arguments of later communication calls along specific edges could be used as their edge weights. Multiplicity of edges can likewise indicate more intense communication between pairs of processes. However, the exact meaning of edge weights is not specified by the MPI standard and is left to the implementation. In C or Fortran, an application can supply the special value MPI_UNWEIGHTED for the weight array to indicate that all edges have the same (effectively no) weight. In C++, this constant does not exist and the weights argument may be omitted from the argument list. It is erroneous to supply MPI_UNWEIGHTED, or in C++ omit the weight arrays, for some but not all processes of comm_old. Note that MPI_UNWEIGHTED is not a special weight value; rather it is a special value for the total array argument. In C, one would expect it to be a NULL. In Fortran, MPI_UNWEIGHTED is an object like MPI_BOTTOM (not usable for initialization or assignment). See Section 2.5.4.

The meaning of the weights argument can be influenced by the info argument. Info arguments can be used to guide the mapping; possible options include minimizing the maximum number of edges between processes on different SMP nodes, or minimizing the sum of all such edges. An MPI implementation is not obliged to follow specific hints, and it is legal for an MPI implementation not to do any reordering. An MPI implementation may specify more info key-value pairs. All processes must specify the same set of key-value info pairs.

-Advice to implementors.* MPI implementations must document any additionally supported key-value info pairs. MPI_INFO_NULL is always valid, and may indicate the default creation of the distributed graph topology to the MPI library.

An implementation does not explicitly need to construct the topology from its distributed parts. However, all processes can construct the full topology from the distributed specification and use this in a call to MPI_GRAPH_CREATE to create the topology. This may serve as a reference implementation of the functionality, and may be acceptable for small communicators. However, a scalable high-quality implementation would save the topology graph in a distributed way. (End of advice to implementors.)

Example 7.3
As for Example 7.2., assume there are four processes 0, 1, 2, 3 with the following adjacency matrix and unit edge weights:

|| process || neighbors || || 0 || 1, 3 || || 1 || 0 || || 2 || 3 || || 3 || 0, 2 ||

With MPI_DIST_GRAPH_CREATE, this graph could be constructed in many different ways. One way would be that each process specifies its outgoing edges. The arguments per process would be:

|| process || n || sources || degrees || destinations || weights || || 0 || 1 || 0 || 2 || 1,3 || 1,1 || || 1 || 1 || 1 || 1 || 0 || 1 || || 2 || 1 || 2 || 1 || 3 || 1 || || 3 || 1 || 3 || 2 || 0,2 || 1,1 ||

Another way would be to pass the whole graph on process 0, which could be done with the following arguments per process: || process || n || sources || degrees || destinations || weights || || 0 || 4 || 0,1,2,3 || 2,1,1,2 || 1,3,0,3,0,2 || 1,1,1,1,1,1 || || 1 || 0 || - || - || - || - || || 2 || 0 || - || - || - || - || || 3 || 0 || - || - || - || ||

In both cases above, the application could supply MPI_UNWEIGHTED instead of explicitly providing identical weights.

MPI_DIST_GRAPH_CREATE_ADJACENT could be used to specify this graph using the following arguments: || process || indegree || sources || sourceweights || outdegree || destinations || destweights || || 0 || 2 || 1,3 || 1,1 || 2 || 1,3 || 1,1 || || 1 || 1 || 0 || 1 || 1 || 0 || 1 || || 2 || 1 || 3 || 1 || 1 || 3 || 1 || || 3 || 2 || 0,2 || 1,1 || 2 || 0,2 || 1,1 ||

Example 7.4
A two-dimensional PxQ torus where all processes communicate along the dimensions and along the diagonal edges. This cannot be modelled with Cartesian topologies, but can easily be captured with MPI_DIST_GRAPH_CREATE as shown in the following code. In this example, the communication along the dimensions is twice as heavy as the communication along the diagonals:

#!c
/*
Input:     dimensions P, Q
Condition: number of processes equal to P*Q; otherwise only 
           ranks smaller than P*Q participate
*/
int rank, x, y;
int sources[1], degrees[1];
int destinations[8], weights[8];

MPI_Comm_rank(MPI_COMM_WORLD, &rank);

/* get x and y dimension */
y=rank/P; x=rank%P;

/* get my communication partners along x dimension */
destinations[0] # P*y+(x+1)%P; weights[0]2;
destinations[1] # P*y+(P+x-1)%P; weights[1]2;

/* get my communication partners along y dimension */
destinations[2] # P*((y+1)%Q)+x; weights[2]2;
destinations[3] # P*((Q+y-1)%Q)+x; weights[3]2;

/* get my communication partners along diagonals */
destinations[4] # P*((y+1)%Q)+(x+1)%P; weights[4]1;
destinations[5] # P*((Q+y-1)%Q)+(x+1)%P; weights[5]1;
destinations[6] # P*((y+1)%Q)+(P+x-1)%P; weights[6]1;
destinations[7] # P*((Q+y-1)%Q)+(P+x-1)%P; weights[7]1;

sources[0] = rank;
degrees[0] = 8;
MPI_Dist_graph_create(MPI_COMM_WORLD, 1, sources, degrees, destinations,
  weights, MPI_INFO_NULL, 1, comm_dist_graph)

Page 248, line 21, add (i.e., after MPI_GRAPH):

MPI_DIST_GRAPH distributed graph topology

Page 252, line 6, add:

MPI_DIST_GRAPH_NEIGHBORS_COUNT and MPI_DIST_GRAPH_NEIGHBORS provide adjacency information for a distributed graph topology.

#!c
MPI_DIST_GRAPH_NEIGHBORS_COUNT(comm, indegree, outdegree, weighted)
 IN      comm          communicator with distributed graph topology (handle)
 OUT     indegree      number of edges into this process (non-negative 
                       integer)
 OUT     outdegree     number of edges out of this process (non-negative 
                       integer)
 OUT     weighted      false if MPI_UNWEIGHTED was supplied during creation, 
                       true otherwise (logical)

int MPI_Dist_graph_neighbors_count(MPI_Comm comm, int *indegree, 
    int *outdegree, int *weighted)

MPI_DIST_GRAPH_NEIGHBORS_COUNT(COMM, INDEGREE, OUTDEGREE, WEIGHTED, IERROR)
  INTEGER COMM, INDEGREE, OUTDEGREE, IERROR
  LOGICAL WEIGHTED

void MPI::Distgraphcomm::Get_dist_neighbors_count(int rank, int indegree[], 
    int outdegree[], bool& weighted) const

#!c
MPI_DIST_GRAPH_NEIGHBORS(comm, maxindegree, sources, sourceweights, 
                         maxoutdegree, destinations, destweights)
 IN      comm           communicator with distributed graph topology (handle)
 IN      maxindegree    size of sources and sourceweights arrays 
                        (non-negative integer)
 OUT     sources        processes for which the calling process is a 
                        destination (array of non-negative integers)
 OUT     sourceweights  weights of the edges into the calling process 
                        (array of non-negative integers)
 IN      maxoutdegree   size of destinations and destweights arrays 
                        (non-negative integer)
 OUT     destinations   processes for which the calling process is a source
                        (array of non-negative integers)
 OUT     destweights    weights of the edges out of the calling process 
                        (array of non-negative integers)

int MPI_Dist_graph_neighbors(MPI_Comm comm, int maxindegree, int sources[], 
                             int sourceweights[], int maxoutdegree, 
                             int destinations[], int destweights[])

MPI_DIST_GRAPH_NEIGHBORS(COMM, MAXINDEGREE, SOURCES, SOURCEWEIGHTS, 
                         MAXOUTDEGREE, DESTINATIONS, DESTWEIGHTS, IERROR)
  INTEGER COMM, MAXINDEGREE, SOURCES(*), SOURCEWEIGHTS(*), MAXOUTDEGREE, 
                         OUTDEGREE, DESTINATIONS(*), DESTWEIGHTS(*), IERROR

void MPI::Distgraphcomm::Get_dist_neighbors(int maxindegree, int sources[],
                                        int sourceweights[], 
                                        int maxoutdegree, 
                                        int destinations[],
                                        int destweights[])

These calls are local. The number of edges into and out of the process returned by MPI_DIST_GRAPH_NEIGHBORS_COUNT are the total number of such edges given in the call to MPI_DIST_GRAPH_CREATE_ADJACENT or MPI_DIST_GRAPH_CREATE (potentially by processes other than the calling process in the case of MPI_DIST_GRAPH_CREATE). Multiply defined edges are all counted and returned by MPI_DIST_GRAPH_NEIGHBORS in some order. If MPI_UNWEIGHTED is supplied for sourceweights or destweights or both, or if MPI_UNWEIGHTED was supplied during the construction of the graph then no weight information is returned in that array or those arrays. The only requirement on the order of values in sources and destinations is that two calls to the routine with same input argument comm will return the same sequence of edges. If maxindegree or maxoutdegree is smaller than the numbers returned by MPI_DIST_GRAPH_NEIGHBOR_COUNT, then only the first part of the full list is returned. Note, that the order of returned edges does need not to be identical to the order that was provided in the creation of comm for the case that MPI_DIST_GRAPH_CREATE_ADJACENT was used.

-Advice to implementors:* Since the query calls are defined to be local, each process needs to store the list of its neighbors with incoming and outgoing edges. Communication is required at the collective MPI_DIST_GRAPH_CREATE call in order to compute the neighbor lists for each process from the distributed graph specification. (End of advice to implementors.)

Page 450, line 13, add:

class Distgraphcomm : public Intracomm {...};

Page 456, line 1:

replace: "five" with "six"

Page 456, line 5:

replace "MPI::Cartcomm and MPI::Graphcomm are derived from MPI::Intracomm."

with

MPI::Cartcomm, MPI::Graphcomm, and MPI::Distgraphcomm are derived from MPI::Intracomm.

Page 457, line 11, add:

Distgraphcomm& Distgraphcomm::Clone() const

Page 457, line 34, replace:

// Cartcomm and Graphcomm are similarly defined

with:

// Cartcomm, Graphcomm, and Distgraphcomm are similarly defined

Page 462, line 36, replace: "MPI_STATUS_IGNORE, MPI_STATUSES_IGNORE, MPI_ERRCODES_IGNORE,"

with

"MPI_STATUS_IGNORE, MPI_STATUSES_IGNORE, MPI_ERRCODES_IGNORE, MPI_UNWEIGHTED,"

Page 465, line 14, 15: addressed in #104

by:

// Cartcomm, Graphcomm and Distgraphcomm are similarly defined

Page 497 (Appendix A), line 9, add:

MPI_DIST_GRAPH MPI::DIST_GRAPH

Page 499, after line 12:

add "MPI_UNWEIGHTED Not defined in C++"

Page 500, line 6, add:

MPI::Distgraphcomm

Page 550, line 27, add:

Distgraphcomm& Distgraphcomm::Clone() const

Page 551, line 13, add:

Distgraphcomm& Distgraphcomm::Dup() const

Functions to Deprecate (not included in this ticket but kept for reference!)

The following functions may be deprecated by this proposal (for MPI 2.2 or MPI 3.0). The proposal is not dependent on whether this happens or not, and the discussion on this should be kept separate.

#!c
MPI_GRAPH_CREATE(comm_old, nnodes, index, edges, reorder, comm_dist_graph)
 IN  comm_old        input communicator (handle)
 IN  nnodes          number of nodes in graph (integer)
 IN  index           array of integers describing node degrees (see below)
 IN  edges           array of integers describing graph edges (see below)
 IN  reorder         ranking may be reordered (true) or not (false) (logical)
 OUT comm_dist_graph communicator with graph topology added (handle)

#!c
MPI_GRAPH_MAP(comm, nnodes, index, edges, newrank)
 IN      comm                      input communicator (handle)
 IN      nnodes                    number of graph nodes (integer)
 IN      index                     integer array specifying the graph structure, see MPI_GRAPH_CREATE
 IN      edges                     integer array specifying the graph structure
 OUT     newrank                   reordered rank of the calling process;
                                   MPI_UNDEFINED if the calling process does not belong to graph (integer)

#!c
MPI_GRAPH_NEIGHBORS(comm, rank, maxneighbors, neighbors)
 IN      comm                    communicator with graph topology (handle)
 IN      rank                    rank of process in group of comm (integer)
 IN      maxneighbors            size of array neighbors (integer)
 OUT     neighbors               ranks of processes that are neighbors to specified process (array of integer)

#!c
MPI_GRAPH_NEIGHBORS_COUNT(comm, rank, nneighbors)
 IN      comm                 communicator with graph topology (handle)
 IN      rank                 rank of process in group of comm (integer)
 OUT     nneighbors           number of neighbors of specified process (integer)

#!c
MPI_GRAPH_GET(comm, maxindex, maxedges, index, edges)
 IN      comm                     communicator with graph structure (handle)
 IN      maxindex                 length of vector index in the calling program
                                  (integer)
 IN      maxedges                 length of vector edges in the calling program
                                  (integer)
 OUT     index                    array of integers containing the graph structure (for
                                  details see the definition of MPI_GRAPH_CREATE)
 OUT     edges                    array of integers containing the graph structure

#!c
MPI_GRAPHDIMS_GET(comm, nnodes, nedges)
 IN      comm                    communicator for group with graph structure (handle)
 OUT     nnodes                  number of nodes in graph (integer) (same as number
                                 of processes in the group)
 OUT     nedges                  number of edges in graph (integer)

Since MPI_DIST_GRAPH_CREATE is collective, it would be a small matter to compute the total number of edges. Thus, MPI_GRAPHDIMS_GET could be kept. Note that nnodes is equal to the number of processes in comm.

Possible additional functionality (not included in this ticket but kept for future reference)

To be discussed:

#!c
MPI_DIST_EDGE_COUNT(comm, edges)
 IN comm              communicator with distributed graph topology (handle)
 OUT edges (integer)  total number of edges in distributed graph

Impact on Implementations

Implementations have to add the new functions, which can easily be implemented on top of the existing, general graph interface. This trivial implementation requires little work.

Impact on Applications / Users

Users who want to use the new functionality will have to change calls to graph creation and neighbor query functions. They have to introduce explicit communication of neighbor lists if they query other processes neighbors (this is unlikely).

Alternative Solutions

For more discussion/connection to topological collectives, see [wiki:TopoColl]

Keep for later (maybe introduction in MPI-3): Advice to users. Specifying reorder # false on only some processes can support heterogeneous application to hardware mapping. For example, suppose an application does visualization or heavy I/O on rank 0, and that only this processor has the corresponding visualization or I/O capabilities. If this rank should be part of a distributed graph topology, it is useful to specify that this rank will not be remapped. Allowing remapping of other ranks can then exploit features of the interconnection network topology. Another scenario is an application where some ranks have large state information that would be (too) expensive to migrate. Setting reorder``false for these ranks guarantees that migration will not be necessary since these processes will preserve their rank in the distributed graph communicator. (End of advice to users.)

-Advice to implementors.* The reordering optimization problem must account for the constraint that processes with reorder = false must remain fixed. This could be done by eliminating the fixed nodes and all their edges, and solving the optimization problem on the reduced graph. However, better results may be possible by considering the entire graph. (End of advice to implementors.)

Entry for the Change Log

Section 7.5.3 on page 247. New functions for a scalable distributed graph topology interface has been added. In this section, the functions MPI_DIST_GRAPH_CREATE_ADJACENT and MPI_DIST_GRAPH_CREATE, the constants MPI_UNWEIGHTED, and the derived C++ class Distgraphcomm were added.

Section 7.5.4, page 252. For the scalable distributed graph topology interface, the functions MPI_DIST_NEIGHBORS_COUNT and MPI_DIST_NEIGHBORS and the constant MPI_DIST_GRAPH were added.

mpiforumbot commented 8 years ago

Originally by bronis on 2009-04-03 06:16:21 -0500

Reviews OK, again.

mpiforumbot commented 8 years ago

Originally by smithbr on 2009-04-03 08:19:26 -0500

I didn't see anything else beyond that typo that hasn't already been addressed.

mpiforumbot commented 8 years ago

Originally by RolfRabenseifner on 2009-04-03 11:19:26 -0500

Thank you for ticket #147 .

Here a few minor bugs:

"IN destinations target nodes..."[[BR]] should read[[BR]] "IN destinations destination nodes..."
INTEGER COMM_OLD, N, SOURCES(*), DEGREES(*), TARGETS(*) WEIGHTS(*) INFO, COMM_GRAPH, IERROR[[BR]] should read (TARGETS->DESTINATIONS, and commas added)[[BR]] INTEGER COMM_OLD, N, SOURCES(*), DEGREES(*), DESTINATIONS(*), WEIGHTS(*), INFO, COMM_GRAPH, IERROR
The sentence

"Isolated processes (i.e., processes with no outgoing or incoming edges, that is, processes that do not occur as source or destination node in the graph specification) are allowed."

should be moved from its current location (telling something about the new new communicator) to the end of the previous paragraph, i.e., after

"This allows a fully distributed specification of the communication graph."

Reason: With the current location of the sentence, it looks like that some nodes may be split off, i.e., having a comm-self communicator, which does not fit to the sentence before about the size of comm_graph.
Please write (otherwise the position is not absolutely clear):

Page 248, line 21, add (i.e., after MPI_GRAPH):
On the "OUT weighted ..." line, please add at the end

(logical)

mpiforumbot commented 8 years ago

Originally by RolfRabenseifner on 2009-04-03 11:31:41 -0500

The following item is still open:

Current text reads:

It is erroneous to supply MPI_UNWEIGHTED in one process and not in all other processes of comm_old.

but should read:

It is erroneous to supply MPI_UNWEIGHTED in one process and not in all other processes of comm_old. Note that MPI_UNWEIGHTED is not a special weight value; rather it is a special value for the total array argument. In C, one would expect it to be a NULL. In Fortran, MPI_UNWEIGHTED is an object like MPI_BOTTOM (not usable for initilaization or assignment). See Section 2.5.4.

Normally this type of text is added for all these special constants. You replied "fixed", but you lost the additional sentences.

mpiforumbot commented 8 years ago

Originally by htor on 2009-04-03 12:11:51 -0500

Replying to RolfRabenseifner:

Thank you for ticket #147 . please feel free to edit it!

Here a few minor bugs:

"IN destinations target nodes..."[[BR]] should read[[BR]] "IN destinations destination nodes..." fixed - thanks

INTEGER COMM_OLD, N, SOURCES(*), DEGREES(*), TARGETS(*) WEIGHTS(*) INFO, COMM_GRAPH, IERROR[[BR]] should read (TARGETS->DESTINATIONS, and commas added)[[BR]] INTEGER COMM_OLD, N, SOURCES(*), DEGREES(*), DESTINATIONS(*), WEIGHTS(*), INFO, COMM_GRAPH, IERROR fixed

The sentence

"Isolated processes (i.e., processes with no outgoing or incoming edges, that is, processes that do not occur as source or destination node in the graph specification) are allowed."

should be moved from its current location (telling something about the new new communicator) to the end of the previous paragraph, i.e., after

"This allows a fully distributed specification of the communication graph."

Reason: With the current location of the sentence, it looks like that some nodes may be split off, i.e., having a comm-self communicator, which does not fit to the sentence before about the size of comm_graph. yes, that's true - fixed

Please write (otherwise the position is not absolutely clear):

Page 248, line 21, add (i.e., after MPI_GRAPH): fixed

On the "OUT weighted ..." line, please add at the end

(logical) fixed

To your other comment: I re-added the text as it is consistent with the MPI-2.1 standard.

Thanks, Torsten

mpiforumbot commented 8 years ago

Originally by RolfRabenseifner on 2009-04-03 12:28:42 -0500

I fixed a small bug: MPI_DIST_GRAPH_NEIGHBORS(..., MAXOUTDEGREE, OUTDEGREE, should read MPI_DIST_GRAPH_NEIGHBORS(..., MAXOUTDEGREE,

Technical review: All looks fine now.

My opinion: Additional interface proposal in ticket #147 should be also available. (I'm currently editing it).

mpiforumbot commented 8 years ago

Originally by RolfRabenseifner on 2009-04-03 12:34:16 -0500

Current text: void MPI::Distgraphcomm::Get_dist_neighbors(int rank, int maxindegree, should read void MPI::Distgraphcomm::Get_dist_neighbors(int maxindegree,

I modified this.

mpiforumbot commented 8 years ago

Originally by bronis on 2009-04-03 12:38:26 -0500

I made a very small change -- it had read:

a non-negative number of destination nodes are specified

but the verb should agree with "number" so it should read:

a non-negative number of destination nodes is specified

so I changed "are" to "is"

mpiforumbot commented 8 years ago

Originally by htor on 2009-04-03 16:20:18 -0500

fixed minor spelling mistakes.

mpiforumbot commented 8 years ago

Originally by bronis on 2009-04-03 16:33:11 -0500

Are we there yet? I hope so. These changes review fine.

mpiforumbot commented 8 years ago

Originally by gropp on 2009-04-05 16:38:07 -0500

Reviewed and fine.

mpiforumbot commented 8 years ago

Originally by rsthakur on 2009-04-05 19:24:52 -0500

IN  sources       array containing the n source nodes for which this     
                   process contributes edges (array of non-negative integers)

The calling process doesn't "contribute" edges to the source nodes. The calling process may not even be connected to any of the source nodes. So "specifies edges" may be a better wording than "contributes edges".

IN  reorder       the process may be reordered (true) or not (false) (logical)

Instead of "the process may be reordered" say "the calling process's rank may be reordered"

Allowing some processes to say reorder and some to say don't reorder makes it complicated for the implementation. The implementation may need to do an all-to-all to figure out what's going on. (Therefore, the current simple example implementation on top of existing graph topology functions doesn't illustrate performance benefits or pitfalls.)

Advice to implementors. The reordering optimization problem must account for the constraint that processes with reorder = false must remain fixed. An easy way to do this is to eliminate the fixed nodes and all their edges, and solve the optimization problem on the reduced graph.

I don't think it is easy. To know which nodes have passed reorder=false you have to communicate. And the edges for those nodes may not be specified by those nodes. They may be specified by a completely different set of nodes.

These calls are local. The number of edges into and out of the process returned by MPI_Dist_graph_neighbors_count are the total number of such edges given in the call to MPI_Dist_graph_create.

I would add at the end: "(potentially by processes other than the calling process)."

It would be good to specify whether or not the old graph functions (MPI_Graph_get and others listed under deprecate above) can still be used on a graph created with MPI_Dist_graph_create. One option is to say they cannot be used. Because to implement MPI_Graph_get, the implementation would have to collect the entire graph onto each process in MPI_Dist_graph_create itself, defeating the purpose.

mpiforumbot commented 8 years ago

Originally by htor on 2009-04-05 19:42:13 -0500

I would say our review technique has serious convergence issues :).

Replying to rsthakur:

IN  sources       array containing the n source nodes for which this     
                   process contributes edges (array of non-negative integers)
The calling process doesn't "contribute" edges to the source nodes. The calling process may not even be connected to any of the source nodes. So "specifies edges" may be a better wording than "contributes edges". I agree - fixed
IN  reorder       the process may be reordered (true) or not (false) (logical)
Instead of "the process may be reordered" say "the calling process's rank may be reordered"

Allowing some processes to say reorder and some to say don't reorder makes it complicated for the implementation. The implementation may need to do an all-to-all to figure out what's going on. yes, I'm in favor of specifying an Info argument like "all_reorder" or such for this case. I added it to my slides for tomorrow!

(Therefore, the current simple example implementation on top of existing graph topology functions doesn't illustrate performance benefits or pitfalls.) Ha! That would lift the barrier for a reference implementation to a whole new level!

Advice to implementors. The reordering optimization problem must account for the constraint that processes with reorder = false must remain fixed. An easy way to do this is to eliminate the fixed nodes and all their edges, and solve the optimization problem on the reduced graph.

I don't think it is easy. To know which nodes have passed reorder=false you have to communicate. And the edges for those nodes may not be specified by those nodes. They may be specified by a completely different set of nodes. yes, distributed graphs are not simple.

These calls are local. The number of edges into and out of the process returned by MPI_Dist_graph_neighbors_count are the total number of such edges given in the call to MPI_Dist_graph_create.

I would add at the end: "(potentially by processes other than the calling process)." added

It would be good to specify whether or not the old graph functions (MPI_Graph_get and others listed under deprecate above) can still be used on a graph created with MPI_Dist_graph_create. One option is to say they cannot be used. Because to implement MPI_Graph_get, the implementation would have to collect the entire graph onto each process in MPI_Dist_graph_create itself, defeating the purpose. they are not supported (i.e., MPI-2.2 will only allow objects of type MPI_GRAPH in them, everything else will throw the usual topology error). Do we need to specify this explicitly?

Thanks, Torsten

mpiforumbot commented 8 years ago

Originally by rsthakur on 2009-04-05 19:55:50 -0500

Replying to htor:

Advice to implementors. The reordering optimization problem must account for the constraint that processes with reorder = false must remain fixed. An easy way to do this is to eliminate the fixed nodes and all their edges, and solve the optimization problem on the reduced graph.

I don't think it is easy. To know which nodes have passed reorder=false you have to communicate. And the edges for those nodes may not be specified by those nodes. They may be specified by a completely different set of nodes. yes, distributed graphs are not simple.

Therefore, the text shouldn't say "An easy way to do this is" :-).

It would be good to specify whether or not the old graph functions (MPI_Graph_get and others listed under deprecate above) can still be used on a graph created with MPI_Dist_graph_create. One option is to say they cannot be used. Because to implement MPI_Graph_get, the implementation would have to collect the entire graph onto each process in MPI_Dist_graph_create itself, defeating the purpose. they are not supported (i.e., MPI-2.2 will only allow objects of type MPI_GRAPH in them, everything else will throw the usual topology error). Do we need to specify this explicitly?

Rolf's comment above (first bullet in https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/33#comment:46) indicates he expects those functions to be supported. Hence I asked.

mpiforumbot commented 8 years ago

Originally by htor on 2009-04-05 20:01:30 -0500

Replying to rsthakur:

Replying to htor:

Advice to implementors. The reordering optimization problem must account for the constraint that processes with reorder = false must remain fixed. An easy way to do this is to eliminate the fixed nodes and all their edges, and solve the optimization problem on the reduced graph.

I don't think it is easy. To know which nodes have passed reorder=false you have to communicate. And the edges for those nodes may not be specified by those nodes. They may be specified by a completely different set of nodes. yes, distributed graphs are not simple.

Therefore, the text shouldn't say "An easy way to do this is" :-). yeah, what we meant with this is that the two partitions are easy to figure out (on alltoall or allreduce with P elements). But you're right - the wording is suboptimal (we should not rate any method in the standard) - I changed it.

It would be good to specify whether or not the old graph functions (MPI_Graph_get and others listed under deprecate above) can still be used on a graph created with MPI_Dist_graph_create. One option is to say they cannot be used. Because to implement MPI_Graph_get, the implementation would have to collect the entire graph onto each process in MPI_Dist_graph_create itself, defeating the purpose. they are not supported (i.e., MPI-2.2 will only allow objects of type MPI_GRAPH in them, everything else will throw the usual topology error). Do we need to specify this explicitly?

Rolf's comment above (first bullet in https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/33#comment:46) indicates he expects those functions to be supported. Hence I asked. hmm, ok - we will talk about this tomorrow (unless you have a specific proposal where to add which text :).

Thanks, Torsten

mpiforumbot commented 8 years ago

Originally by rsthakur on 2009-04-05 20:08:26 -0500

I am ok with saying the old graph functions cannot be used with MPI_DIST_GRAPH topologies. But it can be added after tomorrow's discussion.

mpiforumbot commented 8 years ago

Originally by bronis on 2009-04-05 20:36:33 -0500

Recent changes reviewed OK.

mpiforumbot commented 8 years ago

Originally by rsthakur on 2009-04-05 22:55:51 -0500

There is a major problem with this proposal. Solution is provided later on.

The new interface causes already communication in MPI_DIST_GRAPH_CREATE to fulfil the non-collective functionality of MPI_DIST_GRAPH_NEIGHBORS_COUNT and MPI_DIST_GRAPH_NEIGHBORS. yes, at least a single allreduce is needed

Isn't it more like an all-to-all? Each process needs to talk to every other process to find out whom it is connected to. If you do an allreduce, it would be over a very large vector of size equal to the total number of edges in the graph.

mpiforumbot commented 8 years ago

Originally by traff on 2009-04-06 01:54:27 -0500

Replying to RolfRabenseifner:

The following item is still open:

Current text reads:

It is erroneous to supply MPI_UNWEIGHTED in one process and not in all other processes of comm_old.

but should read:

It is erroneous to supply MPI_UNWEIGHTED in one process and not in all other processes of comm_old. Note that MPI_UNWEIGHTED is not a special weight value; rather it is a special value for the total array argument. In C, one would expect it to be a NULL. In Fortran, MPI_UNWEIGHTED is an object like MPI_BOTTOM (not usable for initilaization or assignment). See Section 2.5.4.

Normally this type of text is added for all these special constants. You replied "fixed", but you lost the additional sentences.

Rolf, I replied that I found the rest superfluous - and I still do: I don't think this is the place to say how MPI_UNWEIGHTED may be implemented. It's already said that MPI_UNWEIGHTED is a special value and in my opinion this suffices

Jesper

mpiforumbot commented 8 years ago

Originally by traff on 2009-04-06 02:38:10 -0500

Replying to rsthakur:

I am ok with saying the old graph functions cannot be used with MPI_DIST_GRAPH topologies. But it can be added after tomorrow's discussion.

It could be added, but is not strictly necessary: For these functions it is alreay in the prototype said that the input communicator is "with graph topology". But maybe the word "general" should be dropped on Page 251, line 35. (or add the sentence here: "These functions are not applicable to distributed graph topologies.")

Other than that (and that I'd like to drop the overspecification of MPI_UNWEIGHTED), I think I'm fine with the proposal as it is now.

Jesper

mpiforumbot commented 8 years ago

Originally by chsi on 2009-04-06 07:28:50 -0500

Hi all,

I'm sorry that this idea pops up so late. Nevertheless, I'd propose to replace the logical 'reorder' argument (e.g., 'bool reorder' in the C++ binding) with a numerical 'color' argument (e.g., 'int color').

Reason: The {true,false} possibilities of 'reorder' are too limited.

More details:

One "Advice to users" already takes about heterogeneous applications but gives an example with only ONE special rank (rank 0 as I/O). Now imagine you have several nodes with this "virtualization capabilities" or even further special nodes, but don't want to restrict the ordering within these subgroups. The 'color' argument would be almost identical to the argument in MPI_Comm_split(), i.e., you partition the communicator into disjoint subgroups, one for each value of color. So each subgroup contains all those processes that supplied the same color value. Reordering is now only allowed within each subgroup. This approach is much more flexible (e.g., all virtualization nodes provide the same color in the example) but can easily "emulate" the old semantics (i.e., the 'bool reorder') as follows: All participants with 'reorder=false' supply their own rank number as 'color' argument and all other processes (i.e., with 'reorder=true' provide the size of the communicator as 'color' value.

Christian

mpiforumbot commented 8 years ago

Originally by htor on 2009-04-06 18:08:05 -0500

This is a merger of #147 and #33 as requested by Rolf, please don't review yet!

mpiforumbot commented 8 years ago

Originally by htor on 2009-04-06 18:33:21 -0500

First version of the merge with both changes after the April'09 straw-vote. Up for review (use the diff feature to look at the differences to #147 which has been merged without changes in the previous version of this ticket)!

mpiforumbot commented 8 years ago

Originally by bronis on 2009-04-06 18:43:11 -0500

Looks good.

mpiforumbot commented 8 years ago

Originally by htor on 2009-04-06 18:53:36 -0500

(minor modifications - added function interfaces where necessary)

mpiforumbot commented 8 years ago

Originally by htor on 2009-04-06 19:25:15 -0500

remove duplicated function definitions.

mpiforumbot commented 8 years ago

Originally by rsthakur on 2009-04-06 21:57:39 -0500

In the 2nd para of defn of the adjacent version, it should be "The call to MPI_DIST_GRAPH_CREATE_ADJACENT is collective." (instead of MPI_DIST_GRAPH_CREATE)

The whole paragraph beginning with "Weights are specified as non-negative integers" is repeated in the two routines, whereas for reorder and info, it says see the next routine. You could keep even the weights description in one place and add it to the see the next routine list.

process "contributes" edges has crept back in in the defn of MPI_Dist_graph_create in both places. you could change it to "specifies" edges.

Since there are now two functions before "(potentially by processes other than the calling process)", I would change it to "(potentially by processes other than the calling process in the case of MPI_DIST_GRAPH_CREATE)" just to be clear.

Otherwise looks good.

mpiforumbot commented 8 years ago

Originally by htor on 2009-04-06 22:18:46 -0500

Replying to rsthakur:

In the 2nd para of defn of the adjacent version, it should be "The call to MPI_DIST_GRAPH_CREATE_ADJACENT is collective." (instead of MPI_DIST_GRAPH_CREATE) fixed

The whole paragraph beginning with "Weights are specified as non-negative integers" is repeated in the two routines, whereas for reorder and info, it says see the next routine. You could keep even the weights description in one place and add it to the see the next routine list. it's actually slightly different in both cases.

process "contributes" edges has crept back in in the defn of MPI_Dist_graph_create in both places. you could change it to "specifies" edges. fixed :)

Since there are now two functions before "(potentially by processes other than the calling process)", I would change it to "(potentially by processes other than the calling process in the case of MPI_DIST_GRAPH_CREATE)" just to be clear. yes, fixed

Rolf, Hubert (thanks a lot!) and me did a thorough review of the ticket tonight. We applied some more minor changes (mostly constants in all the constant tables etc.). Otherwise, it looks good. Btw., please deprecate the C++ bindings asap!

Please take a quick look at the changes (use the diff at the ticket).

Thanks & all the Best, Torsten

mpiforumbot commented 8 years ago

Originally by jsquyres on 2009-04-06 22:27:58 -0500

I do believe it's a bug in Trac that there was an un-terminated "superscript" much earlier in this ticket that was wonking the font. Let's try to terminate it here: ^foo.

mpiforumbot commented 8 years ago

Originally by jsquyres on 2009-04-06 22:30:19 -0500

Maybe I should have tried an unterminated ,,subscript.

mpiforumbot commented 8 years ago

Originally by jsquyres on 2009-04-06 22:31:16 -0500

Maybe one more unterminated subscript to match the 2 unterminated ,,superscripts?

Let's see...

mpiforumbot commented 8 years ago

Originally by htor on 2009-04-06 22:35:13 -0500

some more line breaks in the code sections to make it printable for Rolf :) (no real other changes - stupid trac)

mpiforumbot commented 8 years ago

Originally by RolfRabenseifner on 2009-04-06 22:43:51 -0500

looks fine.

mpiforumbot commented 8 years ago

Originally by traff on 2009-04-07 02:19:59 -0500

Reviewed, ok, I hope (some significant changes has happened, not happy with all of them, but I appreciate that the ticket has been so thoroughly discussed)

Jesper

mpiforumbot commented 8 years ago

Originally by jsquyres on 2009-04-07 07:36:47 -0500

Sorry for jumping in so late; it was impossible to keep up with all the changes on this ticket.

p243:8 sentence is missing a comma: "... For MPI_DIST_GRAPH_CREATE_ADJACENT and MPI_DIST_GRAPH_CREATE, the input communication graph is distributed..."
New section 7.5.4 has this sentence:

MPI_DIST_GRAPH_CREATE_ADJACENT creates a distributed graph communicator with each process specifying all incoming and outgoing (adjacent) edges in the logical communication graph and thus requires minimal communication during creation.

Why is "(adjacent)" in parentheses? Is it required that the edges be connected to the node representing this process in the graph? If so, then be explicit.

New section 7.5.4 -- this is one gigantor sentence: To provide better possibilities for optimization by the MPI library, the distributed graph constructors permit weighted communication edges, and take an info argument that can further influence process reordering or other optimizations performed by the MPI library, for instance by hinting on how the edge weights are to be interpreted, on the quality of the reordering, and/or the time permitted for the MPI library to process the graph.[[BR]] I suggest breaking it up thusly:[[BR]] To provide better possibilities for optimization by the MPI library, the distributed graph constructors permit weighted communication edges, and take an info argument that can further influence process reordering or other optimizations performed by the MPI library. ~~for instance by hinting ~~ For example, hints can be provided on how the edge weights are to be interpreted, on the quality of the reordering, and/or the time permitted for the MPI library to process the graph.
Do isolated processes pass NULL arrays / 0 values for the length?
In new section 7.5.4: "The size of comm_dist_graph is identical to the size of comm_old." That sounds weird -- it almost sounds like we're assigning the size of the data structure. Instead, should you say "The number of processes in comm_dist_graph is identical to the number of processes in comm_old."? Same for both new graph constructors.
Missing word and extra comma: "However, the exact meaning of edge weights is not specified by the MPI standard, and is left to the implementation." Same for both new graph constructors.
This statement is not accurate because there is no C++ equivalent: "It is erroneous to supply MPI_UNWEIGHTED for some but not all processes' sourceweights and destweights arrays in comm_old." Perhaps the following would be better "Either zero or all processes must specify no sourceweights and destweights; it is erroneous for only some processes to specify no sourceweights and destweights."
Typos: "Info arguments can be used to guide the mapping,; possible options ~~are for example~~ include minimizing the maximum number of edges between processes on different SMP nodes ~~(for SMP systems)~~, or minimizing the sum of all such edges."
Why are there 2 ATI's in a row? They should be merged into 1 ATI, even if that one ATI addresses multiple issues.
"However, all processes can ~~easily~~ construct the full topology from the distributed..."
"In both cases above, the ~~user~~ application could supply MPI_UNWEIGHTED instead of explicitly providing identical weights. "
C++ binding mistake:

#!c
void MPI::Distgraphcomm::Get_dist_neighbors_count(int rank, int indegree[], 
    int outdegree[], bool *weighted) const

should be (change bool * to bool &):

#!c
void MPI::Distgraphcomm::Get_dist_neighbors_count(int rank, int indegree[], 
    int outdegree[], bool &weighted) const

Missing space: "The number of edges in to and out of the process returned by ..."
Comma should be a period: "Communication is required at the collective MPI_DIST_GRAPH_CREATE call in order to compute the neighbor lists for each process from the distributed graph specification,. "
Missed addition: Change p456:1-2: There are ~~five~~ six different types of communicators: MPI::Comm, MPI::Intercomm, MPI::Intracomm, MPI::Cartcomm, ~~and~~ MPI::Graphcomm, and MPI::Distgraphcomm."
There's a "Replace..." with no text specified to replace (...actually, it looks like some text got cut-n-pasted out of order in the proposal -- the following text is better because there is a missing comma after "Graphcomm" in the above text): Page 457, line 34, replace:[[BR]] // Cartcomm and Graphcomm are similarly defined[[BR]] with[[BR]] // Cartcomm, Graphcomm, and Distgraphcomm are similarly defined
The old graph interface should definitely be deprecated so that it can be removed in 3.0.

mpiforumbot commented 8 years ago

Originally by bronis on 2009-04-07 08:06:44 -0500

Replying to jsquyres:

Sorry for jumping in so late; it was impossible to keep up with all the changes on this ticket.

[snip]

Good comments Jeff even if they continue the convergence problem for this ticket. I would assert that your suggestions should not reset reviews on this ticket. Anyway, I disagree with the following:

Missing space: "The number of edges in to and out of the process returned by ..."

The word is "into" -- see:

http://www.merriam-webster.com/dictionary/into

mpiforumbot commented 8 years ago

Originally by htor on 2009-04-07 08:33:57 -0500

Replying to jsquyres:

Sorry for jumping in so late; it was impossible to keep up with all the changes on this ticket.

p243:8 sentence is missing a comma: "... For MPI_DIST_GRAPH_CREATE_ADJACENT and MPI_DIST_GRAPH_CREATE, the input communication graph is distributed..."

New section 7.5.4 has this sentence: fixed

MPI_DIST_GRAPH_CREATE_ADJACENT creates a distributed graph communicator with each process specifying all incoming and outgoing (adjacent) edges in the logical communication graph and thus requires minimal communication during creation.

Why is "(adjacent)" in parentheses? Is it required that the edges be connected to the node representing this process in the graph? If so, then be explicit. "all incoming and outgoing edges" \equiv "all adjacent edges" the first term is for CS people, the second for mathematicians but they say the same thing.

New section 7.5.4 -- this is one gigantor sentence: To provide better possibilities for optimization by the MPI library, the distributed graph constructors permit weighted communication edges, and take an info argument that can further influence process reordering or other optimizations performed by the MPI library, for instance by hinting on how the edge weights are to be interpreted, on the quality of the reordering, and/or the time permitted for the MPI library to process the graph.[[BR]] I suggest breaking it up thusly:[[BR]] To provide better possibilities for optimization by the MPI library, the distributed graph constructors permit weighted communication edges, and take an info argument that can further influence process reordering or other optimizations performed by the MPI library. ~~for instance by hinting ~~ For example, hints can be provided on how the edge weights are to be interpreted, on the quality of the reordering, and/or the time permitted for the MPI library to process the graph. fixed

Do isolated processes pass NULL arrays / 0 values for the length? yes, of course. Like a send with a zero count, i.e., input arrays are not significant.

In new section 7.5.4: "The size of comm_dist_graph is identical to the size of comm_old." That sounds weird -- it almost sounds like we're assigning the size of the data structure. Instead, should you say "The number of processes in comm_dist_graph is identical to the number of processes in comm_old."? Same for both new graph constructors. changed

Missing word and extra comma: "However, the exact meaning of edge weights is not specified by the MPI standard, and is left to the implementation." Same for both new graph constructors. changed

This statement is not accurate because there is no C++ equivalent: "It is erroneous to supply MPI_UNWEIGHTED for some but not all processes' sourceweights and destweights arrays in comm_old." Perhaps the following would be better "Either zero or all processes must specify no sourceweights and destweights; it is erroneous for only some processes to specify no sourceweights and destweights." fixed

Typos: "Info arguments can be used to guide the mapping,; possible options ~~are for example~~ include minimizing the maximum number of edges between processes on different SMP nodes ~~(for SMP systems)~~, or minimizing the sum of all such edges." changed

Why are there 2 ATI's in a row? They should be merged into 1 ATI, even if that one ATI addresses multiple issues. fixed

"However, all processes can ~~easily~~ construct the full topology from the distributed..." ok

"In both cases above, the ~~user~~ application could supply MPI_UNWEIGHTED instead of explicitly providing identical weights. " fixed

C++ binding mistake:
#!c
void MPI::Distgraphcomm::Get_dist_neighbors_count(int rank, int indegree[], 
    int outdegree[], bool *weighted) const
should be (change bool * to bool &):
#!c
void MPI::Distgraphcomm::Get_dist_neighbors_count(int rank, int indegree[], 
    int outdegree[], bool &weighted) const
please deprecate them - they're inconsistent and useless

Missing space: "The number of edges in to and out of the process returned by ..." see Bronis' comment

Comma should be a period: "Communication is required at the collective MPI_DIST_GRAPH_CREATE call in order to compute the neighbor lists for each process from the distributed graph specification,. " fixed

Missed addition: Change p456:1-2: There are ~~five~~ six different types of communicators: MPI::Comm, MPI::Intercomm, MPI::Intracomm, MPI::Cartcomm, ~~and~~ MPI::Graphcomm, and MPI::Distgraphcomm." added - good catch

There's a "Replace..." with no text specified to replace (...actually, it looks like some text got cut-n-pasted out of order in the proposal -- the following text is better because there is a missing comma after "Graphcomm" in the above text): Page 457, line 34, replace:[[BR]] // Cartcomm and Graphcomm are similarly defined[[BR]] with[[BR]] // Cartcomm, Graphcomm, and Distgraphcomm are similarly defined fixed!

The old graph interface should definitely be deprecated so that it can be removed in 3.0. yes, but that's not my decision :)

Updated proposal - Thanks, Torsten

mpiforumbot commented 8 years ago

Originally by htor on 2009-04-07 08:52:03 -0500

Attachment added: virtual_graph_adjacent.c (0.7 KiB) adjacency implementation

mpiforumbot commented 8 years ago

Originally by moody20 on 2009-04-07 09:02:32 -0500

This statement sounds ambiguous -- it sounds like each process has to specify all edges in the graph:

"MPI_DIST_GRAPH_CREATE_ADJACENT creates a distributed graph communicator with each process specifying all incoming and outgoing ..."

maybe just insert "of its" in there?

"... specifying all of its incoming and outgoing ..."

mpiforumbot commented 8 years ago

Originally by htor on 2009-04-07 09:12:00 -0500

added "of its" - review counter not reset!

mpiforumbot commented 8 years ago

Originally by hubertritzdorf on 2009-04-07 09:13:09 -0500

Reviewed. OK for me

mpiforumbot commented 8 years ago

Originally by htor on 2009-04-07 09:43:05 -0500

changed to reviewed after multiple reviews (counter was not reset by small changes as proposed by Bronis).

mpiforumbot commented 8 years ago

Originally by moody20 on 2009-04-07 10:01:30 -0500

I know we've changed "legal" to "valid" in other places... we should do that here I suppose, as well.

"An MPI implementation is not obliged to follow specific hints, and it is legal for an MPI implementation not to do any reordering."

mpiforumbot commented 8 years ago

Originally by jsquyres on 2009-04-07 10:04:03 -0500

I would highly encourage cutting-n-pasting the "deprecate this" portions of this ticket as a new ticket and getting it reviewed today. It's very easy to do; 9/10 of the work is already done.

I would do it myself, but I have a dozen other tickets I'm trying to shepherd through...

mpiforumbot commented 8 years ago

Originally by moody20 on 2009-04-07 10:21:22 -0500

In the adjacent case, it's clear that MPI_UNWEIGHTED can be used for both weight arrays, but it's not clear that it must be used for both arrays (can I use MPI_UNWEIGHTED for just the sources). I think this detail can be cleared up with some simple rephrasing.

Current text:

"In C or Fortran, an application can supply the special value MPI_UNWEIGHTED for both weight arrays to indicate that all edges have the same (effectively no) weight. In C++, this constant does not exist and the weight arguments may be omitted from the argument list. It is erroneous to supply MPI_UNWEIGHTED, or in C++ omit the weight arrays, for some but not all processes' sourceweights and destweights arrays in comm_old."

New suggestion:

"In C or Fortran, an application can indicate that all edges have the same (effectively no) weight by specifying the special value MPI_UNWEIGHTED for both weight arrays at all processes. In C++, this can be accomplished by omitting both weight arrays from the call at all processes. It is erroneous to supply MPI_UNWEIGHTED, or in C++ to omit the weight arrays, at some but not all processes in comm_old."

mpiforumbot commented 8 years ago

Originally by rsthakur on 2009-04-07 10:37:17 -0500

Just noticed a couple of typos. In the new Example 7.3, MPI_GRAPH_CREATE and MPI_GRAPH_CREATE_ADJACENT should be MPI_DIST_GRAPH_CREATE and MPI_DIST_GRAPH_CREATE_ADJACENT

Also 3rd line below defn of MPI_Dist_graph_neighbors, MPI_GRAPH_CREATE_ADJACENT should be MPI_DIST_GRAPH_CREATE_ADJACENT.

Maybe it is worth doing a search to make sure there are no more.

mpiforumbot commented 8 years ago

Originally by traff on 2009-04-07 11:26:53 -0500

Typos that Rajeev's detected should be fixed; I'm not doing it now for fear of conflict. Otherwise, ok with me (hard to read this text anymore)

mpiforumbot commented 8 years ago

Originally by htor on 2009-06-01 17:41:47 -0500

Fixed the typos mentioned by Rajeev (see diff)! Thanks and sorry for the delay.

mpiforumbot commented 8 years ago

Originally by traff on 2009-06-01 17:57:22 -0500

Missed entry for change log: MPI_DIST_GRAPH_CREATE_ADJACENT

mpiforumbot commented 8 years ago

Originally by RolfRabenseifner on 2009-06-07 16:18:38 -0500

All Change-Log entries must be of the form:

Location.NEWLINE[[BR]][[BR]] Text.

Therefore, I propose the following formal change (with same content as yours)

Section 7.5.3 on page 247.[[BR]][[BR]] New functions for a scalable distributed graph topology interface have been added. In this section, the functions MPI_DIST_GRAPH_CREATE_ADJACENT and MPI_DIST_GRAPH_CREATE, the constant MPI_UNWEIGHTED, and the derived C++ class Distgraphcomm were added.

Section 7.5.4 on page 252.[[BR]][[BR]] For the scalable distributed graph topology interface, the functions MPI_DIST_NEIGHBORS_COUNT and MPI_DIST_NEIGHBORS and the constant MPI_DIST_GRAPH were added.

mpi-forum / mpi-forum-historic

Fix Scalability Issues in Graph Topology Interface #33

Fix Scalability Issues in Graph Topology InterfaceAuthor: Torsten Hoefler, Jesper Larsson Traeff with comments from Rolf

Status

Description

History

Proposed Solution

Functions to Deprecate (not included in this ticket but kept for reference!)

Possible additional functionality (not included in this ticket but kept for future reference)

Impact on Implementations

Impact on Applications / Users

Alternative Solutions

Entry for the Change Log