Closed upperwal closed 6 years ago
An elegant solution is to use the MPI_Group_translate_ranks function.
world_job_comm added to the Node struct:
typedef struct Nodes {
    int job_id;
    int rank;
    int age;
    int jobs_count;
    enum NodeTransitState node_transit_state;
    enum NodeCheckpointMaster node_checkpoint_master;
    MPI_Comm rep_mpi_comm_world; // Duplicate of MPI_COMM_WORLD
    MPI_Comm world_job_comm;     // Communicator to all nodes in a job.
    MPI_Comm active_comm;        // Communicator of nodes, one from each job. So these can be called active nodes.
} Node;
The global job communicator spans the nodes of a single job and can be used for communication among the nodes within that job.
Problem: the MPI_COMM_WORLD.nodeID -> global_job_comm.nodeID mapping is not available.
Possible solution: use MPI_Comm_set_attr on MPI_COMM_WORLD to store global_job_comm.nodeID, and vice versa.