cniethammer commented 5 years ago

Problem

Many scientific applications perform computations on a Cartesian grid. The common approach to parallelizing these applications with MPI is domain decomposition. To help developers map MPI processes to subdomains, the MPI standard provides the concept of process topologies. However, the current interface has problems and requires too much care in its usage: MPI_Dims_create takes neither the application nor the hardware topology into account, and most implementations of MPI_Cart_create can only partially fix these problems [Gropp]. Yet the mapping has a huge influence on communication performance on multi-node NUMA systems.
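To make the effect concrete, here is a minimal sketch (not taken from the proposal or the cited papers) that compares the halo size of one subdomain for two factorizations of 12 processes on an elongated 2D grid. The helper `halo_cells_per_process`, the 1200 x 120 grid, and the choice to count a one-cell-deep halo on both faces in every dimension are illustrative assumptions.

```python
def halo_cells_per_process(grid, dims):
    """Count the boundary cells of one subdomain, two faces per dimension.

    Assumptions for illustration only: the halo is one cell deep, and every
    process is treated as having a neighbor on both sides in every dimension.
    """
    sub = [g / d for g, d in zip(grid, dims)]  # subdomain extent per dimension
    total = 0.0
    for i in range(len(sub)):
        face = 1.0
        for j, extent in enumerate(sub):
            if j != i:
                face *= extent  # size of the face perpendicular to dimension i
        total += 2 * face
    return total


grid = (1200, 120)     # hypothetical elongated grid
balanced = (4, 3)      # dims as close to each other as possible
grid_aware = (12, 1)   # dims that follow the shape of the grid

print(halo_cells_per_process(grid, balanced))    # 680.0
print(halo_cells_per_process(grid, grid_aware))  # 440.0
```

Under these assumptions, the balanced 4 x 3 layout gives each process 680 halo cells, while the 12 x 1 layout gives 440 with the same 12 processes.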

Proposal

Two new API functions are proposed that take into account the computational grid as well as the hardware topology: MPI_CART_CREATE_WEIGHTED and MPI_DIMS_CREATE_WEIGHTED.

The new API allows the efficient mapping of processes for applications performing computations on a Cartesian grid over multiple NUMA layers. For details, see the material from the EuroMPI 2018 and 2019 contributions of Niethammer and Rabenseifner. The abstract [Niethammer2018], slides, and example implementation can be found at https://fs.hlrs.de/projects/par/mpi/EuroMPI2018-Cartesian/

For the software documentation of the example implementation, together with a benchmark, see https://fs.hlrs.de/projects/par/mpi//EuroMPI2018-Cartesian/EuroMPI2018_Niethammer+Rabenseifner_ML-Cartesian_software-docu.pdf

The new functions support both equally and unequally balanced factorizations of the number of processes of comm_old into n dimensions. Moreover, MPI_CART_CREATE_WEIGHTED reorders the ranks and allows further optimizations to be applied to the factorization, taking into account both application information (through the dim_weights argument) and hardware information (through the comm_old input argument).

Changes to the Text

The addition of the following two API functions to the Process Topology chapter: MPI_CART_CREATE_WEIGHTED and MPI_DIMS_CREATE_WEIGHTED, together with a new section "Cartesian Examples" that explains two use cases for the new routines.

Latest annotated pdf:

Versions for the reading on Feb. 2020 in Portland: A with MPI_Count dim_weights:

mpi-report-issue120-topol-2020-02-04-annotated.pdf Status: Had reading in Portland

Latest version from Portland WG and plenary discussions:

- mpi-report-issue120-topol-2020-06-14.pdf (without annotation)
- mpi-report-issue120-topol-2020-06-14-annotated.pdf Status: Reading of these last changes (Portland changes in green) is scheduled for the exceptional virtual meeting, June 2020 (it also includes a few last corrections from June 09-14, 2020 in blue)

Impact on Implementations

Implementations have to provide two new API functions. No changes to existing MPI functions are needed. A portable open-source software solution exists.

Impact on Users

Users benefit from improved performance on NUMA systems.

References

[Gropp] W. D. Gropp, Using Node [and Socket] Information to Implement MPI Cartesian Topologies, Parallel Computing, 2019, and in: Proceedings of the 25th European MPI Users' Group Meeting, EuroMPI'18, ACM, New York, NY, USA, 2018, pp. 18:1-18:9. doi:10.1145/3236367.3236377. Slides: http://wgropp.cs.illinois.edu/bib/talks/tdata/2018/nodecart-final.pdf

[Niethammer2018] Christoph Niethammer and Rolf Rabenseifner. Topology aware Cartesian grid mapping with MPI. Poster at EuroMPI 2018. https://fs.hlrs.de/projects/par/mpi//EuroMPI2018-Cartesian/EuroMPI2018_Niethammer+Rabenseifner_ML-Cartesian_e-abstract.pdf and https://fs.hlrs.de/projects/par/mpi/EuroMPI2018-Cartesian/

[Niethammer2019] Christoph Niethammer and Rolf Rabenseifner. An MPI interface for application and hardware aware Cartesian topology optimization. In Proceedings of the 26th European MPI Users' Group Meeting (EuroMPI 2019), September 11-13, 2019, Zürich, Switzerland. ACM, New York, NY, USA, 9 pages.

A set of slides explaining the interface: MPIX_2019-03-Chattanooga_v06.pdf

Introductory slides for the reading at the virtual meetings June 10 and June 29-July 1, 2020: MPIX_2020-06-10-Munich-v03.pdf

Related Pull Request

https://github.com/mpi-forum/mpi-standard/pull/98

RolfRabenseifner commented 5 years ago

topol-2019-02-17_20.08.pdf uploaded

cniethammer commented 5 years ago

pdf version of the final draft including marked changes: mpi32-report-topol-2019-02-18-annotated.pdf

RolfRabenseifner commented 5 years ago

pdf version of the final draft including marked changes (additional marks --> version 2): mpi32-report-topol-2019-02-18-annotated-2.pdf

RolfRabenseifner commented 5 years ago

Slides to explain the new interface / for the reading: MPIX_2019-03-Chattanooga_v06.pdf

RolfRabenseifner commented 5 years ago

I tried to include most of the results of the discussions at the Chattanooga meeting in March and from the HW topology working group telcons into the following update. It is also the input for the virtual meeting today (Wed., April 24, 2019): mpi32-report-topol-2019-04-23-annotated.pdf

RolfRabenseifner commented 5 years ago

I updated the annotations: mpi32-report-topol-2019-04-23-annotated-2.pdf

RolfRabenseifner commented 5 years ago

I added (nearly) all of the English corrections from Jeff Squyres and all is also available in the pull request: mpi32-report-topol-2019-04-24-annotated.pdf

It is the base for the virtual meeting today (April 24, 2019)

RolfRabenseifner commented 5 years ago

Latest version for the reading at the WG telcon May 3, 2019: mpi32-report-topol-2019-05-03-annotated.pdf

RolfRabenseifner commented 5 years ago

The following proposal is prepared for the formal reading at the Chicago meeting: mpi32-report-topol-2019-05-12-annotated.pdf

RolfRabenseifner commented 5 years ago

Based on the reading in Chicago, I prepared a new and smaller version for the formal reading in Zürich: (file removed)

RolfRabenseifner commented 5 years ago

A few additional bugs removed - new version: mpi32-report-topol-2019-08-12-annotated.pdf

RolfRabenseifner commented 5 years ago

New version, including the correction from Guillaume and results of the virtual meeting from Aug. 14, 2019: mpi32-report-topol-2019-08-17-annotated.pdf

RolfRabenseifner commented 5 years ago

120_mpi32-report-topol-2019-08-17-annotated_corr2019-09-05.pdf Including some additional typo corrections from the HW topology WG meeting Sep. 5, 2019 at the MPI forum meeting in Zurich; the corrections were added to the pull request.

RolfRabenseifner commented 4 years ago

I changed the logic:
- MPI_CART_CREATE_WEIGHTED and MPI_DIMS_CREATE_WEIGHTED both have the goal of providing a factorization with the products dim_weights[i] * dims[i] "as close to each other as possible, using an appropriate divisibility algorithm", which is the same wording as for MPI_DIMS_CREATE.
- MPI_CART_CREATE_WEIGHTED still has this additional feature: "In the case of a multi-level hierarchical hardware, the user of this routine can further constrain the factorization and the reordering of the process ranks with the info argument by choosing split levels which should mark slower to faster connectivity." This is followed by the already used text on MPI_COMM_TYPE_SHARED and MPI_COMM_TYPE_HW_SUBDOMAIN, which still uses the two large figures with "48 processes" and "24 shared memory nodes".

A completely new subsection "Cartesian Examples":
- Example 7.3 with dims[i] ratios 2:5:3 (no halo communication) (0.5 pages)
- Example 7.4 with halo communication and a detailed description of how to define the appropriate dim_weights[i] (1 page + 0.5 pages for the other two figures)
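The stated goal, products dim_weights[i] * dims[i] as close to each other as possible, can be sketched with a brute-force search. This is only an illustration: the helper names, the max/min ratio used to measure "closeness", and the weights below are assumptions, and a real implementation would use a divisibility algorithm rather than full enumeration.

```python
def factorizations(n, ndims):
    """Yield every ordered tuple of ndims positive integers whose product is n."""
    if ndims == 1:
        yield (n,)
        return
    for d in range(1, n + 1):
        if n % d == 0:
            for rest in factorizations(n // d, ndims - 1):
                yield (d,) + rest


def weighted_dims(nprocs, dim_weights):
    """Pick dims so that the products dim_weights[i] * dims[i] are as equal as possible.

    "As equal as possible" is measured here by the ratio of the largest to the
    smallest product; this metric is an assumption made for the sketch.
    """
    def spread(dims):
        prods = [w * d for w, d in zip(dim_weights, dims)]
        return max(prods) / min(prods)

    return min(factorizations(nprocs, len(dim_weights)), key=spread)


print(weighted_dims(30, (15, 6, 10)))  # (2, 5, 3)
print(weighted_dims(12, (1, 10)))      # (12, 1)
```

The weights (15, 6, 10) are chosen so that dims comes out in the ratio 2:5:3, the ratio mentioned for Example 7.3; the actual weights used in that example are not reproduced here.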

Here the new pdf: mpi32-report-topol-2019-11-06-annotated.pdf

RolfRabenseifner commented 4 years ago

Small updates based on the reading at the HW topol WG telcon Nov. 6, 2019: mpi32-report-topol-2019-11-07-annotated.pdf Caution: The pull request 98 still reflects the old version from Aug. 17, 2019

RolfRabenseifner commented 4 years ago

New version for the reading in Albuquerque: mpi32-report-topol-2019-11-07-annotated-corr-11-21.pdf The corresponding pull request is also updated.

wesbland commented 4 years ago

This failed the no-no vote in Albuquerque, New Mexico on 2019-12-12.

RolfRabenseifner commented 4 years ago

First version for the re-reading in Portland, Feb. 2020. Major changes:

- defining weights now with MPI_Count integers rather than double;
- using them as absolute values;
- defining the optimization goal through the communication pattern via MPI_NEIGHBOR_ALLTOALL rather than as "the sum of weights[i]*dims[i] should be minimized";
- therefore allowing absolute instead of relative weights, so that the MPI library can differentiate between latency and bandwidth optimization;
- re-added a halo example.
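As a side note on the two formulations that appear in this thread (products dim_weights[i] * dims[i] as close to each other as possible, versus minimizing the sum of weights[i]*dims[i]): if the dims are treated as positive real numbers, the inequality of arithmetic and geometric means shows that both lead to the same optimum. This is a textbook identity, not text from the proposal; the symbols $w_i$, $d_i$, $n$ (number of dimensions) and $P$ (number of processes) are introduced here only for illustration.

```latex
\sum_{i=0}^{n-1} w_i d_i
  \;\ge\; n \Bigl( \prod_{i=0}^{n-1} w_i d_i \Bigr)^{1/n}
  \;=\; n \Bigl( P \prod_{i=0}^{n-1} w_i \Bigr)^{1/n}
\qquad \text{subject to } \prod_{i=0}^{n-1} d_i = P,
```

with equality exactly when $w_0 d_0 = w_1 d_1 = \dots = w_{n-1} d_{n-1}$. For integer dims the equality can generally only be approximated.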

Latest annotated pdf: mpi-report-issue120-topol-2020-02-01-annotated.pdf

It is based on

- the official reading version of Albuquerque,
- plus the changes from the meeting (marked with green) (no-no vote failed because of missing halo example)
- plus the new changes (marked with blue) (now with a better and more concisely described halo example)

Shinji-Sumimoto commented 4 years ago

Rolf, Thank you for changing your proposal.

In my API (Application Program Interface) definition policy, an API should be understandable and usable without any prior knowledge; at the least, users should be able to write and run a sample program after only reading the specification.

Based on this policy, the current proposal becomes minimally acceptable. However, some paragraphs still confuse me.

Here are some comments about mpi32-report-topol-2020-02-01-annotated.pdf Page.315.319: Figures 7.1 and 7.2 have become much easier to understand intuitively. But they are still confusing: the suffix number of each definition g, h, w, and d seems to be a dimension number, but this is not easy to see. Also, arrows for the dimension directions (0, 1, 2) should be added, or the numbers should be changed to the characters x, y, z.

Page.318: The index value i is not clearly defined; it seems to be the dimension index for dim_weights[] and dims[]. Because i is the most common index character in computing, the definition "0 <= i < ndims" or "i=0::ndims" should be added throughout the related sections.

Page.319: In Example 7.3, one can guess that i is the dimension index. It is a little confusing which value of dims[i] is meant in the dim_weights[i].dims[i] formula in the table. The description of dims in the first column should be dims[i]i=0::2 for better understanding.

Best regards, Shinji.

RolfRabenseifner commented 4 years ago

Dear Shinji,

I hope that I resolved all your proposals. Your view-point helped me to look for better readability.

Here the new version for the re-reading in Portland:

mpi-report-issue120-topol-2020-02-03-annotated.pdf

Thank you very much and best regards Rolf

RolfRabenseifner commented 4 years ago

PR98 is also updated.

RolfRabenseifner commented 4 years ago

Small updates based on Artem's review (see pull request). New pdf: mpi-report-issue120-topol-2020-02-04-annotated.pdf

RolfRabenseifner commented 4 years ago

Originally, for the reading in Albuquerque, we had double precision dim_weights. Now (in version A), they are changed to MPI_Count, because in MPI_Cart_create_weighted they count bytes and are possibly huge, since they represent the whole system and not only one MPI process.

PDF for version A is as above: mpi-report-issue120-topol-2020-02-04-annotated.pdf

For MPI_Dims_create_weighted, they have no such absolute value meaning: they are just relative factors. Therefore, I propose as version B to go back to double precision dim_weights. This does not change example 7.2, but simplifies example 7.3.
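A quick sanity check of both arguments, as a sketch only: the process count and the per-process byte count are made-up numbers, `INT32_MAX` stands for the largest value of a 32-bit signed C int, and `normalized` is a hypothetical helper.

```python
INT32_MAX = 2**31 - 1  # largest value of a 32-bit signed int

processes = 4096                 # hypothetical job size
bytes_per_process = 1024 * 1024  # hypothetical 1 MiB exchanged per process

# Version A: weights are absolute byte counts for the whole system.
system_wide_bytes = processes * bytes_per_process
print(system_wide_bytes, system_wide_bytes > INT32_MAX)  # 4294967296 True


def normalized(weights):
    """Express weights relative to the first one."""
    return [w / weights[0] for w in weights]


# Version B: only the ratios between the weights matter, so scaling all
# weights by the same factor describes the same request.
print(normalized((2.0, 5.0, 3.0)) == normalized((4.0, 10.0, 6.0)))  # True
```

With these numbers the system-wide byte count no longer fits into a 32-bit int, while purely relative factors are unaffected by their absolute scale.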

PDF for version B: mpi-report-issue120-topol-2020-02-04-B-annotated.pdf

RolfRabenseifner commented 4 years ago

mpi-report-issue120-topol-2020-02-20-annotated.pdf

Latest version from Portland WG meeting and plenary discussions.

RolfRabenseifner commented 4 years ago

Updated version for the no-no-vote on Feb. 21, 2020 in Portland: All small changes to the version from Feb. 4, 2020. mpi-report-issue120-topol-2020-02-20b-annotated.pdf

RolfRabenseifner commented 4 years ago

Status: The changes from version Feb. 4, 2020 to version Feb. 21, 2020 still need a no-no-vote before the reading of version Feb. 21, 2020 is finished.

wesbland commented 4 years ago

The no-no reading at the February 2020 meeting was withdrawn. This will require a no-no vote on the text read by @RolfRabenseifner at the June 2020 meeting before its first vote.

RolfRabenseifner commented 4 years ago

Same version as at the end of the Portland meeting, plus a few typo corrections ...:

mpi-report-issue120-topol-2020-02-21-annotated.pdf

(I added the corrections to the pull request and also to the annotated pdf version from Feb 21, 2020)

RolfRabenseifner commented 4 years ago

Same version as at the end of the Portland meeting, plus some further small corrections ...:

mpi-report-issue120-topol-2020-02-21-annotated.pdf

Here also an introductory set of slides for the reading of the last changes: MPIX_2020-06-10-Munich-v03.pdf

cniethammer commented 4 years ago

The new MPI_Cart_create_weighted function has dims as an output parameter. This requires adapting MPI_Cart_get, which is currently supposed to return the value provided in the call to MPI_Cart_create. This was not a problem so far, as the value is not changed by that call.

Suggestion to fix this issue in https://github.com/mpi-forum/mpi-standard/pull/98/commits/62d049c6b4a5ac68755a5f6cea7a9b741d899da3.

RolfRabenseifner commented 4 years ago

With this last change, we produced a new annotated pdf: mpi-report-issue120-topol-2020-06-14.pdf It contains the changes from Portland (in green) plus a very few changes until now (in blue).

raffenet commented 4 years ago

> With this last change, we produced a new annotated pdf: mpi-report-issue120-topol-2020-06-14.pdf It contains the changes from Portland (in green) plus a very few changes until now (in blue).

I do not see any annotations in the PDF when I view it. Am I missing something?

wesbland commented 4 years ago

This is the diff of everything since the Portland meeting. Is that what you're intending to be the no-no reading?

RolfRabenseifner commented 4 years ago

> > With this last change, we produced a new annotated pdf: mpi-report-issue120-topol-2020-06-14.pdf It contains the changes from Portland (in green) plus a very few changes until now (in blue).
>
> I do not see any annotations in the PDF when I view it. Am I missing something?

My apologies for uploading the plain file instead of the annotated one. You were the first to notice this. Here is the properly annotated one:

mpi-report-issue120-topol-2020-06-14-annotated.pdf

RolfRabenseifner commented 4 years ago

> This is the diff of everything since the Portland meeting. Is that what you're intending to be the no-no reading?

You are right. I also expect that, for some members of the forum, this may be more than is appropriate for a no-no-vote. When I first looked at the meeting agenda, Martin and you had set this up only as a "reading", which was fine for me. Now it shows a "NoNo Reading" plus the votes. If someone in the forum does not agree with a no-no-vote, then I would proceed with the voting in the next meeting (same as with #96) and would treat this reading as a normal reading. I'll do the reading with the slides plus the annotated pdf.

raffenet commented 4 years ago

Any comment about this feedback? https://github.com/mpi-forum/mpi-standard/pull/98#discussion_r468020783

dholmes-epcc-ed-ac-uk commented 4 years ago

> Any comment about this feedback? mpi-forum/mpi-standard#98 (comment)

I believe @RolfRabenseifner was on vacation and may not have seen/read these comments yet.

Personally, I agree with ANL's objection for all the reasons noted in Ken's comment and those in Bill's reply.

@raffenet & @wgropp & others: the key question is - will you vote against this ticket because of this objection? We should not accept things into the MPI Standard that are normative but unimplementable.

@wesbland: I guess there is still time, procedurally, for @RolfRabenseifner to prepare a "no-no-vote" change before the 1st vote - how many hours remain?

wgropp commented 4 years ago

Yes, I would vote against this if it includes this language. However, because of three separate conflicts, I won’t be able to attend and vote this week. I would support a no-no-vote change for this. If it passes anyway, I strongly urge an errata item to correct this ASAP.

Bill

William Gropp Director and Chief Scientist, NCSA Thomas M. Siebel Chair in Computer Science University of Illinois Urbana-Champaign

wesbland commented 4 years ago

> @wesbland: I guess there is still time, procedurally, for @RolfRabenseifner to prepare a "no-no-vote" change before the 1st vote - how many hours remain?

Procedurally, no-no votes can be announced any time up to the time when we vote on this item, which is currently scheduled for tomorrow (Tuesday) at the earliest. So a change could be made sometime today and read tomorrow.

RolfRabenseifner commented 4 years ago

Yes, I haven't seen it due to my vacation. Thank you for reminding me.

RolfRabenseifner commented 4 years ago

Late change for no-no-vote on Aug 18, 2020: Minimizing communication time as advice to implementors for high quality MPI implementations: mpi-report-issue120-topol-2020-08-17-annotated-pages334-335.pdf

RolfRabenseifner commented 4 years ago

Dear Bill, Pavan, Ken and Dan,

thank you for your review and proposal.

I moved the text into an advice to implementors, phrased as "A high quality..."

I hope, that it is now okay:

https://github.com/mpi-forum/mpi-issues/files/5086553/mpi-report-issue120-topol-2020-08-17-annotated-pages334-335.pdf

The changes are highlighted on pages 334+335.

Best regards Rolf

wesbland commented 4 years ago

This passed a "no no" and did not meet ballot quorum for a first vote on 2020-08-19.

https://www.mpi-forum.org/meetings/2020/08/votes

RolfRabenseifner commented 1 year ago

The proposal is now re-based on mpi-4.x. After the long pause since the reading in 2020, I prepared an updated commented pdf based on current mpi-4.x: See yellow marker and PR98 marker on pages 367-378, 1012, 1032, 1035-1036 in mpi41-report_Issue120_PR98.pdf

cniethammer commented 1 year ago

Feedback during the March 2023 MPI Forum meeting:

Leave out the MPI_DIMS_CREATE_WEIGHTED with all the mathematics

wgropp commented 1 year ago

Also leave out the text added to the description of the behavior of MPI_DIMS_CREATE.

RolfRabenseifner commented 1 year ago

Without MPI_Dims_create AtoI and MPI_Dims_create_weighted (the agenda of changes): mpi41-report_Issue120_PR98_only-Cart_weighted.pdf

RolfRabenseifner commented 1 year ago

Without MPI_Dims_create AtoI and MPI_Dims_create_weighted (the final agenda of changes): mpi41-report_Issue120_PR98_only-Cart_weighted.pdf

The uncommented result pdf: mpi41-report_Issue120_PR98_only-Cart_weighted_result-uncommented.pdf

RolfRabenseifner commented 1 year ago

Result from the reading, see yellow marker and PR98 marker on pages 367-376, 1008, 1028, 1031-1032, 1048, 1069 in mpi41-report_Issue120_PR98_final.pdf

mpiforumbot commented 1 year ago

This passed a no-no vote.

| Yes | No | Abstain |
| --- | --- | --- |
| 29 | 0 | 2 |

mpi-forum / mpi-issues

MPI_Cart_create_weighted / Topology aware Cartesian communicators #120
