openpmix / prrte

PMIx Reference RunTime Environment (PRRTE)
https://pmix.org
Other
35 stars 67 forks source link

Unable to specify order of assigment with prterun --map-by :pe-list option #696

Closed drwootton closed 3 years ago

drwootton commented 3 years ago

Thank you for taking the time to submit an issue!

Background information

What version of the PMIx Reference Server are you using? (e.g., v1.0, v2.1, git master @ hash, etc.)

Master branch

What version of PMIx are you using? (e.g., v1.2.5, v2.0.3, v2.1.0, git branch name and hash, etc.)

Master branch

Please describe the system on which you are running


Details of the problem

If I run the command prterun -n 4 --hostfile hostfile8 --map-by core:PE-LIST=0,1,2,3 --bind-to core:report echo then I get output I expect

[c712f6n01:18908] MCW rank 0 bound to package[0][core:0]
[c712f6n01:18908] MCW rank 1 bound to package[0][core:1]
[c712f6n01:18908] MCW rank 2 bound to package[0][core:2]
[c712f6n01:18908] MCW rank 3 bound to package[0][core:3]

If I run the command prterun -n 4 --hostfile hostfile8 --map-by core:PE-LIST=3,2,1,0 --bind-to core:report echo I get the same output, where I expect something like

[c712f6n01:18908] MCW rank 0 bound to package[0][core:3]
[c712f6n01:18908] MCW rank 1 bound to package[0][core:2]
[c712f6n01:18908] MCW rank 2 bound to package[0][core:1]
[c712f6n01:18908] MCW rank 3 bound to package[0][core:0]

I think the ability to specify an arbitrary mapping is useful, such as when trying to place tasks for memory or cache affinity.

The problem is that when the set of resources specified in the PE-LIST option is handled as a set instead of a list, which means that it is impossible to retain any concept of ordering in the PE-LIST resources.

rhc54 commented 3 years ago

it is impossible to retain any concept of ordering in the PE-LIST resources.

Correct - because as the help output indicates, there is no concept of ordering in this option. If someone wants to do a different pattern, such as you suggest, then they have other ways of doing it - e.g., rankfile mapping. Changing this option to support what you "expected" would be disruptive.

I think we need to step back a bit and ask ourselves what we are trying to accomplish here. Are we trying to check that things work as intended? In that case, we shouldn't file an issue for this case - it is performing as intended. Your "expectation" was simply at odds with the documented behavior. It might indicate we need to clarify the docs (meaning we should file a documentation issue), but not merit changing the behavior.

Trying to modify/expand the cmd line options to meet someone's expectations would be a rather intensive task - I would have to question the value and reality of it. We could easily wind up chasing our tails as one person's expectations are unlikely to match those of another.

Just to be clear: I'm not questioning the value of testing all these options. I appreciate someone doing so. I'm only trying to understand the motivation for filing issues on things that are working as documented/intended.

drwootton commented 3 years ago

I started working on PRRTE with @jjhursey last summer and am in the process of learning the code. He asked me to do some testing with various combinations of map, bind, and rank options, which is the origin of the issues I have written. I'm not trying to push PRRTE in any specific direction, just trying to shake out bugs in the code.

I found cases which did not seem to be working right while I was testing. I discussed these with Josh and wrote issues only when he also thought the behavior might be wrong.

If what I think is a bug is something that PRRTE was clearly not intended to do, I'll accept that, or if it's a documentation cleanup problem I'll accept that too.

In this case, the only place I found any mention of the PE-LIST option was in the prterun --help output and in the draft documents in pull request #557. All that text says is that PE-LIST is a comma-delimited list of CPUs, it doesn't say anything about ordered or unordered lists. If there's other documentation that clarifies it, I don't know where that is.

If this is a documentation cleanup, that's fine. I realize changing PRRTE to pass around a list of CPUs with specific ordering is not a simple fix, but also don't know where it lands in terms of what's most important.

rhc54 commented 3 years ago

I'm aware of the history - like I said, I'm not questioning the work. I'm just still trying to figure out how to respond to this "issue flood". It seems like these fall into a few categories:

Probably more categories in there, but hopefully this provides some food for thought. What I'm ultimately looking for is a way to label these issues so we can more easily deal with them. If we can come up with a reasonably simple way of categorizing them, then we can create a label for each category and appropriately mark the issues - and it is okay to put multiple labels on the issue if it somewhat spans categories, though it would be preferred if we can contain that.

jjhursey commented 3 years ago

Dave and I have been testing based on the documentation we have - in the man pages, FAQs, and help output. Where it is ambiguous we have tried to test what we think it should do. Our goal is to test existing functionality, not necessarily invent new functionality or interpretations. We file tickets for any mismatch with the hope that we can dialog with the community on if it is a documentation issue, bug, invalid usage, or possibly a future feature.

I think in the case of this Issue we hit on a documentation clarification issue, and possibly a future feature. I added a note to fixe the documentation issue in this comment. I think we should re-classify this ticket as a future, possible enhancement request for a PR-ORDERED-LIST option that preserves the order specified. I agree that would take a fair amount of work, and the priority of the request would need to be based on demand. Right now we don't have a demand for it from what I can tell.

abouteiller commented 3 years ago

I was about to make my own issue about this, but I have also experienced 'surprising' mapping with combinations of --map-by --rank-by --bind-to options. In particular, I could not find a combination that replicates the good-old --bynode option from prior Open MPI versions.

For example, the following option set yielded a crash, that looks like a bug

salloc -N 4 -Ccauchy bin/mpiexec --rank-by node  --map-by :display --bind-to core     ~/ompi/benchmarks/imb/mpi-benchmarks/IMB-MPI1.u5 -npmin 32 Barrier
salloc: Granted job allocation 391603
salloc: Waiting for resource configuration
salloc: Nodes c[00-03] are ready for job

========================   JOB MAP   ========================
Data for JOB [3650,1] offset 0 Total slots allocated 32
    Mapping policy: BYPACKAGE:NOOVERSUBSCRIBE  Ranking policy: NODE Binding policy: CORE
    Cpu set: N/A  PPR: N/A  Cpus-per-rank: N/A  Cpu Type: CORE

    Data for node: c00  Num slots: 8    Max slots: 0    Num procs: 8
        Process jobid: [3650,1] App: 0 Process rank: 0 Bound: package[0][core:0]
        Process jobid: [3650,1] App: 0 Process rank: 4 Bound: package[1][core:4]
        Process jobid: [3650,1] App: 0 Process rank: 8 Bound: package[0][core:1]
        Process jobid: [3650,1] App: 0 Process rank: 12 Bound: package[1][core:5]
        Process jobid: [3650,1] App: 0 Process rank: 16 Bound: package[0][core:2]
        Process jobid: [3650,1] App: 0 Process rank: 20 Bound: package[1][core:6]
        Process jobid: [3650,1] App: 0 Process rank: 24 Bound: package[0][core:3]
        Process jobid: [3650,1] App: 0 Process rank: 28 Bound: package[1][core:7]

    Data for node: c01  Num slots: 8    Max slots: 0    Num procs: 8
        Process jobid: [3650,1] App: 0 Process rank: 1 Bound: package[0][core:0]
        Process jobid: [3650,1] App: 0 Process rank: 5 Bound: package[1][core:4]
        Process jobid: [3650,1] App: 0 Process rank: 9 Bound: package[0][core:1]
        Process jobid: [3650,1] App: 0 Process rank: 13 Bound: package[1][core:5]
        Process jobid: [3650,1] App: 0 Process rank: 17 Bound: package[0][core:2]
        Process jobid: [3650,1] App: 0 Process rank: 21 Bound: package[1][core:6]
        Process jobid: [3650,1] App: 0 Process rank: 25 Bound: package[0][core:3]
        Process jobid: [3650,1] App: 0 Process rank: 29 Bound: package[1][core:7]

    Data for node: c02  Num slots: 8    Max slots: 0    Num procs: 8
        Process jobid: [3650,1] App: 0 Process rank: 2 Bound: package[0][core:0]
        Process jobid: [3650,1] App: 0 Process rank: 6 Bound: package[1][core:4]
        Process jobid: [3650,1] App: 0 Process rank: 10 Bound: package[0][core:1]
        Process jobid: [3650,1] App: 0 Process rank: 14 Bound: package[1][core:5]
        Process jobid: [3650,1] App: 0 Process rank: 18 Bound: package[0][core:2]
        Process jobid: [3650,1] App: 0 Process rank: 22 Bound: package[1][core:6]
        Process jobid: [3650,1] App: 0 Process rank: 26 Bound: package[0][core:3]
        Process jobid: [3650,1] App: 0 Process rank: 30 Bound: package[1][core:7]

    Data for node: c03  Num slots: 8    Max slots: 0    Num procs: 8
        Process jobid: [3650,1] App: 0 Process rank: 3 Bound: package[0][core:0]
        Process jobid: [3650,1] App: 0 Process rank: 7 Bound: package[1][core:4]
        Process jobid: [3650,1] App: 0 Process rank: 11 Bound: package[0][core:1]
        Process jobid: [3650,1] App: 0 Process rank: 15 Bound: package[1][core:5]
        Process jobid: [3650,1] App: 0 Process rank: 19 Bound: package[0][core:2]
        Process jobid: [3650,1] App: 0 Process rank: 23 Bound: package[1][core:6]
        Process jobid: [3650,1] App: 0 Process rank: 27 Bound: package[0][core:3]
        Process jobid: [3650,1] App: 0 Process rank: 31 Bound: package[1][core:7]

=============================================================

[saturn:32036] PRTE ERROR: Fatal in file ../../../../../../master/3rd-party/prrte/src/mca/rmaps/base/rmaps_base_ranking.c at line 690
[saturn:32036] PRTE ERROR: Fatal in file ../../../../../../master/3rd-party/prrte/src/mca/odls/base/odls_base_default_fns.c at line 276
[saturn:32036] PRTE ERROR: Fatal in file ../../../../../../master/3rd-party/prrte/src/mca/plm/base/plm_base_launch_support.c at line 512
[saturn:32036] FORCE TERMINATE ORDERED AT ../../../../../../master/3rd-party/prrte/src/mca/plm/base/plm_base_launch_support.c:513 - error Unknown error(1)

But other combinations would do unexpected things silently (like ranking by package when requesting rank-by node, and other bizarre things).

rhc54 commented 3 years ago

I could not find a combination that replicates the good-old --bynode option from prior Open MPI versions.

I'm going to look at the crash, but the equivalent of bynode is simply --map-by node

jjhursey commented 3 years ago

This issue is an enhancement request to add a PR-ORDERED-LIST option to --map-by similar to the PR-LIST option, but honors the ordering of the options. Note below from earlier comment:

I think we should re-classify this ticket as a future, possible enhancement request for a PR-ORDERED-LIST option that preserves the order specified. I agree that would take a fair amount of work, and the priority of the request would need to be based on demand. Right now we don't have a demand for it from what I can tell.