Closed drwootton closed 3 years ago
it is impossible to retain any concept of ordering in the PE-LIST resources.
Correct - because as the help output indicates, there is no concept of ordering in this option. If someone wants to do a different pattern, such as you suggest, then they have other ways of doing it - e.g., rankfile mapping. Changing this option to support what you "expected" would be disruptive.
I think we need to step back a bit and ask ourselves what we are trying to accomplish here. Are we trying to check that things work as intended? In that case, we shouldn't file an issue for this case - it is performing as intended. Your "expectation" was simply at odds with the documented behavior. It might indicate we need to clarify the docs (meaning we should file a documentation issue), but not merit changing the behavior.
Trying to modify/expand the cmd line options to meet someone's expectations would be a rather intensive task - I would have to question the value and reality of it. We could easily wind up chasing our tails as one person's expectations are unlikely to match those of another.
Just to be clear: I'm not questioning the value of testing all these options. I appreciate someone doing so. I'm only trying to understand the motivation for filing issues on things that are working as documented/intended.
I started working on PRRTE with @jjhursey last summer and am in the process of learning the code. He asked me to do some testing with various combinations of map, bind, and rank options, which is the origin of the issues I have written. I'm not trying to push PRRTE in any specific direction, just trying to shake out bugs in the code.
I found cases which did not seem to be working right while I was testing. I discussed these with Josh and wrote issues only when he also thought the behavior might be wrong.
If what I think is a bug is something that PRRTE was clearly not intended to do, I'll accept that, or if it's a documentation cleanup problem I'll accept that too.
In this case, the only place I found any mention of the PE-LIST option was in the prterun --help output and in the draft documents in pull request #557. All that text says is that PE-LIST is a comma-delimited list of CPUs, it doesn't say anything about ordered or unordered lists. If there's other documentation that clarifies it, I don't know where that is.
If this is a documentation cleanup, that's fine. I realize changing PRRTE to pass around a list of CPUs with specific ordering is not a simple fix, but also don't know where it lands in terms of what's most important.
I'm aware of the history - like I said, I'm not questioning the work. I'm just still trying to figure out how to respond to this "issue flood". It seems like these fall into a few categories:
things are working correctly, but then you gave it some incorrect input and didn't realize it was incorrect. Question here is why you didn't realize it was incorrect - did you read the doc and it either lacked clarity or is incorrect itself? Or did you not read the doc and blindly tried something? If the former, we can fix the doc. If the latter, then perhaps a better "show_help" message in case of the error might steer someone to the right doc.
you gave correct input but the system did the wrong thing - clearly a bug that should be addressed.
you gave correct input and the system behaved as intended, but not as you thought it should. The question here is: did the doc mislead you? In that case, we clearly should fix the doc. Or is the doc clear, but you thought it should do something different? This gets a little grayer - we've spent years dealing with conflicting ideas on what something should do. We can always review things, but it can become rather time consuming.
you gave incorrect input and the system either did something it shouldn't have done or simply barfed. Probably need better input checking in that case - either way, it is a bug.
Probably more categories in there, but hopefully this provides some food for thought. What I'm ultimately looking for is a way to label these issues so we can more easily deal with them. If we can come up with a reasonably simple way of categorizing them, then we can create a label for each category and appropriately mark the issues - and it is okay to put multiple labels on the issue if it somewhat spans categories, though it would be preferred if we can contain that.
Dave and I have been testing based on the documentation we have - in the man pages, FAQs, and help output. Where it is ambiguous we have tried to test what we think it should do. Our goal is to test existing functionality, not necessarily invent new functionality or interpretations. We file tickets for any mismatch with the hope that we can dialog with the community on if it is a documentation issue, bug, invalid usage, or possibly a future feature.
I think in the case of this Issue we hit on a documentation clarification issue, and possibly a future feature. I added a note to fixe the documentation issue in this comment. I think we should re-classify this ticket as a future, possible enhancement request for a PR-ORDERED-LIST
option that preserves the order specified. I agree that would take a fair amount of work, and the priority of the request would need to be based on demand. Right now we don't have a demand for it from what I can tell.
I was about to make my own issue about this, but I have also experienced 'surprising' mapping with combinations of --map-by
--rank-by
--bind-to
options. In particular, I could not find a combination that replicates the good-old --bynode
option from prior Open MPI versions.
For example, the following option set yielded a crash, that looks like a bug
salloc -N 4 -Ccauchy bin/mpiexec --rank-by node --map-by :display --bind-to core ~/ompi/benchmarks/imb/mpi-benchmarks/IMB-MPI1.u5 -npmin 32 Barrier
salloc: Granted job allocation 391603
salloc: Waiting for resource configuration
salloc: Nodes c[00-03] are ready for job
======================== JOB MAP ========================
Data for JOB [3650,1] offset 0 Total slots allocated 32
Mapping policy: BYPACKAGE:NOOVERSUBSCRIBE Ranking policy: NODE Binding policy: CORE
Cpu set: N/A PPR: N/A Cpus-per-rank: N/A Cpu Type: CORE
Data for node: c00 Num slots: 8 Max slots: 0 Num procs: 8
Process jobid: [3650,1] App: 0 Process rank: 0 Bound: package[0][core:0]
Process jobid: [3650,1] App: 0 Process rank: 4 Bound: package[1][core:4]
Process jobid: [3650,1] App: 0 Process rank: 8 Bound: package[0][core:1]
Process jobid: [3650,1] App: 0 Process rank: 12 Bound: package[1][core:5]
Process jobid: [3650,1] App: 0 Process rank: 16 Bound: package[0][core:2]
Process jobid: [3650,1] App: 0 Process rank: 20 Bound: package[1][core:6]
Process jobid: [3650,1] App: 0 Process rank: 24 Bound: package[0][core:3]
Process jobid: [3650,1] App: 0 Process rank: 28 Bound: package[1][core:7]
Data for node: c01 Num slots: 8 Max slots: 0 Num procs: 8
Process jobid: [3650,1] App: 0 Process rank: 1 Bound: package[0][core:0]
Process jobid: [3650,1] App: 0 Process rank: 5 Bound: package[1][core:4]
Process jobid: [3650,1] App: 0 Process rank: 9 Bound: package[0][core:1]
Process jobid: [3650,1] App: 0 Process rank: 13 Bound: package[1][core:5]
Process jobid: [3650,1] App: 0 Process rank: 17 Bound: package[0][core:2]
Process jobid: [3650,1] App: 0 Process rank: 21 Bound: package[1][core:6]
Process jobid: [3650,1] App: 0 Process rank: 25 Bound: package[0][core:3]
Process jobid: [3650,1] App: 0 Process rank: 29 Bound: package[1][core:7]
Data for node: c02 Num slots: 8 Max slots: 0 Num procs: 8
Process jobid: [3650,1] App: 0 Process rank: 2 Bound: package[0][core:0]
Process jobid: [3650,1] App: 0 Process rank: 6 Bound: package[1][core:4]
Process jobid: [3650,1] App: 0 Process rank: 10 Bound: package[0][core:1]
Process jobid: [3650,1] App: 0 Process rank: 14 Bound: package[1][core:5]
Process jobid: [3650,1] App: 0 Process rank: 18 Bound: package[0][core:2]
Process jobid: [3650,1] App: 0 Process rank: 22 Bound: package[1][core:6]
Process jobid: [3650,1] App: 0 Process rank: 26 Bound: package[0][core:3]
Process jobid: [3650,1] App: 0 Process rank: 30 Bound: package[1][core:7]
Data for node: c03 Num slots: 8 Max slots: 0 Num procs: 8
Process jobid: [3650,1] App: 0 Process rank: 3 Bound: package[0][core:0]
Process jobid: [3650,1] App: 0 Process rank: 7 Bound: package[1][core:4]
Process jobid: [3650,1] App: 0 Process rank: 11 Bound: package[0][core:1]
Process jobid: [3650,1] App: 0 Process rank: 15 Bound: package[1][core:5]
Process jobid: [3650,1] App: 0 Process rank: 19 Bound: package[0][core:2]
Process jobid: [3650,1] App: 0 Process rank: 23 Bound: package[1][core:6]
Process jobid: [3650,1] App: 0 Process rank: 27 Bound: package[0][core:3]
Process jobid: [3650,1] App: 0 Process rank: 31 Bound: package[1][core:7]
=============================================================
[saturn:32036] PRTE ERROR: Fatal in file ../../../../../../master/3rd-party/prrte/src/mca/rmaps/base/rmaps_base_ranking.c at line 690
[saturn:32036] PRTE ERROR: Fatal in file ../../../../../../master/3rd-party/prrte/src/mca/odls/base/odls_base_default_fns.c at line 276
[saturn:32036] PRTE ERROR: Fatal in file ../../../../../../master/3rd-party/prrte/src/mca/plm/base/plm_base_launch_support.c at line 512
[saturn:32036] FORCE TERMINATE ORDERED AT ../../../../../../master/3rd-party/prrte/src/mca/plm/base/plm_base_launch_support.c:513 - error Unknown error(1)
But other combinations would do unexpected things silently (like ranking by package when requesting rank-by node, and other bizarre things).
I could not find a combination that replicates the good-old --bynode option from prior Open MPI versions.
I'm going to look at the crash, but the equivalent of bynode
is simply --map-by node
This issue is an enhancement request to add a PR-ORDERED-LIST
option to --map-by
similar to the PR-LIST
option, but honors the ordering of the options. Note below from earlier comment:
I think we should re-classify this ticket as a future, possible enhancement request for a PR-ORDERED-LIST option that preserves the order specified. I agree that would take a fair amount of work, and the priority of the request would need to be based on demand. Right now we don't have a demand for it from what I can tell.
Thank you for taking the time to submit an issue!
Background information
What version of the PMIx Reference Server are you using? (e.g., v1.0, v2.1, git master @ hash, etc.)
Master branch
What version of PMIx are you using? (e.g., v1.2.5, v2.0.3, v2.1.0, git branch name and hash, etc.)
Master branch
Please describe the system on which you are running
Operating system/version: RHEL 7.7
Computer hardware: 4 Power 8 nodes, 2 sockets each with 10 cores (20 total) and 8 hwthreads per core (160 total)
Network type: Hostfile specifies 4 nodes, each with slots=8 keyword
Details of the problem
If I run the command prterun -n 4 --hostfile hostfile8 --map-by core:PE-LIST=0,1,2,3 --bind-to core:report echo then I get output I expect
If I run the command prterun -n 4 --hostfile hostfile8 --map-by core:PE-LIST=3,2,1,0 --bind-to core:report echo I get the same output, where I expect something like
I think the ability to specify an arbitrary mapping is useful, such as when trying to place tasks for memory or cache affinity.
The problem is that when the set of resources specified in the PE-LIST option is handled as a set instead of a list, which means that it is impossible to retain any concept of ordering in the PE-LIST resources.