Open burlen opened 6 months ago
Can't remove in v5.0.x but can add a message, and those that are not used, to be removed in main There is a PR coming in where a parameter or two is used.
https://github.com/open-mpi/ompi/pull/12453/files Will use the fanout parameter(there are 2 fanouts).
@wenduwan would it make sense to use a different mca parameter name to avoid confusion with the original use of the mca fan out parameters? while these features are not very well documented, I assume it is documented somewhere and could thus be confusing with your new use. Assuming the two implementations are different
@burlen I'm actually fuzzy on the difference between fanout vs radix. Do you mind providing clarity on the terminology?
@burlen I'm actually fuzzy on the difference between fanout vs radix. Do you mind providing clarity on the terminology?
@wenduwan The Bruck algorithm completes in logarithmic number of rounds. Radix refers to the base of the logarithm. This is the terminology the author of the PR you linked uses for function argument names, and is used in the paper I linked in the review. It is really not about radix vs fanout. The problem is fanout is already in use for a different purpose, which makes overloading it as done in the PR potentially confusing.
@burlen I see.
From an engineering perspective, we also need to consider code maintainability. This issue arises from the introduction of additional parameters, which I believe had good reasons at some point - similar to this discussion.
For fan-out vs radix, do you see a case where both can be used together? Is it sufficient to simply document what the parameter means in different algorithms rather than creating an entirely new parameter?
For fan-out vs radix, do you see a case where both can be used together?
Since faninout is not even currently used in any of the alltoall implementations, I don't see how it could be
Is it sufficient to simply document what the parameter means in different algorithms rather than creating an entirely new parameter?
That would certainly be helpful. In my opinion it would be better to have the parameter name directly relate to the algorithm. for both maintainability and usability. the name itself can be part of the documentation (helps both users and maintainers). Such naming could even reduce the burden of writing documentation because you could cite the paper for more information
It's kind of a minor thing, I guess I'm a bit hung up on it (apologies!), because I'm finding these tuning features to be somewhat poorly documented and as a result rather difficult to use.
@burlen Thanks for the followup. In this case I suggest we postpone the decision to introduce a new parameter, and enrich the documentation for now. It is easier to add features than deprecate them afterwards.
Background information
I'm tuning OpenMPI's alltoall on our cluster.
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
v5.0.3
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Compiled from sources, on tag v5.0.3
If you are building/installing from a git clone, please copy-n-paste the output from
git submodule status
.33e93469e1e1f69904ff3e3827394719aa6b3671 3rd-party/openpmix (v5.0.2) 3a70fac9a21700b31c4a9f9958afa207a627f0fa 3rd-party/prrte (v3.0.5) dfff67569fb72dbf8d73a1dcf74d091dad93f71b config/oac (heads/main)
Please describe the system on which you are running
Details of the problem
The following mca parameters relating to alltoall in coll tuned component are reported by
ompi_info
but are not used in the code.These may be remnants of a previous implementation, but currently setting these does nothing. Their presence complicates tuning. They should either be removed or at least the description string reported by
ompi_info
modified so that it's clear that these do nothing.