Open bwbarrett opened 4 years ago
@bwbarrett will port the simple fix from the OpenPMIX PR over here to Open MPI.
Discussing more with Jeff, we're not going to do anything for v4.1.x, but should do something for master. The primary change in that patch set is to print the filenames used in searching for components. Today, we print something like:
mca_base_component_repository_open: unable to open foo: file not found (ignored)
Note that the full filename is not printed; only the base part of the filename (before the suffix) is printed, because we search for multiple suffixes to deal with machine portability issues (like file extensions of .so on Linux vs. .dll on Windows). This is particularly confusing when debuggers (using Ralph's example) add a suffix and we end up looking for the wrong filename: the printed name might be foo.so even though the file we're actually looking for is foo.so.dbg or something similar.
@jsquyres and I initially thought that the right solution was to list all files that we tried to open and failed. This would work, but it seems like a better solution would be to thread the full path through process_repository_item() so that mca_base_component_repository_open() has the full pathname that was originally found and uses dlopen instead of dlopenext. There's really no reason in our current model to throw away information we already have.
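To make the proposal concrete, here is a minimal sketch of what "open by full path and report the full path" could look like. It uses plain POSIX dlopen()/dlerror() rather than Open MPI's own dl framework, and open_component_by_path() is a hypothetical helper invented for illustration, not an existing Open MPI function. The message format just mirrors the one quoted above.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of Open MPI): open a component using the
 * exact pathname that the directory scan found, instead of a basename
 * plus a guessed suffix. On failure, write a diagnostic into errbuf that
 * names the full file we actually tried to open.
 */
static void *open_component_by_path(const char *full_path,
                                    char *errbuf, size_t errlen)
{
    void *handle = dlopen(full_path, RTLD_LAZY | RTLD_LOCAL);

    if (NULL == handle) {
        /* dlerror() returns a human-readable reason, or NULL if none. */
        const char *why = dlerror();
        snprintf(errbuf, errlen, "unable to open %s: %s (ignored)",
                 full_path, (NULL != why) ? why : "unknown error");
    }
    return handle;
}
```

With something along these lines, a failed open of foo.so.dbg would name foo.so.dbg in the message rather than just foo, since the caller never discards the suffix in the first place.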
@jsquyres @bwbarrett Anybody do this?
Suggest reproducing the issue. The last comment was in 2021.
I think https://github.com/openpmix/openpmix/pull/1772 applies to Open MPI as well. Need to investigate a bit deeper to understand.