open-mpi / ompi

Open MPI main development repository
https://www.open-mpi.org

Investigate file munging issue found in openpmix #7782

Open bwbarrett opened 4 years ago

bwbarrett commented 4 years ago

I think https://github.com/openpmix/openpmix/pull/1772 applies to Open MPI as well. Need to investigate a bit deeper to understand.

jsquyres commented 4 years ago

@bwbarrett will port the simple fix from the OpenPMIX PR over here to Open MPI.

bwbarrett commented 4 years ago

Discussing more with Jeff, we're not going to do anything for v4.1.x, but should do something for master. The primary change in that patch set is to print the filenames used in searching for components. Today, we print something like:

mca_base_component_repository_open: unable to open foo: file not found (ignored)

Note that the full filename is not printed. Only the base part of the filename (before the suffix) is printed, because we search for multiple suffixes to deal with machine portability issues (like file extensions of .so on Linux vs. .dll on Windows). This is particularly confusing when debuggers (to use Ralph's example) add a suffix and we end up looking for the wrong filename: the printed name might be foo.so even though the file we're actually looking for is foo.so.dbg or something similar.

@jsquyres and I initially thought that the right solution was to list all files that we tried to open and failed. This would work, but it seems like a better solution would be to thread the full path through process_repository_item() so that mca_base_component_repository_open() has the full pathname that was originally found and uses dlopen instead of dlopenext. There's really no reason in our current model to throw away information we already have.

rhc54 commented 3 years ago

@jsquyres @bwbarrett Anybody do this?

wenduwan commented 5 months ago

Suggest reproducing the issue. The last communication was in 2021.