Open opoplawski opened 9 years ago
Okay, this turned out to be caused by having an already installed openmpi on the system. Apparently the dlopen() code is picking that up in preference to the build libraries. This seems like a bug as well, but not sure if this is new or not.
That is a long outstanding bug in Open MPI. We are discussing ways to correct this for 1.9.
Okay, I'll leave this to you to close/reassign/etc. as you see fit then.
Thanks Howard.
To begin the discussion we could probably use the following scheme for plugin paths:
<prefix>/lib[64]/openmpi/<project>/<PROJECT_VERSION>/<framework>/<ABI_VERSION>/
This is the most flexible naming scheme. So for example a 1.9/2.0 btl 3.0 plugin might be found in:
<prefix>/lib/openmpi/opal/2.0/btl/3.0
I mostly agree, but will be slightly more pedantic:
<libdir>/openmpi/<project>/<PROJECT_VERSION>/<framework>/<ABI_VERSION>/
That being said, this is probably a little overkill -- do we need both PROJECT_VERSION and ABI_VERSION? I.e., won't those 2 be chained together?
Actually, I wasn't pedantic enough. :-)
This is more correct:
<pkglibdir>/<project>/<PROJECT_VERSION>/<framework>/<ABI_VERSION>/
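For illustration, a minimal sketch of how a loader might assemble that directory name. The helper `plugin_dir` and the example values are hypothetical and not part of Open MPI. It assumes `pkglibdir` already expands to something like `/usr/lib/openmpi`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not an Open MPI API): formats
 *   <pkglibdir>/<project>/<PROJECT_VERSION>/<framework>/<ABI_VERSION>/
 * into buf. Returns 0 on success, -1 if formatting failed or the
 * result did not fit in len bytes. */
static int plugin_dir(char *buf, size_t len,
                      const char *pkglibdir, const char *project,
                      const char *project_version,
                      const char *framework, const char *abi_version)
{
    assert(buf != NULL && len > 0);
    int n = snprintf(buf, len, "%s/%s/%s/%s/%s/",
                     pkglibdir, project, project_version,
                     framework, abi_version);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

Called with `"/usr/lib/openmpi", "opal", "2.0", "btl", "3.0"`, this yields `/usr/lib/openmpi/opal/2.0/btl/3.0/`, matching the earlier btl example.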
Our modules have a version number. Why is simply discarding, right after dlopen, all modules with the wrong version number not an adequate solution?
Hmm, that may be the way to go. We now have the mca version, project version, and type version in the mca component.
To make this work well I should probably version the frameworks themselves. Will investigate.
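A rough sketch of what such a post-dlopen check could look like. The struct below is a hypothetical stand-in, not the real mca_base_component_t layout, and the compatibility policy (same project, same major numbers, component minor no newer than the host's) is an assumption made for illustration only.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the version fields a component carries;
 * NOT the real mca_base_component_t layout. */
typedef struct {
    const char *project;               /* e.g. "opal" */
    int project_major, project_minor;  /* project version */
    int type_major, type_minor;        /* framework ("type") version */
} component_version_t;

/* Assumed policy: project name and both major numbers must match, and
 * the component must not be newer than the host in either minor number.
 * Returns 1 if the component may be kept, 0 if it should be discarded. */
static int component_is_compatible(const component_version_t *host,
                                   const component_version_t *found)
{
    assert(host != NULL && found != NULL);
    if (strcmp(host->project, found->project) != 0) return 0;
    if (host->project_major != found->project_major) return 0;
    if (host->type_major != found->type_major) return 0;
    return found->project_minor <= host->project_minor &&
           found->type_minor <= host->type_minor;
}
```

A loader would call this on the structure it finds in each freshly opened .so and dlclose() anything that returns 0, so a stale component from a system-wide install would be skipped rather than used.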
George, do you mean the shared library version number?
Couldn't we use some versioned symbols similar to the way libfabric does it, and then check for the presence of a particular versioned symbol in a *.so using dlvsym? Are there any other projects that need all this type of subdirectory structure for shared libraries they use internally? It seems a little weird.
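As a sketch of the dlvsym idea: the helper below is hypothetical, and the symbol and version names a real plugin would export are not defined anywhere in this thread. dlvsym is a glibc extension, so this assumes a GNU/Linux system (older glibc also needs -ldl at link time).

```c
#define _GNU_SOURCE   /* dlvsym is a GNU extension */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the object at `path` exports
 * `symbol` with exactly the version tag `version`, 0 otherwise
 * (including when the object cannot be opened at all).
 * A NULL path checks the running program and its dependencies. */
static int has_versioned_symbol(const char *path, const char *symbol,
                                const char *version)
{
    assert(symbol != NULL && version != NULL);
    void *handle = dlopen(path, RTLD_LAZY | RTLD_LOCAL);
    if (handle == NULL) return 0;
    int found = dlvsym(handle, symbol, version) != NULL;
    dlclose(handle);
    return found;
}
```

One caveat with this approach is that the .so still has to be dlopen'd before it can be probed, so any constructors in an incompatible plugin would run before it is rejected.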
Is the goal to be able to install multiple versions of open mpi in the same location, or just make sure that an incompatible *.so in openmpi dir is not dlopen'd with subsequent badness as reported above?
Howard, the primary goal is to not open incompatible .so's, but it would be a useful feature to be able to have multiple versions of a project (opal for example) installed in the same tree.
Don't forget that there are other reasons why we can't install two versions of OMPI into the same tree, such as:
The goal is to prevent a scenario like this:
That being said, perhaps just checking the version numbers in the .so is good enough -- perhaps a new directory structure is not worth it (since, even if you do that, you can't install multiple versions of OMPI into the same tree).
@hppritcha I think the symbol versioning stuff is a slightly different use case than what we're trying to protect against here...?
Just to be explicit - the problem I was running into was having openmpi 1.8.2 installed in /usr, then building newer versions in my home directories and running in-tree tests.
I thought the *.so's in openmpi directory lack version numbers, hence the -avoid-version in the laLDFLAGS in all the mca//Makefile.am's. I guess we'd have to pay attention to VERSION then and not just fill in 0.0.0? I'd be fine with that. As long as we kept true to the current/rev/age formula and have C-A really mean something, this would take care of the problem, including Opoplawski's problem.
It would get complicated if the version numbers for the different *.so's could vary.
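To make the current/rev/age formula concrete, here is a small sketch of libtool's rule: a library built with -version-info C:R:A supports interface numbers C-A through C, and on Linux its file suffix is (C-A).(A).(R). The type and function names are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* libtool -version-info triple: current:revision:age */
typedef struct { int current, revision, age; } lt_version_t;

/* A library supports interface numbers current-age .. current, inclusive.
 * Returns 1 if a client built against interface `iface` can use `lib`. */
static int lt_supports(lt_version_t lib, int iface)
{
    assert(lib.age >= 0 && lib.age <= lib.current);
    return iface >= lib.current - lib.age && iface <= lib.current;
}

/* On Linux, libtool names the file lib<name>.so.(C-A).(A).(R).
 * Writes that suffix into buf; returns 0 on success, -1 on truncation. */
static int lt_linux_suffix(lt_version_t lib, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%d.%d.%d",
                     lib.current - lib.age, lib.age, lib.revision);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

For example, 5:2:3 supports interfaces 2 through 5 and gets the suffix 2.3.2, whereas filling in 0:0:0 everywhere gives 0.0.0 for every build and carries no compatibility information.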
Yes. The .so files have no version number in the filename. What George is referring to is the mca_base_component_t inside the .so. That contains version information for the plugin.
The only problem with using that structure is we may change it from release to release. We just did this by adding the project name and version.
sounds like a problem of introducing standard shared library versioning. just say no to -avoid-version and really use so versioning.
@hjelmn, is #449 good enough to close this bug?
We talked about this in person at the dev meeting in June 2015. We concluded:
@hjelmn says that he will get to this some time in the v2.x series.
Moving to 3.x
I'm trying to build openmpi master from openmpi-dev-1330-g7640507 as the Fedora package for testing. I'm getting: