Open jsquyres opened 7 years ago
when you disable-dlopen, lots of symbols are suddenly going to be in our library as everything gets swept up into it. Most of these symbols are coming from libraries you linked against, not from OMPI itself.
Yes, but even if we disable dlopen, all those symbols should be private -- and they are with gcc
.
Also, I don't know why the test would be showing symbols from dependent libraries (like libm
).
@markalle Any ideas?
Well, about the --disable-dlopen I did fix up the nm test so when all those extra MCA symbols get pulled into the main library those are all merely identified as warnings. I think it would be nice to have those names cleaned up too, since that would still be name pollution for people who build --disable-dlopen. But since the main path is to have MCAs as .so files I don't care very much about all the warning symbols.
For the errors shown here I'd agree the nm test should ignore those. You must be right that "icc" is creating those, and in general anything the compiler piles into the executable should be considered legal. The purpose of the nm test is to make sure OMPI isn't doing name pollution that could conflict with a user's application, not to nitpick compiler internals.
I don't know how to be automate that better, but for now I think this just means I should grow the list a little bit of symbols that are considered legal. Want me to make a PR for that?
I think you're fighting an unwinnable battle. Every compiler and configure option is going to result in more symbols that have to be white-listed, and you'll have to update it with the release of every new version.
A better approach might be to search for symbols that contain things used in OMPI - e.g., "btl" - but that also has problems.
Bottom line is: while you have a noble goal, it is ultimately unachievable in an automated fashion. At the very least, please stop having this test cause a failure!
I don't think it'll be that hard to automate. If I look at .o instead of .so wouldn't that avoid extra compiler-generated symbols? In that case I'd still start by looking at the symbols in a .so, but also have a list that came from the .o files and filter the former list to only include symbols that also exist in the latter list.
If I only look at symbols with particular pre-selected names the test would miss its whole purpose, which is to catch accidental additions of symbols like create_list() that are guaranteed to conflict with applications.
I agree it's a nuisance when the test produces false failures though. If I changed it to never fail, how hard is it to pull up the output from everybody's MTT test to browse? What I'd do is disable the test until it has a longer history of not producing false failures.
I don't know how you make this work, but you are welcome to beat your head against it 😄
I think this test should not be part of "make check", but instead be in MTT and you can run it on your machines (and make it fail) - those of us who don't care are then not impacted by it.
Symbol name pollution isn't just a cosmetic issue though, it's a real bug when an application and the MPI library use the same global symbol. To the extent the test works its errors shouldn't be ignored. When it gives nuisance false positives though, that I totally agree nobody should be forced to deal with.
So I've made a PR to always exit(0) on the belief that it has too many false positives: https://github.com/open-mpi/ompi/pull/4119
This PR also does
I think you are missing the intended purpose of "make check" in our repository. Those tests are intended to be a quick method by which someone installing the code can ensure that something isn't broken. The directory isn't intended to be used by developers as a place to test whether internal code is broken - that's why we have all those MTT tests.
This test has nothing to do with whether or not the resulting build is broken. While I agree it isn't a cosmetic issue, it is also true that we don't get complaints about symbol issues on our mailing lists, so this isn't a red hot issue out there.
The test therefore really belongs in the ompi-tests repo so we can run it nightly and see the results in the MTT summary. Once moved there, you can use a controlled configuration to create a "fingerprint" of the symbol report, store that in the repo, and then simply compare the result of each execution (using the exact same configuration) to the fingerprint. It becomes really trivial to detect changes, and then someone can look to see where they came from.
You cannot do that in "make check" because you have no control over the user's environment. So you've made your life 100x harder than it needs to be, and that's why you get all these false positives. The current method will never result in zero false positives because everyone's compiler and configuration differs, and that impacts the symbol results.
😆 didn't realize I was still signed in on the userid we were setting up for testing.
Thanks, that makes sense. I don't object to MTT. I've updated the pull request to disable it
Keep in mind that you can also drop all that "whitelist" stuff as all you care about is difference from the "gold" fingerprint. Given you now control the configuration/compiler combination, the only change should be coming from something we do.
Per Webex discussion 22 Aug 2017: we should remove this test from the critical path (i.e., remove it from make check
altogether), fix it, and then put it back. I.e., everyone agrees that the intent of this test is good. But it's causing a bunch of false failures. So it needs to be taken out, evaluated / hardened up (if possible), and (if possible) returned to the critical path.
Are we not specifying some flag to
icc
properly?When compiling with
icc
, 17.0 the symbol name checker test (inmake check
) finds a ginormous number of leaked symbols. The report is too long to include inline -- see attached:make-check-symbol-checker-icc-fail.txt
This causes MTT fails with
make check
, such as https://mtt.open-mpi.org/index.php?do_redir=2468.