Open culler opened 2 weeks ago
Documentation preview for this PR (built with commit 3f1894eb06d5111c4bfb2b468a701db73d68b784; changes) is ready! :tada: This preview will update shortly after each push to this PR.
NOTE: the underscore after @ above should be ignored. It was added to avoid a link to a github user page
Note you could also just type `@rpath`
I'm skeptical that only primecountpy
would need the rpath on macOS, but I haven't read the explanation in the PR description very carefully yet.
You are wise to be skeptical, but that was the only one that turned up when I ran the tests. Perhaps there are optional spkgs that need it. I could search for those, since the macOS app includes all optional spkgs that I have been able to build, including some that I needed to patch. But that would take some work.
As explained in the PR, normally an absolute path is used in the LC_LOAD_DYLIB command. I think primecountpy is going out of its way to use an rpath instead.
Any package which has been run through delocate will use @_rpath. But I don't think that those packages would be influenced by sage-env.
I don't think there's any special code in primecountpy - https://github.com/dimpase/primecountpy/blob/master/setup.py If there's anything special, it would have to come from primecount, which is built using cmake.
(Also note we don't use delocate in our build process. It only makes an appearance in the tests for meson-python, and in make list-broken-packages
.)
Yes, you are right. I found another one: libsymengine.
Would it be OK with you if I cherry-pick the python3 upgrade (+ patch removal) onto a new PR?
That was easier than I thought. It looks like there are 4 others: (These were found by searching through the 10.3 build that I used for the macOS app -- I will redo it for 10.4 and see what turns up.)
grep '@rpath' `find . -name '*.so'`
Binary file ./site-packages/symengine/lib/symengine_wrapper.cpython-311-darwin.so matches
Binary file ./site-packages/rpds/rpds.cpython-311-darwin.so matches
Binary file ./site-packages/primecountpy/primecount.cpython-311-darwin.so matches
Binary file ./site-packages/sage/libs/ecl.cpython-311-darwin.so matches
Binary file ./site-packages/sage/graphs/bliss.cpython-311-darwin.so matches
I'm surprised that I didn't see test failures for those, but who knows?
By the way, ecl.cpython-311-darwin.so provides a good example of how sage has been doing things in the past:
LC_LOAD_DYLIB: @rpath/libecl.23.9.dylib LC_LOAD_DYLIB: /private/var/tmp/sage-10.3-current/local/lib/libgmp.10.dylib LC_LOAD_DYLIB: /usr/lib/libSystem.B.dylib LC_LOAD_DYLIB: /private/var/tmp/sage-10.3-current/local/lib/libgc.1.dylib LC_LOAD_DYLIB: /private/var/tmp/sage-10.3-current/local/lib/libpari-gmp-tls.dylib LC_RPATH: /private/var/tmp/sage-10.3-current/local/lib LC_RPATH: /private/var/tmp/sage-10.3-current/local/lib LC_RPATH: /private/var/tmp/sage-10.3-current/local/var/lib/sage/venv-python3.11.8/lib LC_RPATH: . LC_RPATH: /private/var/tmp/sage-10.3-current/local/lib LC_RPATH: /private/var/tmp/sage-10.3-current/local/lib LC_RPATH: /private/var/tmp/sage-10.3-current/local/lib LC_RPATH: /private/var/tmp/sage-10.3-current/local/lib LC_RPATH: /private/var/tmp/sage-10.3-current/local/lib
It has one ridiculous rpath which is '.' (meaningless to the loader) and 7 copies of the one rpath it needs (remember duplicate rpaths are now deprecated by Apple) plus one other irrelevant rpath.
In my build of 10.4.beta4 based on this PR (with no optional packages) the only python extension modules which legitimately use @rpath are ecl.cpython-311-darwin.so primecount.cpython-311-darwin.so. But I should also fix symengine_py by adding the appropriate rpath options to its LDFLAGS as I did with primecountpy.
The rpds package was a red herring. It only uses @rpath in its LC_ID_DYLIB load command, which is not used by the loader and makes no difference anyway. Also, rpds is only installed as a dependency of jupyter and hence does not need Sage's LDFLAGS.
Would it be OK with you if I cherry-pick the python3 upgrade (+ patch removal) onto a new PR?
Sure. Have at it.
I found out why the ecl python extension did not cause tests to fail even though it has a LC_LOAD_DYLIB load command containing @rpath
. The reason is that it has the correct LC_RPATH. In fact, it has two of them.:
LC_LOAD_DYLIB: @rpath/libecl.23.9.dylib
LC_LOAD_DYLIB: /Users/culler/sage_pr/sage/local/lib/libgc.1.dylib
LC_LOAD_DYLIB: /usr/lib/libSystem.B.dylib
LC_LOAD_DYLIB: /Users/culler/sage_pr/sage/local/lib/libgmp.10.dylib
LC_LOAD_DYLIB: /Users/culler/sage_pr/sage/local/lib/libpari-gmp-tls.dylib
LC_RPATH: /Users/culler/sage_pr/sage/local/lib
LC_RPATH: /Users/culler/sage_pr/sage/local/lib
That might help explain why it had the record-setting 7 identical rpaths before this PR. I don't know where those two come from, but I guess it is not a problem for this PR. It may become a problem when duplicate rpaths become errors, instead of producing deprecation warnings.
So that leave just the two optional packages symengine_py and bliss that will need to have rpaths set as wias already done with primecountpy.
I have fixed the bliss and symengine packages by adding rpath commands to LDFLAGS. Both packages load without errors.
I think this PR is now ready to go.
Are you saying that primecountpy needed a fix of some sort? IMHO it works without this addition.
No I am not saying that. I am saying that as part of this PR I had to add a line of code in the spkg-install.in script which adds an rpath option to the LDFLAGS.
The primecountpy spkg was one of three that actually needs that option. Prior to this PR this option was added to LDFLAGS every time that sage-env was run leading to many duplicate rpath load commands which were almost never needed.
PS I know it is a pain to read a long PR description but I thought I should record what was going on since clearly there were some serious misconceptions.
On Mon, Apr 29, 2024, 7:57 AM Dima Pasechnik @.***> wrote:
Are you saying that primecountpy needed a fix of some sort? IMHO it works without this addition.
— Reply to this email directly, view it on GitHub https://github.com/sagemath/sage/pull/37886#issuecomment-2082656661, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJ6CP2IDBWDRW3DXQ2VI7LY7Y7RXAVCNFSM6AAAAABG4MHXLOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAOBSGY2TMNRWGE . You are receiving this because you authored the thread.Message ID: @.***>
this is strange. primecountpy is a straight PyPI package. No woodoo is needed, just pip-install it.
I am under impression that anything path-specific for primecountpy is done at its build-time, and is all baked in there.
It is not strange. When you install a pypi package from a source distribution, as sage does, and if the package requires an external library, as primecountpy does, then you have to inform the linker about where the external library is located. In rare cases (rare on macOS anyway) you may also need to specify an rpath.
Typically this is done by adding linker options to LDFLAGS. That is how sage does it. Currently sage adds an rpath option in sage-env. That causes many unused and duplicated rpath load commands. This PR changes that, for macOS only, by not adding the rpath in sage-env. Instead it adds the rpath option in the spkg-install.in script. This is only needed for one standard and two optional spkgs. The primecountpy package is the one standard spkg that needs an rpath. Nothing is changed in the package itself and the LDFLAGS used when building it is the same as it currently is. All other packages besides those three are built with a shorter LDFLAGS that omits the unused rpath option.
On Mon, Apr 29, 2024, 9:10 AM Dima Pasechnik @.***> wrote:
this is strange. primecountpy is a straight PyPI package. No woodoo is needed, just pip-install it.
— Reply to this email directly, view it on GitHub https://github.com/sagemath/sage/pull/37886#issuecomment-2082860758, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJ6CP33VBUCFA3IAGPOKADY7ZIE3AVCNFSM6AAAAABG4MHXLOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAOBSHA3DANZVHA . You are receiving this because you authored the thread.Message ID: @.***>
If you follow the instructions in the old makefile you get an error. (Try it.) With the new instructions it works.
On Thu, May 2, 2024, 4:15 PM John H. Palmieri @.***> wrote:
@.**** commented on this pull request.
In build/make/Makefile.in https://github.com/sagemath/sage/pull/37886#discussion_r1588403514:
@@ -9,7 +9,7 @@
obscure the substance of the actual rules, this file can be debugged by
running:
# -# $ make -f build/make/Makefile -n DEBUG_RULES=1 +# $ make -f build/make/Makefile -n DEBUG_RULES=1 SAGE_PKGCONFIG=1
Can you explain this change?
— Reply to this email directly, view it on GitHub https://github.com/sagemath/sage/pull/37886#pullrequestreview-2036937233, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJ6CP67VZXXD2GE6BGTCULZAKUI3AVCNFSM6AAAAABG4MHXLOVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDAMZWHEZTOMRTGM . You are receiving this because you authored the thread.Message ID: @.***>
I should have said that the error you see when running $ make -f build/make/Makefile -n DEBUG_RULES=1
is
generated by this code which begins on line 20:
# Check a variable that is only set in build/make/install, but not in sage-env, for example
ifndef SAGE_PKGCONFIG
# Set by build/bin/sage-sdist, which invokes the Makefile directly in
# order to download upstream packages for distribution.
ifndef SAGE_SPKG_COPY_UPSTREAM
$(error This Makefile needs to be invoked by build/make/install)
endif
endif
So you can see why defining SAGE_PKGCONFIG prevents the error.
I should have said that the error you see when running
$ make -f build/make/Makefile -n DEBUG_RULES=1
is generated by this code which begins on line 20:# Check a variable that is only set in build/make/install, but not in sage-env, for example ifndef SAGE_PKGCONFIG # Set by build/bin/sage-sdist, which invokes the Makefile directly in # order to download upstream packages for distribution. ifndef SAGE_SPKG_COPY_UPSTREAM $(error This Makefile needs to be invoked by build/make/install) endif endif
So you can see why defining SAGE_PKGCONFIG prevents the error.
Got it, thank you.
I pushed a new change to Makefile.in which fixes the debug command without requiring the user to set the unrelated variable SAGE_PKGCONFIG.
The changes to the python spkg in this PR have already been merged into 10.4, using Matthias' cherry-picked PR. There are really very few changes left and removing all of the macOS rpath garbage is a major improvment. I would really appreciate some help in getting the rest of this PR merged. I think a positive review from someone might do the job. I don't think there are any unresolved issues remaining.
Looks great to me. Setting positive review.
@culler Rebased as requested.
@culler Rebased as requested.
Thanks!!!
Very similar linker errors with fflas_ffpack have been independently reported for 10.4.beta6 on sage-release, confirming that those failures are unrelated to this PR.
Since I have done the needed work, and tested on both linux and macOS, I am going to remove the "needs work" label and restore the "positive review" label so this change can eventually make it into 10.4. If I am not allowed to do that, feel free to undo my changes to the labels.
This PR does the following (after splitting out the Python upgrade to #37914):
About rpaths on macOS Sage has been misusing rpaths on macOS for a very long time. As far as I can tell, this was an error based on the incorrect assumption that MACH binaries use rpath in the same way as ELF binaries.
The rpath in an ELF binary provides a search path, beyond the default search path, where the loader looks for shared libraries needed by the binary.
A MACH binary specifies the dynamic libraries it needs with load commands of type LC_LOAD_DYLIB. The value of a LC_LOAD_DYLIB load command is a path to a specific library. There is one such load command for each library that the binary needs. The path is an absolute path by default. However, the path is allowed to begin with the substring
@rpath
. When the binary is loaded that substring is replaced by the value of each LC_RPATH load command in the binary to produce a list of absolute paths (or relative paths if the value of the LC_RPATH begins with@loader_path
). Each of the paths on that list is tried as a possible location of the needed dynamic library.An obvious consequence of this design is that if none of the LC_LOAD_DYLIB paths contains the string
@rpath
then there is no need for any LC_RPATH commands, since they won't be used. With one exception (primecountpy) every MACH binary produced when building Sage had absolute paths for its LC_LOAD_DYLIB load commands. Nonetheless Sage was setting the rpath to $SAGE_LOCAL/lib in all of these binaries. In fact, it was doing that multiple times, producing many duplicate rpaths all of which were useless. (And duplicate rpaths are now deprecated by Apple's linker.)This PR removes all of those bogus rpaths.
About xcode-select The sage-env script was calling gcc to find the names of its linker and archiver in the unlikely case that either of those had a non-standard name, and it was also calling xcode-select as part of a check to see whether the selected XCode toolchain had an ld-classic executable. That executable was added in XCode 15 as a new name for the old linker when a new linker was added. Apple made it possible to use the old linker by passing the linker option -ld_classic to the gcc command. The new linker was buggy in the beginning and was unable to link openblas. So numpy and scipy use the old linker. Sage also wanted to use it. While the new linker is probably working well enough now as a linker, it generates a warning when it encounters duplicate libraries in a link command. It also appears that the new linker, or XCode 15 itself, somehow creates duplicate libraries. While the warnings should be innocuous, they were causing Sage doctest failures. So this PR provides code which adds -Wl,-ld_classic to LDFLAGS in sage-env to replace the analogous code which was already there. But the new code is careful to redirect the error messages to /dev/mull so they do not get printed on the users terminal whenever sage starts up on a machine which does not have any XCode toolchain at all. It also avoids calling gcc in that case since doing so posts a dialog on the user's screen asking if they would like to install command line tools. The dialog is a nice touch, but should only appear if the user is actually using gcc, not as part of every invocation of Sage, especially since XCode uses the standard names.
:memo: Checklist
:hourglass: Dependencies