sagemath / sage

Main repository of SageMath. Now open for Issues and Pull Requests.
https://www.sagemath.org
Other
1.08k stars 394 forks source link

Remove pointless rpaths on macOS; make sage-env polite when run on systems with no toolchain #37886

Open culler opened 2 weeks ago

culler commented 2 weeks ago

This PR does the following (after splitting out the Python upgrade to #37914):

  1. Fix a comment in the Makefile about how to use the DEBUG flag so it will actually work.
  2. Edit sage-env so that it does not print errors to stderr and/or open a dialog whenever it is run on a system which has neither XCode nor command line tools installed.
  3. Remove (almost) all uses of rpath on macOS (see below).
  4. Fix the primecountpy spkg, which was the only one which actually needed an rpath in its python extension module.

About rpaths on macOS Sage has been misusing rpaths on macOS for a very long time. As far as I can tell, this was an error based on the incorrect assumption that MACH binaries use rpath in the same way as ELF binaries.

The rpath in an ELF binary provides a search path, beyond the default search path, where the loader looks for shared libraries needed by the binary.

A MACH binary specifies the dynamic libraries it needs with load commands of type LC_LOAD_DYLIB. The value of a LC_LOAD_DYLIB load command is a path to a specific library. There is one such load command for each library that the binary needs. The path is an absolute path by default. However, the path is allowed to begin with the substring @rpath. When the binary is loaded that substring is replaced by the value of each LC_RPATH load command in the binary to produce a list of absolute paths (or relative paths if the value of the LC_RPATH begins with @loader_path). Each of the paths on that list is tried as a possible location of the needed dynamic library.

An obvious consequence of this design is that if none of the LC_LOAD_DYLIB paths contains the string @rpath then there is no need for any LC_RPATH commands, since they won't be used. With one exception (primecountpy) every MACH binary produced when building Sage had absolute paths for its LC_LOAD_DYLIB load commands. Nonetheless Sage was setting the rpath to $SAGE_LOCAL/lib in all of these binaries. In fact, it was doing that multiple times, producing many duplicate rpaths all of which were useless. (And duplicate rpaths are now deprecated by Apple's linker.)

This PR removes all of those bogus rpaths.

About xcode-select The sage-env script was calling gcc to find the names of its linker and archiver in the unlikely case that either of those had a non-standard name, and it was also calling xcode-select as part of a check to see whether the selected XCode toolchain had an ld-classic executable. That executable was added in XCode 15 as a new name for the old linker when a new linker was added. Apple made it possible to use the old linker by passing the linker option -ld_classic to the gcc command. The new linker was buggy in the beginning and was unable to link openblas. So numpy and scipy use the old linker. Sage also wanted to use it. While the new linker is probably working well enough now as a linker, it generates a warning when it encounters duplicate libraries in a link command. It also appears that the new linker, or XCode 15 itself, somehow creates duplicate libraries. While the warnings should be innocuous, they were causing Sage doctest failures. So this PR provides code which adds -Wl,-ld_classic to LDFLAGS in sage-env to replace the analogous code which was already there. But the new code is careful to redirect the error messages to /dev/mull so they do not get printed on the users terminal whenever sage starts up on a machine which does not have any XCode toolchain at all. It also avoids calling gcc in that case since doing so posts a dialog on the user's screen asking if they would like to install command line tools. The dialog is a nice touch, but should only appear if the user is actually using gcc, not as part of every invocation of Sage, especially since XCode uses the standard names.

:memo: Checklist

:hourglass: Dependencies

github-actions[bot] commented 2 weeks ago

Documentation preview for this PR (built with commit 3f1894eb06d5111c4bfb2b468a701db73d68b784; changes) is ready! :tada: This preview will update shortly after each push to this PR.

mkoeppe commented 2 weeks ago

NOTE: the underscore after @ above should be ignored. It was added to avoid a link to a github user page

Note you could also just type `@rpath`

mkoeppe commented 2 weeks ago

I'm skeptical that only primecountpy would need the rpath on macOS, but I haven't read the explanation in the PR description very carefully yet.

culler commented 2 weeks ago

You are wise to be skeptical, but that was the only one that turned up when I ran the tests. Perhaps there are optional spkgs that need it. I could search for those, since the macOS app includes all optional spkgs that I have been able to build, including some that I needed to patch. But that would take some work.

As explained in the PR, normally an absolute path is used in the LC_LOAD_DYLIB command. I think primecountpy is going out of its way to use an rpath instead.

culler commented 2 weeks ago

Any package which has been run through delocate will use @_rpath. But I don't think that those packages would be influenced by sage-env.

mkoeppe commented 2 weeks ago

I don't think there's any special code in primecountpy - https://github.com/dimpase/primecountpy/blob/master/setup.py If there's anything special, it would have to come from primecount, which is built using cmake.

(Also note we don't use delocate in our build process. It only makes an appearance in the tests for meson-python, and in make list-broken-packages.)

culler commented 2 weeks ago

Yes, you are right. I found another one: libsymengine.

mkoeppe commented 2 weeks ago

Would it be OK with you if I cherry-pick the python3 upgrade (+ patch removal) onto a new PR?

culler commented 2 weeks ago

That was easier than I thought. It looks like there are 4 others: (These were found by searching through the 10.3 build that I used for the macOS app -- I will redo it for 10.4 and see what turns up.)

grep '@rpath' `find . -name '*.so'`
Binary file ./site-packages/symengine/lib/symengine_wrapper.cpython-311-darwin.so matches
Binary file ./site-packages/rpds/rpds.cpython-311-darwin.so matches
Binary file ./site-packages/primecountpy/primecount.cpython-311-darwin.so matches
Binary file ./site-packages/sage/libs/ecl.cpython-311-darwin.so matches
Binary file ./site-packages/sage/graphs/bliss.cpython-311-darwin.so matches

I'm surprised that I didn't see test failures for those, but who knows?

By the way, ecl.cpython-311-darwin.so provides a good example of how sage has been doing things in the past:

LC_LOAD_DYLIB: @rpath/libecl.23.9.dylib LC_LOAD_DYLIB: /private/var/tmp/sage-10.3-current/local/lib/libgmp.10.dylib LC_LOAD_DYLIB: /usr/lib/libSystem.B.dylib LC_LOAD_DYLIB: /private/var/tmp/sage-10.3-current/local/lib/libgc.1.dylib LC_LOAD_DYLIB: /private/var/tmp/sage-10.3-current/local/lib/libpari-gmp-tls.dylib LC_RPATH: /private/var/tmp/sage-10.3-current/local/lib LC_RPATH: /private/var/tmp/sage-10.3-current/local/lib LC_RPATH: /private/var/tmp/sage-10.3-current/local/var/lib/sage/venv-python3.11.8/lib LC_RPATH: . LC_RPATH: /private/var/tmp/sage-10.3-current/local/lib LC_RPATH: /private/var/tmp/sage-10.3-current/local/lib LC_RPATH: /private/var/tmp/sage-10.3-current/local/lib LC_RPATH: /private/var/tmp/sage-10.3-current/local/lib LC_RPATH: /private/var/tmp/sage-10.3-current/local/lib

It has one ridiculous rpath which is '.' (meaningless to the loader) and 7 copies of the one rpath it needs (remember duplicate rpaths are now deprecated by Apple) plus one other irrelevant rpath.

culler commented 2 weeks ago

In my build of 10.4.beta4 based on this PR (with no optional packages) the only python extension modules which legitimately use @rpath are ecl.cpython-311-darwin.so primecount.cpython-311-darwin.so. But I should also fix symengine_py by adding the appropriate rpath options to its LDFLAGS as I did with primecountpy.

The rpds package was a red herring. It only uses @rpath in its LC_ID_DYLIB load command, which is not used by the loader and makes no difference anyway. Also, rpds is only installed as a dependency of jupyter and hence does not need Sage's LDFLAGS.

culler commented 2 weeks ago

Would it be OK with you if I cherry-pick the python3 upgrade (+ patch removal) onto a new PR?

Sure. Have at it.

culler commented 2 weeks ago

I found out why the ecl python extension did not cause tests to fail even though it has a LC_LOAD_DYLIB load command containing @rpath. The reason is that it has the correct LC_RPATH. In fact, it has two of them.:

LC_LOAD_DYLIB: @rpath/libecl.23.9.dylib
LC_LOAD_DYLIB: /Users/culler/sage_pr/sage/local/lib/libgc.1.dylib
LC_LOAD_DYLIB: /usr/lib/libSystem.B.dylib
LC_LOAD_DYLIB: /Users/culler/sage_pr/sage/local/lib/libgmp.10.dylib
LC_LOAD_DYLIB: /Users/culler/sage_pr/sage/local/lib/libpari-gmp-tls.dylib
LC_RPATH: /Users/culler/sage_pr/sage/local/lib
LC_RPATH: /Users/culler/sage_pr/sage/local/lib

That might help explain why it had the record-setting 7 identical rpaths before this PR. I don't know where those two come from, but I guess it is not a problem for this PR. It may become a problem when duplicate rpaths become errors, instead of producing deprecation warnings.

So that leave just the two optional packages symengine_py and bliss that will need to have rpaths set as wias already done with primecountpy.

culler commented 2 weeks ago

I have fixed the bliss and symengine packages by adding rpath commands to LDFLAGS. Both packages load without errors.

I think this PR is now ready to go.

dimpase commented 2 weeks ago

Are you saying that primecountpy needed a fix of some sort? IMHO it works without this addition.

culler commented 2 weeks ago

No I am not saying that. I am saying that as part of this PR I had to add a line of code in the spkg-install.in script which adds an rpath option to the LDFLAGS.

The primecountpy spkg was one of three that actually needs that option. Prior to this PR this option was added to LDFLAGS every time that sage-env was run leading to many duplicate rpath load commands which were almost never needed.

PS I know it is a pain to read a long PR description but I thought I should record what was going on since clearly there were some serious misconceptions.

On Mon, Apr 29, 2024, 7:57 AM Dima Pasechnik @.***> wrote:

Are you saying that primecountpy needed a fix of some sort? IMHO it works without this addition.

— Reply to this email directly, view it on GitHub https://github.com/sagemath/sage/pull/37886#issuecomment-2082656661, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJ6CP2IDBWDRW3DXQ2VI7LY7Y7RXAVCNFSM6AAAAABG4MHXLOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAOBSGY2TMNRWGE . You are receiving this because you authored the thread.Message ID: @.***>

dimpase commented 2 weeks ago

this is strange. primecountpy is a straight PyPI package. No woodoo is needed, just pip-install it.

I am under impression that anything path-specific for primecountpy is done at its build-time, and is all baked in there.

culler commented 2 weeks ago

It is not strange. When you install a pypi package from a source distribution, as sage does, and if the package requires an external library, as primecountpy does, then you have to inform the linker about where the external library is located. In rare cases (rare on macOS anyway) you may also need to specify an rpath.

Typically this is done by adding linker options to LDFLAGS. That is how sage does it. Currently sage adds an rpath option in sage-env. That causes many unused and duplicated rpath load commands. This PR changes that, for macOS only, by not adding the rpath in sage-env. Instead it adds the rpath option in the spkg-install.in script. This is only needed for one standard and two optional spkgs. The primecountpy package is the one standard spkg that needs an rpath. Nothing is changed in the package itself and the LDFLAGS used when building it is the same as it currently is. All other packages besides those three are built with a shorter LDFLAGS that omits the unused rpath option.

On Mon, Apr 29, 2024, 9:10 AM Dima Pasechnik @.***> wrote:

this is strange. primecountpy is a straight PyPI package. No woodoo is needed, just pip-install it.

— Reply to this email directly, view it on GitHub https://github.com/sagemath/sage/pull/37886#issuecomment-2082860758, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJ6CP33VBUCFA3IAGPOKADY7ZIE3AVCNFSM6AAAAABG4MHXLOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAOBSHA3DANZVHA . You are receiving this because you authored the thread.Message ID: @.***>

culler commented 1 week ago

If you follow the instructions in the old makefile you get an error. (Try it.) With the new instructions it works.

On Thu, May 2, 2024, 4:15 PM John H. Palmieri @.***> wrote:

@.**** commented on this pull request.

In build/make/Makefile.in https://github.com/sagemath/sage/pull/37886#discussion_r1588403514:

@@ -9,7 +9,7 @@

obscure the substance of the actual rules, this file can be debugged by

running:

# -# $ make -f build/make/Makefile -n DEBUG_RULES=1 +# $ make -f build/make/Makefile -n DEBUG_RULES=1 SAGE_PKGCONFIG=1

Can you explain this change?

— Reply to this email directly, view it on GitHub https://github.com/sagemath/sage/pull/37886#pullrequestreview-2036937233, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJ6CP67VZXXD2GE6BGTCULZAKUI3AVCNFSM6AAAAABG4MHXLOVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDAMZWHEZTOMRTGM . You are receiving this because you authored the thread.Message ID: @.***>

culler commented 1 week ago

I should have said that the error you see when running $ make -f build/make/Makefile -n DEBUG_RULES=1 is generated by this code which begins on line 20:

# Check a variable that is only set in build/make/install, but not in sage-env, for example
ifndef SAGE_PKGCONFIG
# Set by build/bin/sage-sdist, which invokes the Makefile directly in
# order to download upstream packages for distribution.
ifndef SAGE_SPKG_COPY_UPSTREAM
$(error This Makefile needs to be invoked by build/make/install)
endif
endif

So you can see why defining SAGE_PKGCONFIG prevents the error.

jhpalmieri commented 1 week ago

I should have said that the error you see when running $ make -f build/make/Makefile -n DEBUG_RULES=1 is generated by this code which begins on line 20:

# Check a variable that is only set in build/make/install, but not in sage-env, for example
ifndef SAGE_PKGCONFIG
# Set by build/bin/sage-sdist, which invokes the Makefile directly in
# order to download upstream packages for distribution.
ifndef SAGE_SPKG_COPY_UPSTREAM
$(error This Makefile needs to be invoked by build/make/install)
endif
endif

So you can see why defining SAGE_PKGCONFIG prevents the error.

Got it, thank you.

culler commented 1 week ago

I pushed a new change to Makefile.in which fixes the debug command without requiring the user to set the unrelated variable SAGE_PKGCONFIG.

The changes to the python spkg in this PR have already been merged into 10.4, using Matthias' cherry-picked PR. There are really very few changes left and removing all of the macOS rpath garbage is a major improvment. I would really appreciate some help in getting the rest of this PR merged. I think a positive review from someone might do the job. I don't think there are any unresolved issues remaining.

NathanDunfield commented 1 week ago

Looks great to me. Setting positive review.

mkoeppe commented 1 day ago

@culler Rebased as requested.

culler commented 1 day ago

@culler Rebased as requested.

Thanks!!!

culler commented 1 day ago

Very similar linker errors with fflas_ffpack have been independently reported for 10.4.beta6 on sage-release, confirming that those failures are unrelated to this PR.

Since I have done the needed work, and tested on both linux and macOS, I am going to remove the "needs work" label and restore the "positive review" label so this change can eventually make it into 10.4. If I am not allowed to do that, feel free to undo my changes to the labels.