SPKG that only build executables or data: Replace `spkg-install` by conda

mkoeppe commented 1 year ago

Inspired by the discussion in https://groups.google.com/g/sage-devel/c/AvH3xq2bCfo:

A subset of our SPKGs is only used by calling their executables, not linking to libraries. For such executables, there are no concerns about library / toolchain incompatibilities: We can take the executables from conda. So we can create an isolated conda environment somewhere within SAGE_LOCAL; perhaps $SAGE_LOCAL/var/lib/sage/conda. This could be implemented without user-visible changes in the installation process.
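
For illustration, here is a minimal sketch of how such an isolated environment could be requested. It is not an implementation from Sage. The helper names are hypothetical, the `conda-forge` channel is an assumption, and the package names passed in are placeholders. It only builds the command line for `conda create --prefix`, so nothing is installed unless the caller runs it.

```python
from pathlib import Path


def conda_env_prefix(sage_local: str) -> Path:
    """Return the location proposed above: $SAGE_LOCAL/var/lib/sage/conda."""
    return Path(sage_local) / "var" / "lib" / "sage" / "conda"


def conda_create_command(sage_local: str, packages: list[str]) -> list[str]:
    """Build (but do not run) a `conda create` command for the isolated env.

    Hypothetical helper; the channel is an assumption, not a project decision.
    """
    return [
        "conda", "create", "--yes",
        "--prefix", str(conda_env_prefix(sage_local)),
        "--channel", "conda-forge",
        *packages,
    ]
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` from a build script. Executables would then be found under the `bin` directory of that prefix.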

Examples of such SPKGs:

benzene
buckygen
cmake
csdp (?)
database_... (but these may be better to change to pip-installable packages, see #30914)
deformation
ffmpeg (currently dummy package)
frobby
gap3
gdb
gengetopt
gfan (to be checked: vendoring situation with singular)
git (currently dummy package)
github_cli
imagemagick (currently dummy package)
info
jmol
latte_int / lidia
lie
mathjax (?)
meson (but not meson-python)
palp / polytope_db...
pandoc (currently dummy package)
patch
patchelf
pdf2svg (currently dummy package)
python3
qepcad / saclib
rubiks
surf
tachyon
texlive (currently dummy package)
topcom

Separately: Python packages that run in separate Python processes. (Can install with conda or with pip.)

The frontend components of Jupyter/Jupyterlab (#30306); best done when the Jupyter-based notebook 7 comes out!
tox / virtualenv and their dependencies

This could also be a solution for some packages of:

#31176
https://github.com/sagemath/sage/issues/20977
#27762
#29900
closing numerous "wishlist" package tickets

culler commented 1 year ago

I don't think the criterion should be whether an spkg is only used by calling its executable. It also matters which libraries are link dependencies of the executable. A self-contained binary distribution, like our macOS binary, cannot include an executable which depends on a library unless it also includes that library. And that could require being forced to include multiple versions of the same library, possibly a large one. For example the frobby executable is linked against libgmp and libgmpxx, as are many spkgs that are not on that list. If you are going to copy the frobby executable binary from conda you will need to be sure that Sage and conda are using the same version of libgmp. You will probably also need to adjust the rpath of the executable, although that is something that we already have to do in order to make the macOS app relocatable.
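
To make the link-dependency concern concrete, here is a rough sketch, for Linux only, that parses the text printed by `ldd` and reports libraries that two executables resolve to different files. The function names are hypothetical and the approach is just one way to check this. On macOS, `otool -L` prints a different format that this parser does not handle.

```python
import re

# Matches "libgmp.so.10 => /some/path/libgmp.so.10 (0x...)" and
# "libfoo.so.1 => not found"; lines without "=>" are skipped.
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(.+?)(?:\s+\(0x[0-9a-fA-F]+\))?\s*$")


def parse_ldd(output: str) -> dict[str, str]:
    """Map each library name in `ldd` output to the path it resolved to."""
    deps = {}
    for line in output.splitlines():
        match = _LDD_LINE.match(line)
        if match:
            deps[match.group(1)] = match.group(2)
    return deps


def differing_libs(deps_a: dict[str, str], deps_b: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return libraries needed by both executables but resolved to different files."""
    return {
        name: (deps_a[name], deps_b[name])
        for name in sorted(deps_a.keys() & deps_b.keys())
        if deps_a[name] != deps_b[name]
    }
```

Feeding it the `ldd` output of a Sage-built executable and of a conda-provided one would list, for example, a `libgmp` entry if the two resolve to different copies.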

Incidentally, the rpath situation in Sage on macOS is a total mess. Almost all executables end up being built with two or three copies of the same rpath, sometimes with other, incorrect, rpaths mixed in.
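
A small sketch of how repeated rpaths could be spotted: it scans the text printed by `otool -l` for `LC_RPATH` load commands and counts how often each path occurs. The function names are hypothetical, and the parser assumes the usual `path ... (offset N)` layout of that output.

```python
import re
from collections import Counter

# The rpath itself appears on a line like "path @loader_path/../lib (offset 12)".
_PATH_LINE = re.compile(r"^\s*path\s+(.*?)\s+\(offset \d+\)\s*$")


def parse_rpaths(otool_output: str) -> list[str]:
    """Collect the path of every LC_RPATH load command, in order of appearance."""
    rpaths = []
    in_rpath = False
    for line in otool_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("cmd "):
            # A new load command starts; remember whether it is an rpath entry.
            in_rpath = stripped == "cmd LC_RPATH"
        elif in_rpath:
            match = _PATH_LINE.match(line)
            if match:
                rpaths.append(match.group(1))
                in_rpath = False
    return rpaths


def duplicated_rpaths(rpaths: list[str]) -> dict[str, int]:
    """Return each rpath that occurs more than once, with its count."""
    return {path: n for path, n in Counter(rpaths).items() if n > 1}
```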

I think this plan is teetering on the brink of that dark place known as DLL Hell. We should proceed with caution.

mkoeppe commented 1 year ago

We're not copying binaries from conda; we have a conda environment in SAGE_LOCAL/var/lib/sage/conda, which will contain the executables together with all needed libraries.

Yes, there will likely be two copies of basic libraries such as libgmp.

The size of the duplication should certainly be one of the criteria for deciding which packages we get from conda instead of building them ourselves.
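
As a rough way to put a number on that criterion, the sketch below sums the sizes of library files in one `lib` directory whose base name also occurs in another. The function names and the matching rule (everything before the first dot, so `libgmp.so.10` and `libgmp.10.dylib` both count as `libgmp`) are assumptions for illustration. It looks only at the top level of each directory and skips symlinks on the conda side to avoid double counting.

```python
from pathlib import Path


def library_stem(filename: str) -> str:
    """Reduce a library file name to its base name, e.g. 'libgmp.so.10' -> 'libgmp'."""
    return filename.split(".", 1)[0]


def duplicated_bytes(sage_lib: Path, conda_lib: Path) -> dict[str, int]:
    """Sum the bytes of files in conda_lib whose base name also exists in sage_lib."""
    sage_stems = {
        library_stem(p.name)
        for p in sage_lib.iterdir()
        if p.is_file() and p.name.startswith("lib")
    }
    sizes: dict[str, int] = {}
    for p in conda_lib.iterdir():
        if p.is_symlink() or not p.is_file():
            continue  # count each real file once; symlinks point at files already seen
        stem = library_stem(p.name)
        if stem in sage_stems:
            sizes[stem] = sizes.get(stem, 0) + p.stat().st_size
    return sizes
```

Summing the values of the returned dictionary gives an estimate of the extra disk space for one candidate package set.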

culler commented 1 year ago

I would just like to introduce the two elephants I see in this room.

The first elephant is the maxim "If it ain't broke then don't fix it." Or, as paraphrased by Matthias, "Best practices dictate to not create real problems by addressing imaginary problems." An spkg which builds without issues and works correctly is not broken and should not be "fixed". Especially, it should not be "fixed" under the guise of reducing maintenance. No maintenance is the least amount of maintenance possible. The wise thing to do is to wait until an optional package actually causes trouble before you change it.

The second elephant is the principle, a core principle of the Sage project I am told, that including multiple copies or, worse, versions of the same library is evil. Why else would it be so important to the project to try to use system versions of libraries whenever possible? I remember, long ago, when I proposed statically linking cypari with gmp and pari in order to make it self-contained and installable as a binary package from pip. (That is how cypari works now, and is the primary reason that you can install it on Windows as well as on unix.) The blowback when I suggested that was immediate and intense. Including two copies of pari (even though one would be embedded and not available from the filesystem) was unthinkable. (It turns out that importing cypari and cypari2 into Sage causes no problems, as far as I know.) I am not particularly a supporter of this principle, although I do worry about using up too much disk space. But I am surprised to see no reaction at all to this proposed demotion of it.

mkoeppe commented 1 year ago

> The wise thing to do is to wait until an optional package actually causes trouble before you change it.

Exactly. Of course, #35585 violates this principle – for didactical reasons:

There's no trouble with our rubiks package at all (except that the Sage project is kind of its upstream).
The info package is merely outdated, but that is not causing any trouble, nor would I expect it to for a long while. So if the project were concerned for some reason about keeping it up to date, that would be "fixed" by this change.
The valgrind package is wildly outdated: our version 3.14.0 dates from 2018, and an upgrade would likely be necessary for it to work properly on some platforms. But no trouble has been reported so far, so probably very few people are using the Sage distribution to install it.

sagemath / sage

SPKG that only build executables or data: Replace `spkg-install` by conda #35583
