sagemath / sage

Main repository of SageMath
https://www.sagemath.org

Meta-ticket: Create upstream repositories, pip-installable packages for database packages #30914

Open tobiasdiez opened 4 years ago

tobiasdiez commented 4 years ago

The Sage distribution contains a number of "database packages", many of which do not seem to have a "real upstream" (or installation procedures other than our scripts in build/pkgs).

We transform them into pip-installable packages and publish them to PyPI so that they can be installed using standard Python tools (and become Python dependencies of sagelib) and then discovered by libraries using standard Python facilities.

Role model: https://pypi.org/project/database-knotinfo/

For those that do not have a real upstream, we create separate git repositories in github.com/sagemath/ to serve as new upstream.

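List of packages/tickets:

Architecture independent:

- #33881 New upstream for `database_stein_watkins`
- `combinatorial_designs` (single text file `MOLS_table.txt`), 8 kB, https://repology.org/project/sagemath-combinatorial-designs/versions
- `graphs` (`graphs.db`, `brouwer_srg_database.json`, `smallgraphs.txt`, `isgci_sage.xml`), 336 kB; the `SPKG.rst` explains: The code used to parse the data from Andries E. Brouwer's website is available at https://github.com/nathanncohen/strongly_regular_graphs_database
- `polytopes_db` (various files), 41 kB
- `polytopes_db_4d` (various files), 878 MB

Prepared files:

- #32747: `conway_polynomials`
- elliptic_curves

New data sources:

- #30352 Interface to the KnotInfo and LinkInfo databases
- `database_graver_ppi` - https://www.math.ucdavis.edu/~mkoeppe/art/ppi/index.html
- https://mathdb.mathhub.info/
- https://swmath.org/

Related (other stuff written by the Sage distribution into $SAGE_SHARE):

- #30306 - jupyter notebook related things
- #20080 - documentation
- #30885 - `sage.features`

Techniques:

- `importlib.resources` - would help making a distribution `zip_safe` (#31306)
- could consider advertise install location using a pkgconfig file: [#30787 comment:24](https://github.com/sagemath/sage/issues/30787#comment:24)
- Reduce local duplication of large database package when venvs are in use: https://pypi.org/project/pydupes/ (similar to rdfind)
- Avoid duplicate installation by using [PEP 660 editable wheels](https://peps.python.org/pep-0660/). Use a build backend that supports it: flit, pdm, hatchling, poetry, not setuptools. https://discuss.python.org/t/pep-660-and-setuptools/14855
- However there are [implementation restrictions in the interaction of editable wheels and implicit namespace packages](https://github.com/pfmoore/editables/blob/main/docs/source/implementation.md#import-hooks)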
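
One standard Python facility for discovering installed data files is `importlib.resources`. The following is only a minimal sketch of that idea, not the layout of any existing package: the package name `database_demo` and the file `table.txt` are hypothetical, and the "installation" is simulated by creating a throwaway package in a temporary directory.

```python
import importlib
import sys
import tempfile
from importlib.resources import files  # available since Python 3.9
from pathlib import Path


def make_demo_package(root: Path, name: str = "database_demo") -> None:
    """Create a throwaway package holding one data file.

    This stands in for a pip-installed database package; the name and
    file layout are hypothetical.
    """
    pkg = root / name
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "table.txt").write_text("1 2 3\n")


def read_table(package: str = "database_demo") -> str:
    """Read the data file via importlib.resources instead of a hard-coded path."""
    return files(package).joinpath("table.txt").read_text()


tmp = tempfile.mkdtemp()
make_demo_package(Path(tmp))
sys.path.insert(0, tmp)        # simulate installation into site-packages
importlib.invalidate_caches()  # make sure the new directory is noticed
print(read_table())
```

A consumer written this way does not need to know about `$SAGE_SHARE` or any other fixed install location; it only needs the name of the importable package that carries the data.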

CC: @mkoeppe @jhpalmieri @kiwifb @isuruf @antonio-rojas @dimpase @slel @tobihan @jamesjer @soehms @videlec @williamstein @roed314

Component: build

Keywords: sd111

Issue created by migration from https://trac.sagemath.org/ticket/30914

mkoeppe commented 4 years ago
comment:1

Which files, specifically?

mkoeppe commented 4 years ago
comment:2

If you are talking about database packages such as build/pkgs/conway_polynomials, in particular those that do not seem to have a "real upstream" (or installation procedures other than our scripts in build/pkgs), then yes, I agree that a possible direction would be to create pip-installable packages for them and to publish them to PyPI. They should be maintained outside of the source tree, each in a separate git repository.

mkoeppe commented 4 years ago
comment:3

If this is what you had in mind, let's transform this ticket into a meta-ticket and get the downstream packagers of Sage on board -- they may have insights on the upstream status of some of these packages, which we may have forgotten about.

kiwifb commented 4 years ago
comment:4

Yes, a confirmation of which bits would be nice. If @mkoeppe is right you are talking about

Architecture independent:

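Prepared files

  • conway_polynomials
  • elliptic_curves where the files are processed before install with some python code that may be depending on python or some other software.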

conway_polynomials uses pickle.dump which could pose problems across python versions?
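
To illustrate the concern: the default pickle protocol has changed between Python releases, so a file written with the default on a newer Python may not be loadable on an older one. One possible mitigation, sketched here under the assumption that the data is plain built-in types, is to pin an old protocol explicitly. Whether the actual `conway_polynomials` install script does this is not stated in this thread, and the sample dictionary below is a placeholder, not real database content.

```python
import io
import pickle


def dump_portable(obj, fileobj, protocol=2):
    """Pickle with an explicitly pinned protocol.

    Protocol 2 was introduced in Python 2.3, so every later Python can
    read the resulting stream format.
    """
    pickle.dump(obj, fileobj, protocol=protocol)


# Placeholder data; the structure is hypothetical.
sample = {(2, 1): [1, 1]}

buf = io.BytesIO()
dump_portable(sample, buf)
data = buf.getvalue()

# A protocol-2 stream starts with the PROTO opcode followed by the version byte.
print(data[:2])
print(pickle.loads(data) == sample)
```

Pinning the protocol only addresses the stream format; pickles that contain instances of library classes can still break when those classes change.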

elliptic_curves is put into a SQL database with sqlite, this should be a stable format. Don't know if that's portable between, say, x86_64 and ppc64le?
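
On the portability question: the SQLite documentation describes the database file format as cross-platform, usable across 32-bit and 64-bit machines and across big-endian and little-endian architectures. A minimal sketch of building such a file with the standard `sqlite3` module and checking its header follows; the table name, columns, and rows are placeholders, not the schema used by the `elliptic_curves` package.

```python
import os
import sqlite3
import tempfile

# Every SQLite 3 database file begins with this 16-byte header string.
SQLITE_MAGIC = b"SQLite format 3\x00"


def build_db(path, rows):
    """Create a small database file from (label, rank) pairs (hypothetical schema)."""
    con = sqlite3.connect(path)
    try:
        con.execute("CREATE TABLE curves (label TEXT PRIMARY KEY, rank INTEGER)")
        con.executemany("INSERT INTO curves VALUES (?, ?)", rows)
        con.commit()
    finally:
        con.close()


def has_sqlite_header(path):
    """Return True if the file starts with the SQLite 3 header string."""
    with open(path, "rb") as f:
        return f.read(16) == SQLITE_MAGIC


db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
build_db(db_path, [("placeholder_a", 0), ("placeholder_b", 1)])
print(has_sqlite_header(db_path))
```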

mkoeppe commented 4 years ago
comment:5

OK, I've taken the liberty of rewriting the ticket description.

mkoeppe commented 4 years ago

Description changed:

--- 
+++ 
@@ -1,11 +1,11 @@
-Problem: While working on #30371, I noticed that most files in `local/share` are more or less static, but are generated during `make`.
-I was wondering how to streamline this process a bit in order to make modularization easier, and to make it possible to install sagelib in different venv and without a previous run of `bootstrap`/`make`.
+The Sage distribution contains a number of "database packages", many of which do not seem to have a "real upstream" (or installation procedures other than our scripts in build/pkgs).

-Options that I see:
-1. Checkin the current version of these files (at least the almost static databases etc) in say `src/share`. Then they only need to be generated anew for update due to changes upstream. So one can remove the corresponding sage packages from the `make` pipeline.
+We transform them into pip-installable packages and publish them to PyPI so that they can be installed using standard Python tools (and become Python dependencies of sagelib).

-2. Publish these dependencies to pypi, and install them to the current venv using requirements.txt and/or pipfile.
+For those that do not have a real upstream, we create separate git repositories in github.com/sagemath/ to serve as new upstream.

-3. Generate these files during a call of `src/setup.py`, and include them in this way in the current venv.
+List of packages/tickets:

-Any other ideas/preferences about how to handle these files?
+TBD
+
+
mkoeppe commented 4 years ago
comment:6

Replying to @kiwifb:

> Prepared files
>
>   • conway_polynomials
>   • elliptic_curves where the files are processed before install with some python code that may be depending on python or some other software.
>
> conway_polynomials uses pickle.dump which could pose problems across python versions?
>
> elliptic_curves is put into a SQL database with sqlite, this should be a stable format. Don't know if that's portable between, say, x86_64 and ppc64le?

For these, we should make the actual source distribution architecture-independent. The architecture-dependent products could be built by setup.py and published as wheels.
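
A rough sketch of what such a build step might look like: the source distribution ships only a plain-text file, and a function turns it into the binary product at build time. The text format, file names, and function names here are hypothetical, and the hook that would call `build_product` from setup.py (for example a custom build command) is omitted.

```python
import pickle
import tempfile
from pathlib import Path


def parse_source(text):
    """Parse lines of the form 'key v1 v2 ...' into {key: [v1, v2, ...]}.

    The format is a hypothetical stand-in for an architecture-independent
    text source shipped in the sdist.
    """
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, *values = line.split()
        table[key] = [int(v) for v in values]
    return table


def build_product(source: Path, target: Path) -> None:
    """Generate the binary product from the text source at build time."""
    table = parse_source(source.read_text())
    with open(target, "wb") as f:
        pickle.dump(table, f, protocol=2)


# Demo with throwaway files.
workdir = Path(tempfile.mkdtemp())
src = workdir / "table.txt"
src.write_text("# demo source\na 1 2\nb 3\n")
out = workdir / "table.pickle"
build_product(src, out)
print(pickle.loads(out.read_bytes()))
```

With this split, the sdist stays text-only, and anything that depends on the Python version or platform exists only in the built wheel.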

mkoeppe commented 4 years ago
comment:7

For example database_stein_watkins_mini's SPKG.rst lists http://modular.math.washington.edu/papers/stein-watkins/ as upstream or source, which is defunct

mkoeppe commented 4 years ago

Description changed:

--- 
+++ 
@@ -6,6 +6,15 @@

 List of packages/tickets:

-TBD
+Architecture independent:
+
+- combinatorial_designs
+- graphs
+- polytopes_db
+
+Prepared files:
+
+- conway_polynomials
+- elliptic_curves
mkoeppe commented 4 years ago
comment:8

One thing that may need discussion: for huge databases, one may want to avoid installing multiple copies.
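
To make the duplication problem concrete, here is a minimal sketch that finds byte-identical files across several directory trees (for instance, two virtual environments) by hashing their contents. It only reports duplicates and does not replace them with links. The directory and file names in the demo are made up.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large databases are not loaded at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(roots):
    """Return lists of paths under the given roots whose contents are identical."""
    by_digest = defaultdict(list)
    for root in roots:
        for path in sorted(Path(root).rglob("*")):
            if path.is_file():
                by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]


# Demo: two hypothetical environments that each contain the same database file.
base = Path(tempfile.mkdtemp())
for env in ("venv_a", "venv_b"):
    (base / env).mkdir()
    (base / env / "big.db").write_bytes(b"same bytes")
(base / "venv_b" / "other.txt").write_bytes(b"different")

dups = find_duplicates([base / "venv_a", base / "venv_b"])
for group in dups:
    print([str(p.relative_to(base)) for p in group])
```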

mkoeppe commented 4 years ago

Description changed:

--- 
+++ 
@@ -18,3 +18,8 @@
 - elliptic_curves

+
+Related (other stuff written by the Sage distribution into $SAGE_SHARE): 
+- #30306 - jupyter notebook related things
+- #20080 - documentation
+
slel commented 4 years ago
comment:10

Replying to @mkoeppe:

> For example database_stein_watkins_mini's SPKG.rst lists http://modular.math.washington.edu/papers/stein-watkins/ as upstream or source, which is defunct

It's at https://wstein.org/papers/stein-watkins/

tobiasdiez commented 4 years ago
comment:11

Replying to @kiwifb:

> Yes, a confirmation of which bits would be nice. If @mkoeppe is right you are talking about ...

Yes, that's exactly what I meant! Thanks, everybody, for your input; that's already way better than what I had originally imagined.

mkoeppe commented 3 years ago

Changed keywords from none to sd111

mkoeppe commented 3 years ago

Description changed:

--- 
+++ 
@@ -23,3 +23,7 @@
 - #30306 - jupyter notebook related things
 - #20080 - documentation

+Techniques:
+- `importlib.resources` - would help making a distribution `zip_safe` (#31306)
+- could consider advertise install location using a pkgconfig file: [#30787 comment:24](https://github.com/sagemath/sage/issues/30787#comment:24) 
+
mkoeppe commented 3 years ago

Description changed:

--- 
+++ 
@@ -17,7 +17,9 @@
 - conway_polynomials
 - elliptic_curves

+New data sources:

+- #30352 Interface to the KnotInfo and LinkInfo databases

 Related (other stuff written by the Sage distribution into $SAGE_SHARE): 
 - #30306 - jupyter notebook related things
mkoeppe commented 3 years ago

Description changed:

--- 
+++ 
@@ -8,9 +8,10 @@

 Architecture independent:

-- combinatorial_designs
-- graphs
-- polytopes_db
+- `combinatorial_designs` (single text file `MOLS_table.txt`), 8 kB
+- `graphs` (`graphs.db`, `brouwer_srg_database.json`, `smallgraphs.txt`, `isgci_sage.xml`), 336 kB; the `SPKG.rst` explains: The code used to parse the data from Andries E. Brouwer's website is available at https://github.com/nathanncohen/strongly_regular_graphs_database
+- `polytopes_db` (various files), 41 kB
+- `polytopes_db_4d` (various files), 878 MB

 Prepared files:
mkoeppe commented 3 years ago

Description changed:

--- 
+++ 
@@ -8,7 +8,7 @@

 Architecture independent:

-- `combinatorial_designs` (single text file `MOLS_table.txt`), 8 kB
+- `combinatorial_designs` (single text file `MOLS_table.txt`), 8 kB, https://repology.org/project/sagemath-combinatorial-designs/versions
 - `graphs` (`graphs.db`, `brouwer_srg_database.json`, `smallgraphs.txt`, `isgci_sage.xml`), 336 kB; the `SPKG.rst` explains: The code used to parse the data from Andries E. Brouwer's website is available at https://github.com/nathanncohen/strongly_regular_graphs_database
 - `polytopes_db` (various files), 41 kB
 - `polytopes_db_4d` (various files), 878 MB
mkoeppe commented 3 years ago

Description changed:

--- 
+++ 
@@ -21,6 +21,7 @@
 New data sources:

 - #30352 Interface to the KnotInfo and LinkInfo databases
+- `database_graver_ppi` - https://www.math.ucdavis.edu/~mkoeppe/art/ppi/index.html

 Related (other stuff written by the Sage distribution into $SAGE_SHARE): 
 - #30306 - jupyter notebook related things
mkoeppe commented 3 years ago

Description changed:

mkoeppe commented 3 years ago

Description changed:

--- 
+++ 
@@ -26,6 +26,7 @@
 Related (other stuff written by the Sage distribution into $SAGE_SHARE): 
 - #30306 - jupyter notebook related things
 - #20080 - documentation
+- #30885 - `sage.features`

 Techniques:
 - `importlib.resources` - would help making a distribution `zip_safe` (#31306)
mkoeppe commented 3 years ago
comment:21

Sage development has entered the release candidate phase for 9.3. Setting a new milestone for this ticket based on a cursory review of ticket status, priority, and last modification date.

mkoeppe commented 3 years ago
comment:24

For #32432 (sagemath-polyhedra), pip-installable polytopes_db, polytopes_db_4d would be good

mkoeppe commented 3 years ago

Description changed:

--- 
+++ 
@@ -15,7 +15,7 @@

 Prepared files:

-- conway_polynomials
+- #32747: `conway_polynomials`
 - elliptic_curves

 New data sources:
mkoeppe commented 2 years ago

Description changed:

--- 
+++ 
@@ -22,6 +22,8 @@

 - #30352 Interface to the KnotInfo and LinkInfo databases
 - `database_graver_ppi` - https://www.math.ucdavis.edu/~mkoeppe/art/ppi/index.html
+- https://mathdb.mathhub.info/
+- https://swmath.org/

 Related (other stuff written by the Sage distribution into $SAGE_SHARE): 
 - #30306 - jupyter notebook related things
mkoeppe commented 2 years ago

Description changed:

--- 
+++ 
@@ -33,4 +33,6 @@
 Techniques:
 - `importlib.resources` - would help making a distribution `zip_safe` (#31306)
 - could consider advertise install location using a pkgconfig file: [#30787 comment:24](https://github.com/sagemath/sage/issues/30787#comment:24) 
+- Reduce local duplication of large database package when venvs are in use: https://pypi.org/project/pydupes/ (similar to rdfind)

+
mkoeppe commented 2 years ago

Description changed:

--- 
+++ 
@@ -1,6 +1,8 @@
 The Sage distribution contains a number of "database packages", many of which do not seem to have a "real upstream" (or installation procedures other than our scripts in build/pkgs).

 We transform them into pip-installable packages and publish them to PyPI so that they can be installed using standard Python tools (and become Python dependencies of sagelib).
+
+Role model: https://pypi.org/project/database-knotinfo/

 For those that do not have a real upstream, we create separate git repositories in github.com/sagemath/ to serve as new upstream.
mkoeppe commented 2 years ago

Description changed:

--- 
+++ 
@@ -36,5 +36,8 @@
 - `importlib.resources` - would help making a distribution `zip_safe` (#31306)
 - could consider advertise install location using a pkgconfig file: [#30787 comment:24](https://github.com/sagemath/sage/issues/30787#comment:24) 
 - Reduce local duplication of large database package when venvs are in use: https://pypi.org/project/pydupes/ (similar to rdfind)
+- Avoid duplicate installation by using [PEP 660 editable wheels](https://peps.python.org/pep-0660/). Use a build backend that supports it: flit, pdm, hatchling, poetry, not setuptools. https://discuss.python.org/t/pep-660-and-setuptools/14855

+
+
mkoeppe commented 2 years ago

Description changed:

--- 
+++ 
@@ -1,6 +1,6 @@
 The Sage distribution contains a number of "database packages", many of which do not seem to have a "real upstream" (or installation procedures other than our scripts in build/pkgs).

-We transform them into pip-installable packages and publish them to PyPI so that they can be installed using standard Python tools (and become Python dependencies of sagelib).
+We transform them into pip-installable packages and publish them to PyPI so that they can be installed using standard Python tools (and become Python dependencies of sagelib) and then discovered by libraries using standard Python facilities.

 Role model: https://pypi.org/project/database-knotinfo/
mkoeppe commented 2 years ago
comment:35

Replying to @slel:

> Replying to @mkoeppe:
>
> > For example database_stein_watkins_mini's SPKG.rst lists http://modular.math.washington.edu/papers/stein-watkins/ as upstream or source, which is defunct
>
> It's at https://wstein.org/papers/stein-watkins/

The links "Click here for the database" (https://wstein.org/ecdb) and "Our database" (https://wstein.org/papers/stein-watkins/ecdb) are both broken.

mkoeppe commented 2 years ago

Description changed:

--- 
+++ 
@@ -10,6 +10,7 @@

 Architecture independent:

+- #33881 New upstream for `database_stein_watkins`
 - `combinatorial_designs` (single text file `MOLS_table.txt`), 8 kB, https://repology.org/project/sagemath-combinatorial-designs/versions
 - `graphs` (`graphs.db`, `brouwer_srg_database.json`, `smallgraphs.txt`, `isgci_sage.xml`), 336 kB; the `SPKG.rst` explains: The code used to parse the data from Andries E. Brouwer's website is available at https://github.com/nathanncohen/strongly_regular_graphs_database
 - `polytopes_db` (various files), 41 kB
mkoeppe commented 2 years ago

Description changed:

--- 
+++ 
@@ -38,6 +38,7 @@
 - could consider advertise install location using a pkgconfig file: [#30787 comment:24](https://github.com/sagemath/sage/issues/30787#comment:24) 
 - Reduce local duplication of large database package when venvs are in use: https://pypi.org/project/pydupes/ (similar to rdfind)
 - Avoid duplicate installation by using [PEP 660 editable wheels](https://peps.python.org/pep-0660/). Use a build backend that supports it: flit, pdm, hatchling, poetry, not setuptools. https://discuss.python.org/t/pep-660-and-setuptools/14855
+- However there are [implementation restrictions in the interaction of editable wheels and implicit namespace packages](https://github.com/pfmoore/editables/blob/main/docs/source/implementation.md#import-hooks)
orlitzky commented 11 months ago

Can I have a new sagemath/ repo for https://github.com/orlitzky/mols-handbook-data please? This is the old combinatorial_designs data.

orlitzky commented 11 months ago

> For #32432 (sagemath-polyhedra), pip-installable polytopes_db, polytopes_db_4d would be good

This one is going to be annoying because the interface to it relies on the nonstandard PALP executable names. PALP now has an upstream repository at https://gitlab.com/stringstuwien/PALP, but I've been waiting for my first easy PR to get merged before I proceed with the makefile changes needed to build all of the extra executables.

orlitzky commented 1 month ago

PALP upstream now supports the names used in the polytope databases. One less blocker.