Closed: soehms closed this issue 3 years ago
I make the following changes:

- I make the `TestSuite` functionality clearer: I rename the former methods `_test_recover` to user-visible boolean methods `is_recoverable`. I add a new `_test_recover` to the class `KnotInfoSeries` which just runs the test using `is_recoverable` and the tester option `max_samples`.
- I add a new method `is_unique` to the class `KnotInfoBase` which tests whether a proper link is unique in the database under isotopy. This is needed in `get_knotinfo` and `is_isotopic` in order to give more reliable answers in unclear situations.
- I extend `is_amphicheiral` to proper links by internal tests (needed for `get_knotinfo`, as well).
- I remove the option `oriented` from `get_knotinfo` and improve the quality of its answer in unclear situations.
- I add a warning to the docstring of the method `from_dowker_code` of the class `Knot`, following a hint of Chuck Livingston.
Replying to @tscrim:

Replying to @soehms:

I would have expected the database itself to have a consistency check like this.

Do you mean in the installation procedure with the option `-c`? I could run tests with larger samples there, in addition to the doctests; shall I?

The link with the database somehow becomes broken, such as when they change the name of a column. So the code breaks once you install the database. Granted, I think this is unlikely. Looking over the design a bit more, a future developer would not naturally avoid the methods that differentiate between the two.

If the package is upgraded to a new database version, then the patchbots would detect a change of a column name that wasn't carried over to the static dictionary. But unfortunately, they don't take package tickets. Indeed, in the case of this package it would make sense if they did.

I agree that it is an advantage to have it tested, although I do believe it should not be done locally within Sage's library but with a more robust testing framework. Yet, I believe that the benefits here clearly outweigh the costs.

What robust testing framework do you mean? If the database is not installed, all the doctests of the ticket take less than five seconds (on an i5).
Replying to @soehms:
Replying to @tscrim:
Replying to @soehms:
I would have expected the database itself to have a consistency check like this.
Do you mean in the installation procedure with the option `-c`? I could run tests with larger samples there, in addition to the doctests; shall I?
No, I mean that there is a `_test_database`-type method on the database class. The `-c` option is good, but I also think we shouldn't have that take too long.
I agree that it is an advantage to have it tested, although I do believe it should not be done locally within Sage's library but with a more robust testing framework. Yet, I believe that the benefits here clearly outweigh the costs.

What robust testing framework do you mean? If the database is not installed, all the doctests of the ticket take less than five seconds (on an i5).
There are patchbots/buildbots with the database that check that everything still works, rather than a group of us with the database installed running tests after each beta release.
Branch pushed to git repo; I updated commit sha1. New commits:
440b5f3 | 30352: add _test_database and fix broken installation |
Replying to @tscrim:
Replying to @soehms:
No, I mean that there is a `_test_database`-type method on the database class. The `-c` option is good, but I also think we shouldn't have that take too long.
I added such a `_test_database` method which tests a random sample of 20 links (by default). I marked the `TestSuite` doctest as `long time`. It takes less than 2 seconds if the database isn't installed and about 20 seconds otherwise (including the loading of the database). In addition, I had to make some changes since the installation was broken (because of the usage of `feature` and `UniqueRepresentation`). Furthermore, I made the `-c` installation option run this long doctest, as well.
There are patchbots/buildbots with the database that check that everything still works rather than a group of us with the database installed running tests after each beta release.
Are such patchbots/buildbots already possible?
Replying to @soehms:
Replying to @tscrim:
Replying to @soehms:
No, I mean that there is a `_test_database`-type method on the database class. The `-c` option is good, but I also think we shouldn't have that take too long.

I added such a `_test_database` method which tests a random sample of 20 links (by default). I marked the `TestSuite` doctest as `long time`. It takes less than 2 seconds if the database isn't installed and about 20 seconds otherwise (including the loading of the database). In addition, I had to make some changes since the installation was broken (because of the usage of `feature` and `UniqueRepresentation`). Furthermore, I made the `-c` installation option run this long doctest, as well.
Thank you. I think that will help with the testing.
Dima, Miguel, anyone else have any additional comments before I set this to a positive review?
There are patchbots/buildbots with the database that check that everything still works rather than a group of us with the database installed running tests after each beta release.
Are such patchbots/buildbots already possible?
It is possible, but with all the different combinations it is impossible to maintain, as there is some different behavior depending on certain optional (experimental?) packages being installed. That said, I would advocate for having at least one buildbot that has all optional (and possibly experimental) packages installed and runs the tests.
Typo: `KontInfo` (3 times)
In terms of packaging, I think it would be much preferable to create a pip-installable package rather than to have a Sage-specific upstream tarball.
See #30914 (Meta-ticket: Create upstream repositories, pip-installable packages for database packages)
Also, in `SPKG.rst` please follow the new format of the title from #29655.
I see that upstream stores the original files in Excel spreadsheets which are then exported to csv with some substitutions in LibreOffice. That is not a sustainable approach unless you have some automated LibreOffice scripting.

I would recommend a Python script using pandas [not included in Sage] or an R script to perform such a task.

Aside from those workflow issues, some proper packaging as something pip-installable would indeed be nice. It should be relatively trivial if we only install the data.
I see that upstream stores the original files in Excel spreadsheets which are then exported to csv with some substitutions in LibreOffice.

I am sure this explains `KontInfo`. (Pardon my French...)
Replying to @mkoeppe:
Typo: `KontInfo` (3 times)
Thanks!
In terms of packaging, I think it would be much preferable to create a pip-installable package rather than to have a Sage-specific upstream tarball.
I would like to do that, but I would prefer to do it in a follow-up ticket. Having never done this before, I will likely need advice and maybe help (plus time, which I won't have until February). Are there any examples from which I can see how to do this?
Also, in `SPKG.rst` please follow the new format of the title from #29655.
I will do that!
Replying to @kiwifb:
I see that upstream stores the original files in Excel spreadsheets which are then exported to csv with some substitutions in LibreOffice. That is not a sustainable approach unless you have some automated LibreOffice scripting.
I know; this was only intended as a temporary solution (after failing to use `pandoc`). I reported some minor (and non-significant) issues upstream and waited for them to provide new files. If not, the existing tarball is good enough to start with.
I would recommend a Python script using pandas [not included in Sage] or an R script to perform such a task.
Thanks for your suggestions. I will see which one is appropriate to implement such a script.
Aside from those workflow issues, some proper packaging as something pip-installable would indeed be nice. It should be relatively trivial if we only install the data.
Do you know examples that I can follow?
Replying to @soehms:
Replying to @kiwifb:
I would recommend a Python script using pandas [not included in Sage] or an R script to perform such a task.
Thanks for your suggestions. I will see which one is appropriate to implement such a script.
Amusingly, I did the exact reverse for some people in the school of economics in my university. They had large csv files to download and they wanted to transform them into excel files - during the process we had to add some substitutions for missing values. They wanted the files in excel format as an input for STATA - I want to cry sometimes with some researchers.
Aside from those workflow issues, some proper packaging as something pip-installable would indeed be nice. It should be relatively trivial if we only install the data.
Do you know examples that I can follow?
Good one. We have identified that as a need for our data packages, but I don't think we have done it with any. I cannot think of a Python package that is a pure data load either. Possibly because people do not usually bother, which is sad.
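For what it's worth, a pure-data pip package can be little more than a directory of data files plus metadata. A minimal sketch, assuming a setuptools layout; the project name, version, and file patterns here are placeholders for illustration, not an actual published configuration:

```toml
# pyproject.toml for a hypothetical pure-data package
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "database_knotinfo"
version = "2021.2.1"
description = "KnotInfo and LinkInfo tables as importable data files"

[tool.setuptools]
packages = ["database_knotinfo"]

[tool.setuptools.package-data]
database_knotinfo = ["*.csv"]
```

The package's `__init__.py` would then only need to locate the CSV files at runtime, e.g. via `importlib.resources.files("database_knotinfo")`.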
A few more comments.
If you do `git grep SAGE_ROOT src/sage`, you will see that we have essentially eliminated the use of this variable in the Sage library. This ticket reintroduces it, mixing Sage-the-distribution-specific code with Sage library code. That's not a good direction. In particular, Sage library code should not refer to `SAGE_ROOT/build/pkgs/%s/package-version.txt` at all, as this may not be available in downstream distribution packaging of Sage.
The purpose of the subclasses of `sage.features.StaticFile` is to provide an interface for discovering files in an installation. `KnotInfoFilename.knots.sobj_path` should use `sage.features.databases.DatabaseKnotInfo` to find the path, not the other way around.
I also don't fully understand the purpose of the data transformation that is happening at installation time, reading the csv files and creating many sobj files in functions such as `_create_col_dict_sobj` etc. Is each of the little files storing a dictionary mapping strings to strings as a pickle (sobj)?
Replying to @kiwifb:
Replying to @soehms:
Amusingly, I did the exact reverse for some people in the school of economics in my university. They had large csv files to download and they wanted to transform them into excel files - during the process we had to add some substitutions for missing values. They wanted the files in excel format as an input for STATA - I want to cry sometimes with some researchers.
I'm also amazed that pure math data is stored in Excel spreadsheets, but missing values haven't been a problem here so far (with the exception of the trivial knot, which I had to deal with separately in some cases). But there were a misplaced character and trailing and leading whitespace (which, of course, can be handled using `strip`).
The reason why I converted them to csv is that I found no Excel reader included in Sage. You mentioned that pandas isn't included in Sage either. So, how can I use it in `spkg-install`?
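The kind of cleanup mentioned above (stray whitespace, isolated misplaced characters) needs nothing beyond the stdlib `csv` module. A rough sketch, where the column names and the substitution table are invented for illustration:

```python
import csv
import io

def clean_rows(raw_csv, substitutions=None):
    """Strip stray whitespace from every cell and apply optional
    per-cell substitutions (e.g. for misplaced characters)."""
    substitutions = substitutions or {}
    reader = csv.reader(io.StringIO(raw_csv))
    for row in reader:
        yield [substitutions.get(cell.strip(), cell.strip()) for cell in row]

raw = "name , crossing_number\n 3_1,3 \n4_1 , 4\n"
print(list(clean_rows(raw)))
# [['name', 'crossing_number'], ['3_1', '3'], ['4_1', '4']]
```

Because only the stdlib is used, such a script could run inside `spkg-install` without pulling in pandas.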
Good one. We have identified that as a need for our data packages but I don't think we have done it with any. I cannot think of a python package that is a pure data load either. Possibly because people do not usually bother which is sad.
I am open to trying to make a prototype. But that should be done on a follow-up ticket.
Replying to @mkoeppe:
A few more comments.
If you do `git grep SAGE_ROOT src/sage`, you will see that we have essentially eliminated the use of this variable in the Sage library. This ticket reintroduces it, mixing Sage-the-distribution-specific code with Sage library code. That's not a good direction. In particular, Sage library code should not refer to `SAGE_ROOT/build/pkgs/%s/package-version.txt` at all, as this may not be available in downstream distribution packaging of Sage.

The purpose of the subclasses of `sage.features.StaticFile` is to provide an interface for discovering files in an installation. `KnotInfoFilename.knots.sobj_path` should use `sage.features.databases.DatabaseKnotInfo` to find the path, not the other way around.
Sorry that I didn't realize that! Of course I will correct it!
Replying to @mkoeppe:
I also don't fully understand the purpose of the data transformation that is happening at installation time, reading the csv files and creating many sobj files in functions such as `_create_col_dict_sobj` etc. Is each of the little files storing a dictionary mapping strings to strings as a pickle (sobj)?
Perhaps this is ridiculous given the size of these databases, but the purpose is to minimize the memory load. The user only needs a few of the 120 columns of the tables at a time (so why load them all each time?).
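The trade-off described here can be illustrated with a plain-Python analogue of the one-pickle-per-column layout; the `ColumnStore` class and file naming below are hypothetical, not Sage's actual code:

```python
import os
import pickle
import tempfile

class ColumnStore:
    """Store each column as its own pickle file and load it only on
    first access, so unused columns never occupy memory."""
    def __init__(self, directory):
        self.directory = directory
        self._cache = {}

    def save_column(self, name, mapping):
        with open(os.path.join(self.directory, name + ".sobj"), "wb") as f:
            pickle.dump(mapping, f)

    def column(self, name):
        if name not in self._cache:  # lazy load on first access
            with open(os.path.join(self.directory, name + ".sobj"), "rb") as f:
                self._cache[name] = pickle.load(f)
        return self._cache[name]

with tempfile.TemporaryDirectory() as d:
    store = ColumnStore(d)
    store.save_column("signature", {"3_1": "-2", "4_1": "0"})
    store.save_column("three_genus", {"3_1": "1", "4_1": "1"})
    print(store.column("signature")["3_1"])  # only this column is loaded
```

With one file per column, a session that queries a couple of invariants pays for those columns only, not for all 120.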
I do the following:

- I add two more (`three_genus` and `signature`) of the link properties which have been of interest recently (#31188).
- I fix an issue (concerning `L10a171_1_1_0`) that caused `TestSuite` (with option `max_samples=infinity`) to fail for the class `KnotInfoDatabase`.
- I import `KnotInfo` and `KnotInfoSeries` into the global namespace in case the database is installed.

Description changed:
```diff
---
+++
@@ -29,5 +29,5 @@
-Tarball: https://github.com/sagemath/sage/files/ticket30352/knotinfo-20200813.tar.bz2.gz
+Tarball: https://github.com/soehms/sagemath_knotinfo/blob/main/knotinfo-20210201.tar.bz2?raw=true
```
Some suggestions for the upstream repository (in the direction of #30914):

- Call it `database_knotinfo`, not `sagemath_knotinfo` -- it will be useful to a broader community (Python)
- It's redundant to put versioned tarballs in a git repository; put the unpacked tarball there instead and have git take care of the versioning
- When done, I can send you a pull request that turns this repository into a pip-installable package

Replying to @mkoeppe:
Some suggestions for the upstream repository (in the direction of #30914):

- Call it `database_knotinfo`, not `sagemath_knotinfo` -- it will be useful to a broader community (Python)
- It's redundant to put versioned tarballs in a git repository; put the unpacked tarball there instead and have git take care of the versioning
- When done, I can send you a pull request that turns this repository into a pip-installable package
Sounds good! I hope the new repository is as expected. Please don't hesitate to make any changes you think are necessary! Many thanks!
Sage development has entered the release candidate phase for 9.3. Setting a new milestone for this ticket based on a cursory review of ticket status, priority, and last modification date.
Replying to @mkoeppe:
- When done, I can send you a pull request that turns this repository into a pip-installable package
I've tried it now on my own and at least it is working. But since I needed a lot of trial error cycles I would appreciate it if you could have a look at it.
The current commit here concerns the adaptation to the pip-installable package. Furthermore, it contains adaptations to SnapPy 3.0.1 and the addition of some verbose messages to `is_isotopic` (following a suggestion of knot theorists from the University of Regensburg).
Description changed:

```diff
---
+++
@@ -15,19 +15,15 @@
 Many thanks to Allison Moore and Chuck Livingston for their kind permission to have this interface implemented and their offer to support us.
-
 Having checked out the ticket for the first time, you have to run
-make SAGE_SPKG="sage-spkg -o" database_knotinfo-clean build
+./configure --enable-download-from-upstream-url
+sage -i database_knotinfo
 in order to have the databases installed. If you like to run all relevant doctests on the installation use:
-make SAGE_SPKG="sage-spkg -o" SAGE_CHECK="yes" database_knotinfo-clean build
+sage -i -c database_knotinfo
-
-
-Tarball: https://github.com/soehms/sagemath_knotinfo/blob/main/knotinfo-20210201.tar.bz2?raw=true
-
```
I don't think it is necessary to cache `homfly_polynomial()`, as all of the key computational aspects are cached, and so you don't cache the "same" object even though someone changed the variable name.
Other than that, I am happy with the current state of things. Does anyone else have any comments or suggestions?
Looks great to me!
Reviewer: Matthias Koeppe
I don't think it is necessary to cache `homfly_polynomial()`, as all of the key computational aspects are cached, and so you don't cache the "same" object even though someone changed the variable name.
I agree that this is not that effective. In general, my consideration concerning caching was that, with the database available, you can easily have hundreds or thousands of invocations of any method. Anyway, I think it is nothing that could hurt, and thus I would keep it for a start.
Many thanks to everyone who helped to have this interface realized!
Changed branch from u/soehms/knotinfo to 9cde996
Follow up at #31921.
Another follow up at #32760.
At the moment Sage offers just a small set of 250 named knots (`src/sage/knots/knot_table.py`) taken from the Rolfsen table. Named proper links aren't available at all.

Nowadays, larger databases for knots and links are available at the Knot Atlas pages in RDF format and at KnotInfo as XLS/XLSX files. Since parsing of CSV files is already supported by Sage, this is a good starting point to produce a Sage package from these files, containing about 3000 knots and 4000 proper links together with a lot of their properties and invariants.
Such a package has a couple of advantages:
The aim of this ticket is to have the databases accessible in Sage together with conversion methods for the most important properties and invariants.
Many thanks to Allison Moore and Chuck Livingston for their kind permission to have this interface implemented and their offer to support us.
Having checked out the ticket for the first time, you have to run

```
./configure --enable-download-from-upstream-url
sage -i database_knotinfo
```

in order to have the databases installed. If you like to run all relevant doctests on the installation, use:

```
sage -i -c database_knotinfo
```
CC: @miguelmarco @mkoeppe @kiwifb
Component: algebraic topology
Keywords: knot, link
Author: Sebastian Oehms
Branch: `9cde996`
Reviewer: Matthias Koeppe
Issue created by migration from https://trac.sagemath.org/ticket/30352