sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.41k stars 475 forks source link

Use GNU Info to parse singular's Info file #32242

Open orlitzky opened 3 years ago

orlitzky commented 3 years ago

Our Singular interface currently contains a hand-written parser for Singular's "info" file. In this ticket, we add an optional GNU Info package to parse this file instead.

Relative to running "info", the hand-written parser has several drawbacks:

The one downside to requiring "info" is that it would add a new runtime dependency to sagelib. To avoid that, we make GNU Info optional, and suggest to the user that he install it when he tries to read Singular's documentation.

GNU Info (or its superset, Texinfo) are widespread, portable, and easy to install on all of the systems we support, so in most cases this should be a "free" improvement.

Depends on #32302

Component: packages: standard

Author: Michael Orlitzky

Branch/Commit: u/mjo/ticket/32242 @ 137a918

Issue created by migration from https://trac.sagemath.org/ticket/32242

orlitzky commented 3 years ago

Commit: fc7fd02

orlitzky commented 3 years ago

Author: Michael Orlitzky

orlitzky commented 3 years ago

Branch: u/mjo/ticket/32242

orlitzky commented 3 years ago

New commits:

285d8acTrac #XXXXX: new package for the standalone GNU Info reader.
21586aaTrac #XXXXX: drop SINGULAR_EXECUTABLE from src/bin/sage-env.
fc7fd02Trac #XXXXX: use GNU Info to obtain Singular docstrings.
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

4768540Trac #32242: new package for the standalone GNU Info reader.
47eb391Trac #32242: drop SINGULAR_EXECUTABLE from src/bin/sage-env.
c4f6938Trac #32242: use GNU Info to obtain Singular docstrings.
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from fc7fd02 to c4f6938

mkoeppe commented 3 years ago
comment:3

I don't know... shouldn't there be a feature request / PR to Singular to make the docstrings available via API?

orlitzky commented 3 years ago
comment:4

Singular renders its own interactive help using "info" or one of its alternatives: https://www.singular.uni-kl.de/Manual/latest/sing_16.htm

I don't think they can give us an API without making "info" a dependency of Singular itself.

mkoeppe commented 3 years ago
comment:5

OK I see.

I still don't get why it's necessary or helpful to replace our parser with calling the info program.

Wouldn't it be much easier to just obtain the help file location from Singular itself?

mkoeppe commented 3 years ago
comment:6

I would also think it's pretty problematic from the viewpoint of modularization. We may soon have a pip-installable package for the Sage functionality that uses Singular. The wheel could then vendor libsingular. But vendoring the info program? That would be rather strange.

orlitzky commented 3 years ago
comment:7

Replying to @mkoeppe:

OK I see.

I still don't get why it's necessary or helpful to replace our parser with calling the info program.

The interface gets a lot smaller and simpler.

Wouldn't it be much easier to just obtain the help file location from Singular itself?

Singular does know where it lives, but how do you propose we get it? Ask them to add a special command-line option to support our outlandish use case?

orlitzky commented 3 years ago
comment:8

Replying to @mkoeppe:

I would also think it's pretty problematic from the viewpoint of modularization. We may soon have a pip-installable package for the Sage functionality that uses Singular. The wheel could then vendor libsingular. But vendoring the info program? That would be rather strange.

Leave it as part of the main Sage package, assumed to be there by the Singular wheel? In almost all cases, the system Info will be used anyway. But I don't really know enough about what you're trying to do to make a good suggestion.

mkoeppe commented 3 years ago
comment:9

Replying to @orlitzky:

Singular does know where it lives, but how do you propose we get it?

Use feGetResource in libsingular_resources?

orlitzky commented 3 years ago
comment:10

Replying to @mkoeppe:

Replying to @orlitzky:

Singular does know where it lives, but how do you propose we get it?

Use feGetResource in libsingular_resources?

Looks like it will work, thanks. I still think it's kinda dumb to poorly reimplement a ubiquitous 300k tool in math software, but it's good to know that we can find the help file if we really need to.

mkoeppe commented 3 years ago
comment:11

info is a well-documented, stable, extremely simple file format, I don't think there's anything wrong with doing basic parsing on it in Python.

orlitzky commented 3 years ago
comment:12

Well, we don't really parse it. It's more like a grep:

sage: from sage.interfaces.singular import get_docstring                        
sage: get_docstring("Preface")

will dutifully pull up the "Preface" node without regard for where in the manual it lives (it's not a function that we wrap and should throw a KeyError instead). The current branch oversimplifies a bit here too, checking only that the parent node is "Functions", but you can pass a series of (sub)nodes to info to ensure that you're navigating to where you think you are.

It looks like we also install info files for maxima and giac. So not only would having "info" installed allow you to run the interactive help in a singular_console(), but it may help somebody read the maxima and giac docs, too.

mkoeppe commented 3 years ago
comment:13

I don't really have an objection to adding info as a package. Accessing the interactive help in the standalone giac, maxima, singular shipped with Sage is a nice-to-have feature; but I don't think this is a strong argument to making it a "standard" package.

However, I don't support replacing the in-Python code (no issues have been reported with it -- no matter how primitive the code may be) by calling out to the info program. This adds a complication that I would really hope to avoid.

orlitzky commented 3 years ago
comment:14

Conversely, I'm only willing to tolerate the new dependency if it eliminates these hacks. It may be true that no issues have been reported with the info parser, but how long have we been waiting for someone to write an spkg-configure.m4 for singular? The only reason no one has done it (singular has a pc file, and is a huge, slow-to-build package) is because everyone is afraid to step into a pile of shit when they open singular.py{,x} =)

Replacing the cruft is its own reward if it makes maintaining the singular interface easier. I'm unhappy about the dependency myself, but who does it really hurt? Doing something about SINGULARPATH is necessary before we can use Singular from the system. Using GNU Info to read the info file is an easy and principled way to get us to a point where everyone on Fedora, Debian, Gentoo, Homebrew, Conda, Nix, etc. can just use Singular (and Info) from their package manager. That's a big gain for what constitutes almost the entirety of our users. And who is penalized by having Info be a standard package? Maybe some BSD installs that aren't currently supported platforms? This should be something like bzip2 that no one ever builds.

mkoeppe commented 3 years ago
comment:15

Replying to @orlitzky:

who is penalized by having Info be a standard package?

There is no problem with making info a standard package.

But there is a problem in adding an additional non-Python runtime dependency to the Sage library. info cannot be installed by pip.

mkoeppe commented 3 years ago
comment:16

Replying to @orlitzky:

how long have we been waiting for someone to write an spkg-configure.m4 for singular?

This task was waiting for the stalled Singular upgrade ticket #25993.

orlitzky commented 3 years ago
comment:17

One final point for GNU Info:

The Singular manual is now 11MB in singular-4.2.1. The existing implementation loads the entire thing (not just the function documentation, as I've shown) into a global variable that lives forever. This permanently wastes roughly the same amount of memory:

sage: from sage.interfaces.singular import generate_docstring_dictionary        
sage: get_memory_usage()                                                        
6467.30859375
sage: generate_docstring_dictionary()                                           
sage: get_memory_usage()                                                        
6479.34765625

And that will get worse as the Singular manual grows.

mkoeppe commented 3 years ago
comment:18

By the way, this ticket does not solve the original problem at all. If we want to use system Singular on a system that does not have info, then the installed info from the proposed SPKG does not have information regarding the info file location. So you are only shifting the problem of locating the info file from runtime to build time; and you are introducing a new non-Python dependency for the sagelib runtime.

orlitzky commented 3 years ago
comment:19

Replying to @mkoeppe:

By the way, this ticket does not solve the original problem at all. If we want to use system Singular on a system that does not have info, then the installed info from the proposed SPKG does not have information regarding the info file location. So you are only shifting the problem of locating the info file from runtime to build time; and you are introducing a new non-Python dependency for the sagelib runtime.

While it's possible to put the info files anywhere, we can stick a colon on the end of $INFOPATH to have the SPKG try the default list as fallbacks: https://git.savannah.gnu.org/cgit/texinfo.git/tree/info/filesys.h#n70

If we manage to find a system that both doesn't have GNU Info and yet still goes out of its way to install info files in a crazy location... then I guess at that point we could use DEPCHECK to ensure that the Singular SPKG gets built on that system.

mkoeppe commented 3 years ago
comment:20

Sorry, overall this ticket looks like a solution in search of a problem, and one that will create new problems as a side effect.

The actual problem has a simple solution, comment:9.

orlitzky commented 3 years ago
comment:21

We could always make GNU Info an optional package, and take the same approach to

sage: singular.<function>?

that we take to

sage: singular_console()
> help <function>;

That is, it doesn't work unless you have Info installed. The docstring for the sage object would suggest the optional package if it's not available.

orlitzky commented 3 years ago
comment:22

Replying to @mkoeppe:

Sorry, overall this ticket looks like a solution in search of a problem, and one that will create new problems as a side effect.

The actual problem has a simple solution, comment:9.

This is only a solution if you insist on keeping our lame python implementation and all of the problems it comes with. It'll get the job done, but I won't be the one to do it.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

a82847cTrac #32242: use fallback defaults for $INFOPATH.
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from c4f6938 to a82847c

mkoeppe commented 3 years ago

Work Issues: reduce ticket to adding info as optional package

orlitzky commented 3 years ago

Dependencies: #32302

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from a82847c to 624a0e8

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

2acc06eTrac #32302: drop SINGULAR_EXECUTABLE from src/bin/sage-env.
fde32deTrac #32242: new optional package for the standalone GNU Info reader.
7d1e61bTrac #32242: use fallback defaults for $INFOPATH.
56186bfTrac #32242: use GNU Info to obtain Singular docstrings.
624a0e8Trac #32242: add "optional" tags for tests that require GNU Info.
orlitzky commented 3 years ago

Changed work issues from reduce ticket to adding info as optional package to none

orlitzky commented 3 years ago

Description changed:

--- 
+++ 
@@ -1,6 +1,25 @@
-In the description of #29024, a few problems are listed as prerequisites for accepting a system installation of Singular. Basically, they ask that we either use singular in a less crazy manner, or find a way to make our existing hacks work with a system copy of singular.
+Our Singular interface currently contains a hand-written parser for Singular's "info" file. In this ticket, we add an optional GNU Info package to parse this file instead.

-One of those issues is that the `sage/interfaces/singular.py` is using an environment variable `SINGULARPATH` to guess at the location where Singular's Info file is installed, so that it can be manually parsed.
+Relative to running "info", the hand-written parser has several drawbacks:

-In the interest of using Singular in a less crazy manner, I propose we add the standalone GNU "info" as a new package, and use it to read Singular's documentation from wherever it is configured to do so. This eliminates two of the problems in #29024.
+* The extra code is a maintenance burden. We should not be wasting
+  our time (poorly) reimplementing standard tools.

+* The custom parser is buggy. For example, it is supposed to raise a
+  `KeyError` when documentation for a non-existent function is
+  requested. However, the parser does not keep track of what section
+  it's in, so, for example, `get_docstring("Preface")` will happily
+  return the contents of the preface even though "Preface" is not a
+  Singular function.
+
+* The first time documentation is requested, the entire info file
+  (see above) is loaded into a dictionary. With the Singular info
+  file currently weighing in at around 11MiB, this is wasting a
+  good chunk of memory.
+
+* The `singular_console()` itself tries to launch GNU Info to display
+  its interactive help, and cannot use our hand-written parser.
+
+The one downside to requiring "info" is that it would add a new runtime dependency to sagelib. To avoid that, we make GNU Info optional, and suggest to the user that he install it when he tries to read Singular's documentation.
+
+GNU Info (or its superset, Texinfo) are widespread, portable, and easy to install on all of the systems we support, so in most cases this should be a "free" improvement.
orlitzky commented 3 years ago

New commits:

2acc06eTrac #32302: drop SINGULAR_EXECUTABLE from src/bin/sage-env.
fde32deTrac #32242: new optional package for the standalone GNU Info reader.
7d1e61bTrac #32242: use fallback defaults for $INFOPATH.
56186bfTrac #32242: use GNU Info to obtain Singular docstrings.
624a0e8Trac #32242: add "optional" tags for tests that require GNU Info.
orlitzky commented 3 years ago
comment:28

It's now completely optional. When "info" is not installed, you'll be prodded to install it:

sage: singular.align?                                                           
Signature:   singular.align(*args, **kwds)
Type:        SingularFunction
String form: align
File:        ~/src/sage.git/local/lib/python3.9/site-packages/sage/interfaces/singular.py
Docstring:  
This function is an automatically generated pexpect wrapper around the
Singular function 'align'.

EXAMPLES:

   sage: groebner = singular.groebner
   sage: P.<x, y> = PolynomialRing(QQ)
   sage: I = P.ideal(x^2-y, y+x)
   sage: groebner(singular(I))
   x+y,
   y^2-y

The relevant Singular documentation will be shown here if you install
GNU Texinfo, the stand-alone GNU Info program, or the SageMath "info"
SPKG.
mkoeppe commented 3 years ago
comment:29

I have opened #32323 for the uncontroversial part "Add GNU info as an optional package".

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 2 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

cdfa63cTrac #32242: use GNU Info to obtain Singular docstrings.
137a918Trac #32242: add "optional" tags for tests that require GNU Info.
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 2 years ago

Changed commit from 624a0e8 to 137a918

mkoeppe commented 1 year ago

I think this should be closed as not wanted

orlitzky commented 1 year ago

Are you still opposed to this now that info is a standard package?

For the sage distribution, there's no additional runtime dependency because sage doesn't distinguish between build- and run-time dependencies; texinfo will be installed anyway.

On a linux distribution, texinfo will already be a runtime dependency of singular, because singular uses info for its interactive help. (It should really be a dependency of the singular SPKG if we expect singular_console() or sage --singular to work).

I'm asking because I discovered another minor issue:

mkoeppe commented 1 year ago

Are you still opposed to this now that info is a standard package?

Yes, because https://github.com/sagemath/sage/issues/32242#issuecomment-1418140192 still applies -- it would be a new obstacle to pip-installability

orlitzky commented 1 year ago

I don't see pip/pypi as different from a linux distribution and its package manager in this scenario. How does singular get installed by pip? Does its interactive help work?

This really only affects interactive uses of sage and not the library itself. (Yes, you can access a docstring programmatically, but realistically...). If singular's interactive help doesn't work, I don't think it's reasonable to expect its help to work interactively through sage, either. Especially not if it brings extra costs along with it, those mentioned in the initial issue, and now the fact that we have to ensure that /usr/share/info/singular.info is decompressed too.

mkoeppe commented 1 year ago

Singular will be shipped by the wheel of the distribution sagemath-symbolics (in prep at #35095)

I'm making all of Sage pip-installable, including interactive uses (that's provided by sagemath-repl)

orlitzky commented 1 year ago

Then aren't you going to find yourself in the same situation that linux distributions do? If we try (from our perspectives also as sage developers) to pretend that info isn't a dependency of singular, then attempting to use singular on its own will have a degraded experience, i.e. no help. That's a problem when you run singular independently from the CLI, but also when you run sage --singular or sage: singular_console(). The latter two should be deprecated IMO, but the bottom line is that if singular is a part of sage, then we have to install a usable singular.

mkoeppe commented 1 year ago

The purpose of shipping Singular in the pip-installable wheel will not be to provide a standalone Singular experience; only to use sage.libs.singular and sage.interfaces.singular

when you run sage --singular or sage: singular_console(). The latter two should be deprecated IMO

I agree with this; in particular the consoles don't work within Jupyter.

orlitzky commented 1 year ago

The purpose of shipping Singular in the pip-installable wheel will not be to provide a standalone Singular experience; only to use sage.libs.singular and sage.interfaces.singular

Ok, I at least know what we're talking about now, but I still think it's a long shot:

  1. sage has to be made pip-installable
  2. user has to choose to install sage via pip
  3. user wants to use the singular interface
  4. user doesn't have info installed on his operating system
  5. the interactive help for the singular interface is a hard requirement

All of those have to hold for the absence of the hand-rolled parser to become a problem. On the other hand, replacing it would immediately benefit the rest of us who actually use and develop sage right now.

mkoeppe commented 1 year ago

replacing it would immediately benefit the rest of us who actually use and develop sage right now.

I'd say this is an exaggeration. None of the drawbacks listed in the issue description are user-visible at all.

If info was intended to be used programmatically, there would be a shared library offering that functionality.

But info really is just a standalone alternative to the info browser in Emacs (which, of course, reads in the info format using emacs lisp.)

orlitzky commented 1 year ago

Wasting memory is user-visible. The inability to read compressed info pages is user-visible. The others, OK.

While each issue is minor on its own, there's a growing list of them, and only one relatively weak argument against fixing it.

orlitzky commented 1 year ago

If info was intended to be used programmatically, there would be a shared library offering that functionality.

I'm using info in the exact same way that singular itself does. The singular manual is designed with command-line info in mind. This is the closest thing to an "official" way to parse it that there is.