sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.38k stars 469 forks source link

Build the reference manual incrementally #6495

Closed e14f4152-4982-4ace-8c95-73a0599b109b closed 11 years ago

e14f4152-4982-4ace-8c95-73a0599b109b commented 15 years ago

Building the Sage reference manual can take a significant amount of time. Decreasing this time could speed up Sage development.

The patch is large, but most of it consists of moving files from one location to another, as described below. A summary of the changes:

Changes in doc/en/reference — this is where the size of the patch comes from, although the changes are pretty simple:

Changes to doc/common/builder.py:

Other changes:


The html docs for Sage 5.4.rc2, html without MathJax, html with MathJax, and pdf, built after applying the patches here.


Apply:


Before building the docs, you should delete the documentation output directory: rm -rf SAGE_ROOT/devel/sage/doc/output. To test this, you should run sage --docbuild all html and sage --docbuild all pdf. (Note: just running sage --docbuild reference html will probably produce many warnings. If you run it a second time, the warnings should go away.)

Depends on #13064 Depends on #8327 Depends on #13891

Dependencies: 5.7.beta2 + #13064, #8327, #13891

CC: @jhpalmieri @nexttime @nilesjohnson @hivert @mguaypaq @mwhansen

Component: documentation

Keywords: days38

Author: Mitesh Patel, John Palmieri, Florent Hivert

Reviewer: Volker Braun, Florent Hivert

Merged: sage-5.8.beta0

Issue created by migration from https://trac.sagemath.org/ticket/6495

e14f4152-4982-4ace-8c95-73a0599b109b commented 15 years ago

Experimental.

e14f4152-4982-4ace-8c95-73a0599b109b commented 15 years ago

Author: Mitesh Patel

e14f4152-4982-4ace-8c95-73a0599b109b commented 15 years ago
comment:1

Attachment: trac_6495_ref_pdf_pieces.patch.gz

The attached patch is experimental. Notes:

e14f4152-4982-4ace-8c95-73a0599b109b commented 15 years ago
comment:2

On cross-PDF document links: It seems that Sphinx does not produce these. This may OK, since file:// URLs can break easily.

e14f4152-4982-4ace-8c95-73a0599b109b commented 15 years ago
comment:3

On the \ZZ in arithgroup.tex: It seems the problem stems from \@title in

    \ifsphinxpdfoutput
      \begingroup
      % This \def is required to deal with multi-line authors; it               
      % changes \\ to ', ' (comma-space), making it pass muster for             
      % generating document info in the PDF file.                               
      \def\\{, }
      \pdfinfo{
        /Author (\@author)
        /Title (\@title)
      }
      \endgroup
    \fi

in Sphinx's manual.cls. For some reason, the \math* font commands do not work in this context. I gather that \mathbf is preferred, but one workaround is to use

Arithmetic Subgroups of `{\rm SL}_2({\bf Z})`

in place of

Arithmetic Subgroups of `{\rm SL}_2(\ZZ)`

in arithgroup.rst.

e14f4152-4982-4ace-8c95-73a0599b109b commented 14 years ago

Attachment: trac_6495-reference_breakup.patch.gz

Another approach. Depends on #7549. Still experimental. This patch only. sage repo.

e14f4152-4982-4ace-8c95-73a0599b109b commented 14 years ago
comment:4

The new patch may make it possible to build and update reference manual chapters semi-independently. I think we can use the intersphinx extension to fix the cross-chapter references. But we'll need to build the manual twice, a la LaTeX.

To build just a chapter, try, e.g.,

sage -docbuild reference/algebras html -juiv3

Still to do:

e14f4152-4982-4ace-8c95-73a0599b109b commented 14 years ago
comment:5

Another important item:

e14f4152-4982-4ace-8c95-73a0599b109b commented 14 years ago
comment:6

If this approach is viable, I suggest leaving many (most?) of the "To Do" items for other tickets.

e14f4152-4982-4ace-8c95-73a0599b109b commented 14 years ago
comment:7

While I'm here:

e14f4152-4982-4ace-8c95-73a0599b109b commented 14 years ago

Description changed:

--- 
+++ 
@@ -1,4 +1 @@
-Is the logical division at the module level, with the non-auto-generated `.rst` files as guides?
-
-Related: Should the indices be single-column?
-
+Building the Sage reference manual can use significant computer resources.  Easing the burden could speed up Sage development.
e14f4152-4982-4ace-8c95-73a0599b109b commented 14 years ago

PDF fixes. This patch only. sage repo.

e14f4152-4982-4ace-8c95-73a0599b109b commented 14 years ago
comment:9

Attachment: trac_6495-reference_breakup_v2.patch.gz

Sphinx caches "foreign" object inventories in a document's environment.pickle. These now use a lot of disk space.

e14f4152-4982-4ace-8c95-73a0599b109b commented 14 years ago
comment:10

Replying to @qed777:

Sphinx caches "foreign" object inventories in a document's environment.pickle. These now use a lot of disk space.

Another sphinx-dev query.

jhpalmieri commented 13 years ago
comment:12

Here's a new patch, rebased to Sage 4.7.1.alpha4. This implements parallel building, and it provides a great speedup, at least on systems with lots of processors. For example, on sage.math, the time to execute sage -docbuild reference html -j went from about 18 minutes to just under 2 minutes. The main idea is to build each module of the reference manual separately, and use the Sphinx intersphinx extension to handle cross-references (so :class:`blah` will work in the algebras module, even if blah is defined in the rings module).

Remaining issues:

vbraun commented 13 years ago
comment:13

In an ideal world sphinx would be multithreaded, but we probably shouldn't wait for that ;-) The remaining issues about disk space, bibliographic references, and needing two runs seem to be unavoidable. Building parallel gets more and more important, so I think the benefits outweigh the disadvantages.

I tried the patch on Sage-4.7.1.alpha4 without any other patches applied:

vbraun commented 13 years ago

Reviewer: Volker Braun

jhpalmieri commented 13 years ago

Changed author from Mitesh Patel to Mitesh Patel, John Palmieri

jhpalmieri commented 13 years ago

Dependencies: 11251

jhpalmieri commented 13 years ago
comment:14

Replying to @vbraun:

I tried the patch on Sage-4.7.1.alpha4 without any other patches applied:

  • Only the main page has proper css. For example, html/en/reference/cmd/index.html refers to _static/sage.css but the correct path would be ../_static/sage.css.

This was a mistake in the previous version: it was supposed to create a link from reference/_static to reference/cmd/_static. Now it should work.

  • patch conflicts with #11251 (todo extension). Given that the latter is already positively reviewed, maybe this ticket could be rebased on top of it?

Good point. This raises another problem: intersphinx doesn't easily pass todo lists between different documents, so I don't know how to get a master todo list for the Sage library. Right now, I've put the todolist for each module after its table of contents. I think combinat is the only module with any actual to do elements.

  • During the sage build, I think we should then run the docbuilder twice for the reference manual. Perhaps this should always be done for sage -docbuild all.

Done: sage -docbuild all now builds the reference manual twice. I also added a few print statements to the docbuild process.

  • Can we make output less verbose? The whole intersphinx output scrolled forever off my screen. Ideally, an interspinx failure to find an inventory file would only add one extra line at the end of the build along the lines of "You should re-run docbuild to get references right."

I've tried to do this when doing sage -docbuild all and not in general, but it may be suppressing too much output. (In the first pass, all warnings are suppressed, including intersphinx warnings, and in the second pass, any warnings should be printed. But in the second pass, it's just rewriting output, taking intersphinx links into account -- it's not reading the sources a second time, so it doesn't produce warnings about missing bibliographic references.)

Other issues:

jhpalmieri commented 13 years ago

Description changed:

--- 
+++ 
@@ -1 +1,7 @@
 Building the Sage reference manual can use significant computer resources.  Easing the burden could speed up Sage development.
+
+Apply:
+
+- [attachment: trac_6495-subdivide-ref-manual.patch](https://github.com/sagemath/sage-prod/files/10645445/trac_6495-subdivide-ref-manual.patch.gz)
+
+Also, before building the docs, it's probably a good idea to delete the documentation output directory: `rm -rf SAGE_ROOT/devel/sage/doc/output`.
jhpalmieri commented 13 years ago
comment:16

Here's a new version of the patch. This still has the same issue with "todo" items: I don't know a way to accumulate all of them from the different Sage modules, so they are just recorded module-by-module. For PDF output, the main documentation page (in SAGE_ROOT/devel/sage/doc/output/html/en/index.html) has the little PDF icons, and now when you click on the one for the reference manual, it actually opens an html file with links to all of the different PDF files.

I'm marking this for review. If we can come up with a good solution for "todo" items, that would be great, but perhaps we could defer it to another ticket.

jhpalmieri commented 13 years ago
comment:17

Okay, so this is not the most elegant solution, but in the most recent patch, in the html version of the reference manual, after everything is built, it searches through all of the output files algebra/index.html, arithgroup/index.html, etc., looking for todo lists. When it finds them, it copies them to todolist/index.html. This only works for the html version; for other formats, the todo list file says "The combined to do list is only available in the html version of the reference manual."

jhpalmieri commented 13 years ago
comment:18

Here's a new version; the only difference is this change to SAGE_ROOT/devel/sage/spkg-dist:

diff --git a/spkg-dist b/spkg-dist
--- a/spkg-dist
+++ b/spkg-dist
@@ -38,15 +38,23 @@ fi

 # Remove the .cython_hash file, since including this in the bdist will
 # completely break "sage -br". 
-# Also, do not distribute these build files (.os and .os); 
+# Also, do not distribute these build files (.so and .os);
 # it wastes space and causes problems!

-rm -f .cython_hash c_lib/*.so c_lib/*.os  
+rm -f .cython_hash c_lib/*.so c_lib/*.os

 # Delete all the autogenerated files, since they will get created again
 # when SAGE is built or upgraded.  
 cd sage; "$CUR"/spkg-delauto .; cd ..

+# Delete the autogenerated files in the doc directory.
+rm -rf doc/output
+rm -rf doc/en/reference/sage
+rm -rf doc/en/reference/sagenb
+rm -rf doc/en/reference/static
+rm -rf doc/en/reference/templates
+rm -rf doc/en/reference/*/sage sage/doc/en/reference/*/static sage/doc/en/reference/*/templates
+
 # Create the sdist using Python's distutils.
 python setup.py sdist

This makes for a slightly smaller sage-....spkg file, and more importantly, if the old autogenerated files are there, they can confuse the docbuild process.

jhpalmieri commented 13 years ago
comment:19

Some recent timings on sage.math.

Before the patch:

$ rm -rf SAGE_ROOT/devel/sage/doc/output
$ time sage -docbuild reference html
...
real    17m49.313s
user    16m57.530s
sys 0m45.390s

$ rm -rf SAGE_ROOT/devel/sage/doc/output
$ time sage -docbuild reference pdf
...
real    26m3.623s
user    24m40.290s
sys 0m43.030s

After the patch:


$ rm -rf SAGE_ROOT/devel/sage/doc/output
$ time sage -docbuild reference html
...
real    2m30.092s
user    10m34.900s
sys 1m12.590s

$ rm -rf SAGE_ROOT/devel/sage/doc/output
$ time sage -docbuild reference pdf
...
real    3m35.064s
user    15m49.790s
sys 1m11.070s
jhpalmieri commented 13 years ago
comment:20

Question: if you type "sage -docbuild -D" now, it says

$ sage -docbuild -D
DOCUMENTs:
    de/tutorial          a_tour_of_sage       bordeaux_2008        
    constructions        developer            faq                  
    installation         numerical_sage       reference            
    thematic_tutorials   tutorial             website              
    fr/a_tour_of_sage    fr/tutorial          ru/tutorial          
    all  (!)             
(!) Builds everything.

Should we also mention "reference/MODULE" as a valid target?

jhpalmieri commented 13 years ago

use only this patch

jhpalmieri commented 13 years ago
comment:21

Attachment: trac_6495-subdivide-ref-manual.2.patch.gz

jhpalmieri commented 13 years ago

Description changed:

--- 
+++ 
@@ -1,4 +1,32 @@
 Building the Sage reference manual can use significant computer resources.  Easing the burden could speed up Sage development.
+
+The patch is large, but most of it consists of moving files from one location to another, as described below.  A summary of the changes:
+
+Changes in `doc/en/reference` — this is where the size of the patch comes from, although the changes are pretty simple:
+
+- rearrange the directory doc/en/reference: for each file like algebras.rst, create a subdirectory `algebras` and move `algebras.rst` to `algebras/index.rst`.  Also create a file `algebras/conf.py` for the build configuration.  All of these new conf.py files are identical.  Deal with the contents of the directory `reference/media` similarly, moving the pictures to the appropriate subdirectory.
+- modify `reference/index.rst` to point to these new files.
+- reorganize `reference/index.rst` so it is arranged, at least somewhat, by topic.
+- add intersphinx to `conf.py` — see below.  Also add the new subdirectories to the list `exclude_trees`.
+- new file `conf_sub.py`, configuration for the pieces of the documentation (as opposed to the main `conf.py`, which is for building `reference/index.rst`).  This file is imported by each of the files `SUBDIRECTORY/conf.py`.
+
+Changes to `doc/common/builder.py`:
+
+- add code to build the reference manual in sections, and also to build the sections in parallel.  The reference manual ought to be built twice to resolve references now, so typing "sage -docbuild all html" will build it twice (along with all of the other documents, of course).  "sage -docbuild reference html" will just build it once.  You can also run "sage -docbuild reference/combinat html", for example, to just build one part of the manual.
+- the different parts of the manual are separate documents as far as sphinx is concerned, so allow them to reference each other using the "intersphinx" extension.  (This is why we need to build it twice: the first pass assembles the intersphinx databases, the second pass uses the databases to create the references correctly.)
+- to accomodate the changes in #11251, which don't seem to be easily compatible with intersphinx, search through the output files looking for "todo" items, and accumulate them in one master "todo" list.
+- for pdf format, since it now produces 30 different pdf files, write an html file which links to each of them.
+
+Other changes:
+
+- `doc/common/conf.py`: add the intersphinx extension to the build process.
+- `doc/common/themes/sage/layout.html`: fix a bug where clicking the Sage logo in the upper left corner of the docs wouldn't take you to the right place, at least in the local documentation.
+- `doc/common/themes/sageref/`: add a new theme for the pieces of the reference manual.  This is almost identical to the "sage" theme, except for a little tinkering to the links along the top and bottom lines.
+- in the main Sage library, change a few pathnames to media files in the reference manual, since those files have been moved.
+- `spkg-dist`: when building the main sage spkg file, delete all of the autogenerated files from the doc directory.  This is important because if some autogenerated files from before the patch are still present after the patch, the docbuild process can occasionally get confused.  It also saves some space, making a smaller spkg file.
+- make the necessary changes to .hgignore and MANIFEST.in to deal with the relocated files.
+
+---

 Apply:
jhpalmieri commented 13 years ago

use only this patch

jhpalmieri commented 13 years ago

Changed dependencies from 11251 to 11251, 11298

jhpalmieri commented 13 years ago

Description changed:

--- 
+++ 
@@ -30,6 +30,7 @@

 Apply:

-- [attachment: trac_6495-subdivide-ref-manual.patch](https://github.com/sagemath/sage-prod/files/10645445/trac_6495-subdivide-ref-manual.patch.gz)
+- [attachment: trac_6495-part1-moving-files.patch](https://github.com/sagemath/sage-prod/files/10645464/trac_6495-part1-moving-files.patch.gz) — this moves 'algebras.rst' to 'algebras/index.rst', and similarly for all other files.  It adds `.. include:: ../footer.txt` to the end of each of these files, and it removes any cross-referencing information like `.. _ch:groups:`, since that doesn't work anymore with the new structure.  It also creates identical files 'DIR/conf.py' in each of the new subdirectories of doc/en/reference, except for doc/en/algebras/conf.py.  That file is created in the next patch so that you can focus on reviewing just the second patch.
+- [attachment: trac_6495-part2-everything-else.patch](https://github.com/sagemath/sage-prod/files/10645471/trac_6495-part2-everything-else.patch.gz) — this does everything else; in other words, all of the important content is in this patch.

-Also, before building the docs, it's probably a good idea to delete the documentation output directory: `rm -rf SAGE_ROOT/devel/sage/doc/output`.
+Before building the docs, you should delete the documentation output directory: `rm -rf SAGE_ROOT/devel/sage/doc/output`.
jhpalmieri commented 13 years ago
comment:22

Attachment: trac_6495-subdivide-ref-manual.patch.gz

Here's a new version, with #11298 as a new dependency. (It may apply without #11298, perhaps with fuzz.) To help with reviewing, I've broken the patch into two pieces, as explained in the ticket description.

jhpalmieri commented 13 years ago

Changed dependencies from 11251, 11298 to #11251, #11298

robertwb commented 13 years ago
comment:24

Could you post a link to the generated docs so people could browse them?

jhpalmieri commented 13 years ago
comment:25

Replying to @robertwb:

Could you post a link to the generated docs so people could browse them?

Good idea:

williamstein commented 13 years ago

Changed keywords from none to sd32

jdemeyer commented 12 years ago
comment:28

Testing this against sage-4.8.alpha1 + #10620...

jdemeyer commented 12 years ago

Changed dependencies from #11251, #11298 to none

jdemeyer commented 12 years ago
comment:29

Against sage-4.8.alpha1:

patching file doc/en/reference/games/index.rst
Hunk #1 FAILED at 6
1 out of 1 hunks FAILED -- saving rejects to file doc/en/reference/games/index.rst.rej
patching file doc/en/reference/graphs/index.rst
Hunk #1 succeeded at 52 with fuzz 1 (offset 2 lines).
abort: patch failed to apply
jdemeyer commented 12 years ago
comment:30

Also: I'm not sure whether building totally in parallel should be the default.

jhpalmieri commented 12 years ago
comment:31

Here are rebased patches, along with the following change: there is now an environment variable, SAGE_PARALLEL_DOCBUILD, which if set to anything nonempty which doesn't start with "n", causes the reference manual to be built in parallel. I also added "doc-parallel" and "doc-pdf-parallel" targets to the main Makefile with a patch to the root repo.

jhpalmieri commented 12 years ago

Description changed:

--- 
+++ 
@@ -32,5 +32,6 @@

 - [attachment: trac_6495-part1-moving-files.patch](https://github.com/sagemath/sage-prod/files/10645464/trac_6495-part1-moving-files.patch.gz) — this moves 'algebras.rst' to 'algebras/index.rst', and similarly for all other files.  It adds `.. include:: ../footer.txt` to the end of each of these files, and it removes any cross-referencing information like `.. _ch:groups:`, since that doesn't work anymore with the new structure.  It also creates identical files 'DIR/conf.py' in each of the new subdirectories of doc/en/reference, except for doc/en/algebras/conf.py.  That file is created in the next patch so that you can focus on reviewing just the second patch.
 - [attachment: trac_6495-part2-everything-else.patch](https://github.com/sagemath/sage-prod/files/10645471/trac_6495-part2-everything-else.patch.gz) — this does everything else; in other words, all of the important content is in this patch.
+- [attachment: trac_6495-root.patch](https://github.com/sagemath/sage-prod/files/10645446/trac_6495-root.patch.gz) — root repo.  Add "doc-parallel" and "doc-pdf-parallel" targets to the main Makefile.

 Before building the docs, you should delete the documentation output directory: `rm -rf SAGE_ROOT/devel/sage/doc/output`.
jhpalmieri commented 12 years ago

Attachment: trac_6495-root.patch.gz

root repo

jhpalmieri commented 12 years ago
comment:33

By the way, the default in the new patch is to build serially. I've also added a brief description of SAGE_PARALLEL_DOCBUILD to the installation guide.

jhpalmieri commented 12 years ago
comment:34

Some other possible changes: in the parallel-building code (from builder.py)

            from multiprocessing import Pool, cpu_count
            max_cpus = 8 if SAGE_PARALLEL_DOCBUILD else 1
            pool = Pool(min(max_cpus, cpu_count()))

perhaps change "else 1" to "else 2"? As it is, building serially (with max_cpus set to 1) is slower than the current system, because in the new system, the manual has to be built twice to resolve cross-references.

We could also change "pool" to just "Pool(cpu_count())" or "Pool(int(1.5 * cpu_count()))" or something like that, eliminating the minimum of 8 and possibly increasing the maximum.

jhpalmieri commented 12 years ago

Description changed:

--- 
+++ 
@@ -1,4 +1,4 @@
-Building the Sage reference manual can use significant computer resources.  Easing the burden could speed up Sage development.
+Building the Sage reference manual can take a significant amount of time. Decreasing this time could speed up Sage development.

 The patch is large, but most of it consists of moving files from one location to another, as described below.  A summary of the changes:
jdemeyer commented 12 years ago
comment:35

Replying to @jhpalmieri:

Some other possible changes: in the parallel-building code (from builder.py)

            from multiprocessing import Pool, cpu_count
            max_cpus = 8 if SAGE_PARALLEL_DOCBUILD else 1
            pool = Pool(min(max_cpus, cpu_count()))

perhaps change "else 1" to "else 2"?

Why? It wouldn't make sense to build with more processes than there are CPUs.

We could also change "pool" to just "Pool(cpu_count())" or "Pool(int(1.5 * cpu_count()))" or something like that, eliminating the minimum of 8 and possibly increasing the maximum.

Why? It wouldn't make sense to build with more processes than there are CPUs.

As I mentioned on sage-devel, I don't like that there is an option to doctest in parallel, a different option to build the docs in parallel, a different option to build in parallel... I would say: let there be one environment variable SAGE_NUM_PROCESSES or something like that and use that for everything.

jhpalmieri commented 12 years ago
comment:36

Replying to @jdemeyer:

Replying to @jhpalmieri:

Some other possible changes: in the parallel-building code (from builder.py)

            from multiprocessing import Pool, cpu_count
            max_cpus = 8 if SAGE_PARALLEL_DOCBUILD else 1
            pool = Pool(min(max_cpus, cpu_count()))

perhaps change "else 1" to "else 2"?

Why? It wouldn't make sense to build with more processes than there are CPUs.

This version still does min(max_cpus, cpu_count()), so it won't use more processes than there are CPUs.

We could also change "pool" to just "Pool(cpu_count())" or "Pool(int(1.5 * cpu_count()))" or something like that, eliminating the minimum of 8 and possibly increasing the maximum.

Why? It wouldn't make sense to build with more processes than there are CPUs.

I see lots of suggestions on the internet to set MAKE=make -jN where N is 1.5 * (the number of cpus), or 1 or 2 more than the number of cpus. Why not here as well?

As I mentioned on sage-devel, I don't like that there is an option to doctest in parallel, a different option to build the docs in parallel, a different option to build in parallel... I would say: let there be one environment variable SAGE_NUM_PROCESSES or something like that and use that for everything.

I think maybe two variables: one (SAGE_PARALLEL) to enable parallel processes, one (SAGE_NUM_THREADS) to determine the maximum number of processes. The first could be "no" by default, and the second could be "0" by default, meaning use cpu_count() or min(8, cpu_count()) or some other variant on this, the way we do with NUM_THREADS in Makefile and sage-ptest. Then it's easy to turn on and off without remembering how many cores your machine has.

williamstein commented 12 years ago
comment:37

Why not exactly one environment variable "MAKE" which gets set to "make -jN" for some N, and that is it? I suggest this, since that's what I'm used to already for years. I don't claim it is the best solution, but it's in all my .bash* files already, and as John points out above it is documented already all over. Why don't we just stick with it?