sagemath / sage

Main repository of SageMath. Now open for Issues and Pull Requests.
https://www.sagemath.org

Implement sage -sws2rst #10637

Closed nthiery closed 10 years ago

nthiery commented 13 years ago

Implement:

    sage -sws2rst bla.sws bli.sws ...

which given worksheets

bla.sws, bli.sws, ... would create ReST files bla.rst, bli.rst, ... together with media directories:

media/bla/
media/bla/data/
media/bla/7/sage0.png
...
media/bli/
...

The proposed implementation adds a script local/bin/sage-sws2rst, edits spkg/bin/sage to add the sws2rst option, and adds some libraries in sagenb-main/sagenb/notebook/. It further depends on the BeautifulSoup Python library (released under Python's license).

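The script builds the ReST file from the worksheet.html file included in the .sws as follows:

- Preparsing to handle the input / output fields
- Parsing of the resulting html using BeautifulSoup
- Manipulation on the obtained tree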

Suggestions for better file layout or implementation welcome!

Install instructions

Release Manager Instructions

Depends on #14330

Upstream: Workaround found; Bug reported upstream.

CC: @nthiery @hivert @seblabbe @kcrisman @rbeezer @sagetrac-whuss @novoselt @kini @jdemeyer

Component: notebook

Keywords: ReST, worksheet

Author: Pablo Angulo, Karl-Dieter Crisman

Reviewer: Nicolas M. Thiéry, Jason Grout, Karl-Dieter Crisman, Jason Bandlow, John Palmieri, Simon King, Karl-Dieter Crisman, Pablo Angulo

Merged: sage-5.12.beta0

Issue created by migration from https://trac.sagemath.org/ticket/10637

nthiery commented 13 years ago

Description changed:

--- 
+++ 
@@ -6,8 +6,7 @@

 which given worksheets

-`bla.sws, bli.sws, ...` would create ReST files {{{bla.rst,
-bli.rst, ...}}} together with media directories:
+`bla.sws, bli.sws, ...` would create ReST files `bla.rst, bli.rst, ...` together with media directories:

 media/bla/
@@ -24,10 +23,9 @@
 sagenb-main/sagenb/notebook/. It further depends on the BeautifulSoup Python library (released under Python's license).

-The script builds the ReST file from the included worksheet.html file
-as follow:
+The script builds the ReST file from the worksheet.html file included in the .sws as follow:

-- Preparsing to handle the /
+- Preparsing to handle the input / output fields

nthiery commented 13 years ago

Description changed:

--- 
+++ 
@@ -29,3 +29,4 @@
 - Parsing of the resulting html using BeautifulSoup
 - Manipulation on the obtained tree

+Suggestions for better file layout or implementation welcome!
dea4e0c2-f018-471f-92a6-9a4fde6d1a01 commented 13 years ago

Attachment: beautifulsoup-3.2.0.p0.spkg.gz

sage package for BeautifulSoup

dea4e0c2-f018-471f-92a6-9a4fde6d1a01 commented 13 years ago
comment:3

So I have some preliminary code, but after a conversation with Vincent there are some things I have to check. For the moment, I have uploaded a sage package for BeautifulSoup, which I think is important in dealing with the html code generated by tinyMCE, and which I use in the preliminary code. It's pretty straightforward, so I hope it works on other platforms. I only tested on 32-bit Linux, but it's a pure Python file.

dea4e0c2-f018-471f-92a6-9a4fde6d1a01 commented 13 years ago

patch to be applied on local/bin

dea4e0c2-f018-471f-92a6-9a4fde6d1a01 commented 13 years ago

Attachment: add_sws2rst.patch.gz

patch to be applied on sagenb

dea4e0c2-f018-471f-92a6-9a4fde6d1a01 commented 13 years ago
comment:4

Attachment: tools_for_sws2rst.patch.gz

Install instructions

    sage -python setup.py install && sage -python setup.py develop

Comments

The code above is preliminary and undocumented, but I post it in order to guide the discussion. I'll write comments and make improvements when we decide on some issues.

I have to warn that the task is not trivial: html is more relaxed than rst, so some editing is required after the conversion. The rst files produced by the script compile correctly without changes only on occasion, but the modifications required are usually small.

How it works

The sws file is uncompressed, and the images inside the 'data' and the 'cell/xx' dirs are copied to a '_media' dir (this is done by the sage-sws2rst file).
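The copy step described above can be sketched roughly as follows. This is not the code from the attached patches: the function name `copy_worksheet_images` is hypothetical, the sketch assumes the .sws file is a tar archive that Python's `tarfile` can open, and it assumes the `cells/<id>/` layout and the `cell_<id>_<name>` naming that appear elsewhere in this ticket.

```python
import os
import shutil
import tarfile

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg")


def copy_worksheet_images(sws_path, media_dir):
    """Copy the images stored in a worksheet archive into media_dir.

    Returns the sorted list of file names written to media_dir.
    """
    os.makedirs(media_dir, exist_ok=True)
    copied = []
    # Assumption: the worksheet is a tar archive (compression is auto-detected).
    with tarfile.open(sws_path) as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            parts = member.name.split("/")
            name = parts[-1]
            if not name.lower().endswith(IMAGE_EXTENSIONS):
                continue
            if len(parts) >= 2 and parts[-2] == "data":
                # Images attached to the worksheet keep their name.
                target = name
            elif len(parts) >= 3 and parts[-3] == "cells":
                # cells/5/sage0.png becomes cell_5_sage0.png
                target = "cell_%s_%s" % (parts[-2], name)
            else:
                continue
            source = archive.extractfile(member)
            with source, open(os.path.join(media_dir, target), "wb") as out:
                shutil.copyfileobj(source, out)
            copied.append(target)
    return sorted(copied)
```

Reading members straight out of the archive avoids unpacking the whole worksheet to a temporary directory; the real script may well do it differently.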

Then the file worksheet.html is parsed, and split into comments, source code and output. The first is handled using the library BeautifulSoup. Source code is mostly untouched, and results are parsed to find some patterns like "image", "latex", "docstring", "other html patterns" and "plain text". The first two are displayed. The docstring, or any unrecognized html in the result cells are discarded. Plain text is indented, and is recognized as the output of the previous cell.
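As an illustration of the output handling just described, here is a minimal sketch. The function names and the html patterns are assumptions made for the example (in particular the `<span class="math">` marker for latex); the patterns actually used by the patch are not shown in this ticket.

```python
import re


def classify_output(html):
    """Return a (kind, payload) pair for the html of one output cell."""
    text = html.strip()
    image = re.search(r'<img[^>]*\bsrc="([^"]+)"', text)
    if image:
        return ("image", image.group(1))
    # Assumed marker for typeset output; the real notebook markup may differ.
    latex = re.search(r'<span class="math">(.*?)</span>', text, re.DOTALL)
    if latex:
        return ("latex", latex.group(1).strip())
    if re.search(r"<[a-zA-Z/][^>]*>", text):
        # Docstrings and any other unrecognized html end up here.
        return ("html", text)
    return ("text", text)


def output_to_rst(html):
    """Render one output cell as rst; discarded kinds give an empty string."""
    kind, payload = classify_output(html)
    if kind == "image":
        return ".. image:: %s" % payload
    if kind == "latex":
        return ".. math::\n\n    %s" % payload
    if kind == "text":
        # Plain text is indented so that it reads as the output of the cell.
        return "\n".join("    " + line for line in payload.splitlines())
    return ""
```

Images and latex are displayed, plain text is indented, and everything else is dropped, which matches the behaviour described above.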

Issues

Please add your concerns here.

seblabbe commented 13 years ago
comment:6

I don't know if it helps, but this week I found this makefile, which helped me understand how to call sphinx and with which options:

devel/sage-main/doc/en/tutorial/Makefile
dea4e0c2-f018-471f-92a6-9a4fde6d1a01 commented 13 years ago

Attachment: add_sws2rst_2.patch.gz

replaces the first patch

dea4e0c2-f018-471f-92a6-9a4fde6d1a01 commented 13 years ago

Attachment: tools_sws2rst_2.patch.gz

replaces the first patch

dea4e0c2-f018-471f-92a6-9a4fde6d1a01 commented 13 years ago
comment:7

I have tested this patch against a huge group of complex worksheets, and I'm satisfied with the result. I took into account many little things that are Sage specific or tinyMCE specific, but this new patch is still not 100% guaranteed to produce valid rst. However, it's very likely that the result will get compiled by sage -docbuild, and it's very likely that only small changes are required.

I'm still open to general suggestions.

nthiery commented 13 years ago
comment:9

Hi Pablo!

I just used your script for preparing my next demo from a previous worksheet. It worked smoothly, thanks so much! It would be really good to get this in; alas, I don't feel qualified to review code touching the notebook. A couple tiny comments:

Cheers, Nicolas

kcrisman commented 13 years ago
comment:10

nthiery, this is still reviewing! I'm adding you, even if you don't feel like you can give positive review.

Also, we'll want correct commit messages for the release manager.

It is really a shame that the work here, at tex2sws, and sws2tex are not coordinated. I'm just cc:ing some of those people so they are aware of this. Eventually, of course, those should both be part of the standard Sage, and one could refactor some of them. Well, at least of sws2tex and sws2rst.

kcrisman commented 13 years ago

Reviewer: Nicolas Thiéry

nthiery commented 13 years ago
comment:11

Ah, I forgot to mention:

What about using '----' instead of ':::::' to underline subsection titles?

It seems a bit more readable, and ':' is already used quite a lot by Sphinx; and one could imagine a very (very!) short title producing a confusing '::'.
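To make the concern concrete, here is a small sketch of a title helper. The name `rst_title` is hypothetical and not taken from the patch; the only rule relied on is that an rst underline must be at least as long as the title text.

```python
def rst_title(title, underline="-"):
    """Return the title followed by an underline of the same length."""
    title = title.strip()
    return "%s\n%s" % (title, underline * len(title))


# A two-character title underlined with ':' yields a bare '::' line,
# which looks like the marker that introduces a literal block.
print(rst_title("Ab", underline=":"))
print(rst_title("Ab"))
```

With '-' as the default, the short-title case gives '--', which is harder to mistake for '::'.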

Cheers, Nicolas

nthiery commented 13 years ago

Changed reviewer from Nicolas Thiéry to Nicolas Thiéry, ...

nthiery commented 13 years ago
comment:12

For the record, the following tests fail with the patches applied on 4.6.2 with Error: TAB character found

   sage -t  "4.6.2/devel/sagenb-main/sagenb/misc/worksheet2rst.py"
   sage -t  "4.6.2/devel/sagenb-main/sagenb/misc/comments2rst.py"
dea4e0c2-f018-471f-92a6-9a4fde6d1a01 commented 13 years ago
comment:13

Hello: I've addressed those issues, but one test still fails with this message:

Expected:
    '::\n\n    sage: 2+2'
Got:
    '::\n\n    sage: 2+2'

They look the same. They both use spaces, not tabs.

If I run the test from a sage console, it works (with ==).

Any ideas?
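The cause is not stated in this ticket. As a general debugging aid when two strings print identically, a helper like the following (hypothetical, not part of the patch) can show the first position where they differ, using `repr` so that tabs, trailing spaces, and other invisible characters become visible.

```python
def first_difference(expected, got):
    """Return (index, repr(expected_char), repr(got_char)) or None if equal."""
    for index, (a, b) in enumerate(zip(expected, got)):
        if a != b:
            return (index, repr(a), repr(b))
    if len(expected) != len(got):
        # One string is a prefix of the other; report where the shorter ends.
        index = min(len(expected), len(got))
        return (index, repr(expected[index:index + 1]), repr(got[index:index + 1]))
    return None
```

Calling it on the "Expected" and "Got" values from a failing test narrows the search to a single character.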

dea4e0c2-f018-471f-92a6-9a4fde6d1a01 commented 13 years ago

add_sws2rst_3.patch (replaces earlier versions)

dea4e0c2-f018-471f-92a6-9a4fde6d1a01 commented 13 years ago

Attachment: add_sws2rst_3.patch.gz

tools_sws2rst_3.patch (replaces earlier versions)

dea4e0c2-f018-471f-92a6-9a4fde6d1a01 commented 13 years ago
comment:14

Attachment: tools_sws2rst_3.patch.gz

Done. I also added some tests:

dea4e0c2-f018-471f-92a6-9a4fde6d1a01 commented 13 years ago

Description changed:

--- 
+++ 
@@ -30,3 +30,17 @@
 - Manipulation on the obtained tree

 Suggestions for better file layout or implementation welcome!
+
+Edit: 
+## Install instructions
+
+* Install the beautifulsoup spkg
+
+* A first patch (add_sws2rst_3.patch) adds the sage-sws2rst script, and it must be imported on the local/bin dir
+
+* A second patch (tools_sws2rst_3.patch) sends three files to sagenb/misc and it must be imported on the devel/sagenb dir, and followed by a
+
+```
+    sage -python setup.py install && sage -python setup.py develop
+```
+
dea4e0c2-f018-471f-92a6-9a4fde6d1a01 commented 13 years ago
comment:15

Done. I also added some tests:

jbandlow commented 13 years ago
comment:16

This is great! I notice that when this is run on old notebook files, one gets the following error:

    Traceback (most recent call last):
      File "/usr/local/src/sage/sage-4.7.rc0/local/bin/sage-sws2rst", line 66, in <module>
        process_sws(file_name)
      File "/usr/local/src/sage/sage-4.7.rc0/local/bin/sage-sws2rst", line 38, in process_sws
        for cell in os.listdir(cells_path):
    OSError: [Errno 2] No such file or directory: '/tmp/Iterators and backtracking.sws/sage_worksheet/cells'

I think this is because there is another directory level on these old-style worksheets:

   '/tmp/Iterators and backtracking.sws/sage_worksheet/<some number>/cells'

It would be nice if this error could be caught and the user given a message to the effect of "Conversion failed. Try opening your notebook with a recent version of sage, resaving it, and running the script again".
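A minimal sketch of what catching this error could look like. `list_cells` and `ConversionError` are hypothetical names introduced for the example; the script in the patch calls `os.listdir(cells_path)` directly inside `process_sws`.

```python
import os

MESSAGE = (
    "Conversion failed. Try opening your notebook with a recent version "
    "of sage, resaving it, and running the script again."
)


class ConversionError(Exception):
    """Raised when the worksheet does not have the expected layout."""


def list_cells(cells_path):
    """List the cell directories, turning a missing dir into a clear error."""
    try:
        return sorted(os.listdir(cells_path))
    except OSError as error:
        # The original OSError stays attached as the cause for debugging.
        raise ConversionError(MESSAGE) from error
```

The command-line script could then catch `ConversionError`, print the message, and exit with a nonzero status instead of showing a traceback.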

jbandlow commented 13 years ago
comment:17

''' This post is just an attempt to close the boldface.

jasongrout commented 12 years ago
comment:18

I followed the directions in the description to install, but got this when I tried running it:


    % sage -sws2rst mysheet.sws
    /Users/grout/sage/local/bin/sage-sage: line 385: /Users/grout/sage/local/bin/sage-sws2rst: Permission denied

So perhaps it needs an executable bit, or the patch needs to be exported in git format to pick that up?
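Until the patch carries the permission itself, the executable bit can be set by hand on a local install. The sketch below uses only the standard `os` and `stat` modules; the helper name `make_executable` is made up for the example, and the path in the usage note is simply the one from the error message above.

```python
import os
import stat


def make_executable(path):
    """Add the executable bit for user, group, and others."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

For instance, `make_executable("/Users/grout/sage/local/bin/sage-sws2rst")` has the same effect as `chmod +x` on that file.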

jasongrout commented 12 years ago
comment:19

My worksheet name included spaces. A few images led to this being in the text:

    .. image:: PREP Quickstart, NumAnalysis_media/cell_5_sage0.png

However, the html output from running sphinx on the rst file included a link to this image:

    <img alt="PREPQuickstart,NumAnalysis_media/cell_5_sage0.png" class="align-center" src="PREPQuickstart,NumAnalysis_media/cell_5_sage0.png" />

(notice the missing spaces)

I looked around and found a bug report about this: https://bitbucket.org/birkenfeld/sphinx/issue/453/spaces-get-stripped-from-image-file-names

According to http://docutils.sourceforge.net/docs/howto/rst-directives.html#image, the first thing done in the image directive is to discard all whitespace, so this script should probably discard all whitespace when making the image links.
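A sketch of that suggestion, with a hypothetical helper name `image_directive`; it only builds the directive line and assumes the media directory on disk is created under the same whitespace-free name.

```python
import re


def image_directive(media_dir, file_name):
    """Build an rst image directive whose path contains no whitespace."""
    path = "%s/%s" % (media_dir, file_name)
    return ".. image:: " + re.sub(r"\s+", "", path)


print(image_directive("PREP Quickstart, NumAnalysis_media", "cell_5_sage0.png"))
```

The directive then names the same path that ends up in the generated html, so the link resolves as long as the directory is written without spaces as well.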

jasongrout commented 12 years ago
comment:20

I just converted a worksheet, and the input/output of a cell got converted so that the output was indented with a tab:


    sage: spline[1]
    1/2*(x - 2)*((-14.3540674845*x + 37.3205750783)*(x - 2) + 25.741626739501953) + 20.0

It seems like the output should be indented 4 spaces instead.
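One way to get space-only indentation, sketched with a hypothetical helper `indent_output` (the patch may handle this differently): expand any tabs in the raw output first, then prefix each non-empty line with four spaces.

```python
def indent_output(text, width=4):
    """Indent raw cell output with spaces, never with tabs."""
    prefix = " " * width
    lines = text.expandtabs(width).splitlines()
    # Blank lines stay empty so no trailing whitespace is introduced.
    return "\n".join(prefix + line if line else "" for line in lines)
```

This also helps with the "TAB character found" check mentioned earlier in the ticket, since no tab survives in the result.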

dea4e0c2-f018-471f-92a6-9a4fde6d1a01 commented 12 years ago

add_sws2rst_4.patch (replaces earlier versions)

dea4e0c2-f018-471f-92a6-9a4fde6d1a01 commented 12 years ago

Attachment: add_sws2rst_4.patch.gz

tools_sws2rst_4.patch (replaces earlier versions)

dea4e0c2-f018-471f-92a6-9a4fde6d1a01 commented 12 years ago

Description changed:

--- 
+++ 
@@ -36,9 +36,9 @@

 * Install the beautifulsoup spkg

-* A first patch (add_sws2rst_3.patch) adds the sage-sws2rst script, and it must be imported on the local/bin dir
+* A first patch (add_sws2rst_4.patch) adds the sage-sws2rst script, and it must be imported on the local/bin dir

-* A second patch (tools_sws2rst_3.patch) sends three files to sagenb/misc and it must be imported on the devel/sagenb dir, and followed by a
+* A second patch (tools_sws2rst_4.patch) sends three files to sagenb/misc and it must be imported on the devel/sagenb dir, and followed by a
 sage -python setup.py install && sage -python setup.py develop
dea4e0c2-f018-471f-92a6-9a4fde6d1a01 commented 12 years ago
comment:21

Attachment: tools_sws2rst_4.patch.gz

dea4e0c2-f018-471f-92a6-9a4fde6d1a01 commented 12 years ago
comment:22

Solved the problems mentioned by Jason. Thanks for testing it.

dea4e0c2-f018-471f-92a6-9a4fde6d1a01 commented 12 years ago
comment:23

jbandlow, can you send me an old worksheet file? In the old worksheets that I keep, the path is rather

/tmp_dir/<some number>/cells

than

/tmp_dir/sage_worksheet/<some number>/cells

It is possible to deal with the old style directly, unless you have a reason to force the user to pass the worksheet through a newer version...
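Handling the old style directly could be as simple as searching the unpacked worksheet for the cells directory instead of assuming a fixed path. The sketch below is an illustration only; `find_cells_dir` is a hypothetical name, and it assumes the directory is always called `cells`, as in the paths quoted in this ticket.

```python
import os


def find_cells_dir(root):
    """Return the first 'cells' directory found under root, or None."""
    for current, dirs, _files in os.walk(root):
        dirs.sort()  # make the search order deterministic
        if "cells" in dirs:
            return os.path.join(current, "cells")
    return None
```

Both `<tmp_dir>/sage_worksheet/cells` and `<tmp_dir>/sage_worksheet/<some number>/cells` are found this way, and a `None` result can be reported as a failed conversion.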

kcrisman commented 12 years ago
comment:24

Here is another related project - sws2tex, though it uses HTMLParser instead of Beautiful Soup.

dea4e0c2-f018-471f-92a6-9a4fde6d1a01 commented 12 years ago
comment:25

Replying to @kcrisman:

> Here is another related project - sws2tex, though it uses HTMLParser instead of Beautiful Soup.

I know, someone told me about that project before. Though it may seem like duplicated effort, the approach is different: sws2tex converts html+css style into latex style, color by color, font by font, while sws2rst ignores all style. Still I might learn from the code, for sure.

kini commented 11 years ago
comment:26

Just checking, but has tools_sws2rst_4.patch been ported to the new notebook at http://github.com/sagemath/sagenb ?

kini commented 11 years ago
comment:27

Setting #11080 as a dependency because the old notebook is basically abandoned now.

kini commented 11 years ago

Dependencies: #11080

kcrisman commented 11 years ago
comment:28

Low-priority comment: Beautiful Soup is now at version 4, though that is under the MIT license. It works with Python 2.7 and 3. The 3.2.1 maintenance release is as of last February, also not too old.

kcrisman commented 11 years ago
comment:29

Technical comment: see the latest spkg-install instructions for a slight update on the shell commands - in particular, we should have a nonempty exit check at the end, and the SAGE_LOCAL bit has changed slightly.

Also, you don't need to make this a p0 package, because there are no patches! :)

kcrisman commented 11 years ago
comment:30

Less trivially, and for this 'needs work':

So might as well upgrade Beautiful Soup and fix the other stuff too. I would like to try this over the next few weeks to finally start migrating some documentation to ReST format, so I may do all this if I get a chance.

kcrisman commented 11 years ago

Upstream: Reported upstream. No feedback yet.

kcrisman commented 11 years ago

Changed reviewer from Nicolas Thiéry, ... to Nicolas Thiéry, Jason Grout, Karl-Dieter Crisman, Jason Bandlow

dea4e0c2-f018-471f-92a6-9a4fde6d1a01 commented 11 years ago
comment:31

Thanks a lot for the careful comment on what needs to be updated. I'll take care soon.

kcrisman commented 11 years ago
comment:32

> Thanks a lot for the careful comment on what needs to be updated. I'll take care soon.

Great; I would much rather review this functionality so that I can use it than do all of that! Looking forward to it.

kcrisman commented 11 years ago
comment:33

Just to encourage you, I took a look and it looks like you can really practically take the identical patch, just on this new file! Since all the other files are new, and don't seem to interact with anything else, they should be fine.

I'm even wondering whether sage-sws2rst would need to live in Sage. This is nearly pure notebook functionality. But maybe that would be for another ticket; far more important to get this in. I may try at least doing this "by hand" next week to work on my project.

kcrisman commented 11 years ago

Description changed:

--- 
+++ 
@@ -34,7 +34,7 @@
 Edit: 
 ## Install instructions

-* Install the beautifulsoup spkg
+* Install the beautifulsoup spkg [here](http://sage.math.washington.edu/home/kcrisman/beautifulsoup-3.2.1.spkg)

 * A first patch (add_sws2rst_4.patch) adds the sage-sws2rst script, and it must be imported on the local/bin dir
kcrisman commented 11 years ago
comment:34

New Beautiful Soup spkg at http://sage.math.washington.edu/home/kcrisman/beautifulsoup-3.2.1.spkg. I'm going with the maintenance release for now because I don't want to have to think about licenses.

I'll try to figure out what to do with the other stuff soon; shouldn't be hard, but I always have trouble with the new notebook upstream business...

kcrisman commented 11 years ago
comment:35

Rebased the patch for the sage script for the root directory. I put its advanced message in documentation because all the other sections were even worse, and it certainly can lead to documentation. I'm open to ideas about that, though.