Closed nthiery closed 10 years ago
Description changed:
---
+++
@@ -6,8 +6,7 @@
which given worksheets
-`bla.sws, bli.sws, ...` would create ReST files {{{bla.rst,
-bli.rst, ...}}} together with media directories:
+`bla.sws, bli.sws, ...` would create ReST files `bla.rst, bli.rst, ...` together with media directories:
media/bla/
@@ -24,10 +23,9 @@
sagenb-main/sagenb/notebook/
. It further depends on the
BeautifulSoup Python library (released under Python's license).
-The script builds the ReST file from the included worksheet.html file
-as follow:
+The script builds the ReST file from the worksheet.html file included in the .sws as follow:
-- Preparsing to handle the /
+- Preparsing to handle the input / output fields
Description changed:
---
+++
@@ -29,3 +29,4 @@
- Parsing of the resulting html using BeautifulSoup
- Manipulation on the obtained tree
+Suggestions for better file layout or implementation welcome!
Attachment: beautifulsoup-3.2.0.p0.spkg.gz
sage package for BeautifulSoup
So I have some preliminary code, but after a conversation with Vincent there are some things I have to check. For the moment, I have uploaded a sage package for BeautifulSoup, which I think is important in dealing with the html code generated by tinyMCE, and which I use in the preliminary code. It's pretty straightforward, so I hope it works on other platforms. I only tested on 32-bit Linux, but it's a pure Python file.
patch to be applied on local/bin
Attachment: add_sws2rst.patch.gz
patch to be applied on sagenb
Attachment: tools_for_sws2rst.patch.gz
It's necessary to install the beautifulsoup spkg
A first patch adds the sage-sws2rst script, and it must be imported on the local/bin dir
A second patch sends three files to sagenb/misc and it must be imported on the devel/sagenb dir, and followed by a
sage -python setup.py install && sage -python setup.py develop
The code above is preliminary and undocumented, but I post it in order to guide the discussion. I'll write comments and make improvements when we decide on some issues.
I have to warn that the task is not trivial: html is more relaxed than rst, so some editing is required after the conversion. The script produces rst files that compile correctly only on occasion, but the modifications required are usually small.
The sws file is uncompressed, and the images inside the 'data' and the 'cell/xx' dirs are copied to a '_media' dir (this is done by the sage-sws2rst file).
Then the file worksheet.html is parsed, and split into comments, source code and output. The first is handled using the library BeautifulSoup. Source code is mostly untouched, and results are parsed to find some patterns like "image", "latex", "docstring", "other html patterns" and "plain text". The first two are displayed. The docstring, or any unrecognized html in the result cells are discarded. Plain text is indented, and is recognized as the output of the previous cell.
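A minimal sketch of the splitting step described above, not the code in the attached patches. It assumes that worksheet.html marks each cell as `{{{id=N|`, then the input, a `///` line, the output, and a closing `}}}`; the name `split_worksheet` and the tuple layout are hypothetical.

```python
import re

# Assumed cell syntax: {{{id=N|  <input>  ///  <output>  }}}
CELL_RE = re.compile(r"\{\{\{id=(\d+)\|\n(.*?)\n?\}\}\}", re.DOTALL)


def split_worksheet(html):
    """Split worksheet text into ("text", html) and ("cell", id, source, output) items."""
    parts = []
    pos = 0
    for match in CELL_RE.finditer(html):
        # Everything between two cells is a comment (rich-text) block.
        text = html[pos:match.start()].strip()
        if text:
            parts.append(("text", text))
        # Inside a cell, a line starting with "///" separates input from output.
        source, _, output = match.group(2).partition("\n///")
        parts.append(("cell", int(match.group(1)), source.strip(), output.strip()))
        pos = match.end()
    tail = html[pos:].strip()
    if tail:
        parts.append(("text", tail))
    return parts
```

In this sketch the "text" items would then go to BeautifulSoup, while the source and output of each "cell" item are handled separately, as the comment above describes.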
ReST does not understand that images or latex are the output of some code cells.
docstrings are ignored. I felt stupid trying to parse some html that comes from a rst file. Maybe I can grab the rst file from the source, but only if it belongs to the library, so is it worth it?
Please add your concerns here.
I don't know if it helps, but this week I found this Makefile, which helped me understand how to call sphinx and with which options:
devel/sage-main/doc/en/tutorial/Makefile
Attachment: add_sws2rst_2.patch.gz
replaces the first patch
Attachment: tools_sws2rst_2.patch.gz
replaces the first patch
I have tested this patch against a huge group of complex worksheets, and I'm satisfied with the result. I took into account many little things that are Sage specific or tinyMCE specific, but this new patch is still not 100% guaranteed to produce valid rst. However, it's very likely that the result will get compiled by sage -docbuild, and it's very likely that only small changes are required.
I'm still open to general suggestions.
Hi Pablo!
I just used your script for preparing my next demo from a previous worksheet. It worked smoothly, thanks so much! It would be really good to get this in; alas, I don't feel qualified to review code touching the notebook. A couple tiny comments:
Could you add the installation instructions to the ticket description? This will make Jeroen happy :-)
I had to fix the execution rights on the script:
chmod +x /opt/sage/local/bin/sage-sws2rst
Should the option be advertised in `sage -h` or `sage -advanced`? I could argue for `sage -h`, since this functionality is for Sage users as much as developers.
Support for converting several sws at once would be handy.
I got this warning each time I ran the script:
/opt/sage/local/bin/sage-sws2rst:19: RuntimeWarning: tempnam is a potential security risk to your program
tempname = os.tempnam('.')
Cheers, Nicolas
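One possible way to avoid the `tempnam` warning, sketched under the assumption that a .sws file is a tar archive that the standard `tarfile` module can open. The function name `extract_sws` and the `sws2rst_` prefix are hypothetical, not taken from the patch.

```python
import tarfile
import tempfile


def extract_sws(sws_path):
    """Unpack a worksheet archive into a fresh private directory and return its path."""
    # mkdtemp creates the directory atomically, so there is no window between
    # choosing a name and creating it (the race that tempnam is warned about).
    workdir = tempfile.mkdtemp(prefix="sws2rst_")
    # Plain tarfile.open() detects the compression of the archive by itself.
    with tarfile.open(sws_path) as archive:
        archive.extractall(workdir)
    return workdir
```

The caller is responsible for removing the returned directory (for example with `shutil.rmtree`) once the conversion is done.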
nthiery, this is still reviewing! I'm adding you, even if you don't feel like you can give positive review.
Also, we'll want correct commit messages for the release manager.
It is really a shame that the work here, at tex2sws, and sws2tex are not coordinated. I'm just cc:ing some of those people so they are aware of this. Eventually, of course, those should both be part of the standard Sage, and one could refactor some of them. Well, at least of sws2tex and sws2rst.
Reviewer: Nicolas Thiéry
Ah, I forgot to mention:
What about using '----' instead of ':::::' to underline subsection titles?
It seems a bit more readable, and ':' is already used quite a lot by Sphinx; and one could imagine a very (very!) short title producing a confusing '::'.
Cheers, Nicolas
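To illustrate the suggestion, here is a small sketch of a heading helper; `rst_heading` is a hypothetical name, not a function from the patch. The underline is as long as the title, which is why a two-character title underlined with ':' would produce '::'.

```python
def rst_heading(title, char="-"):
    """Return a ReST section title followed by an underline of the same length."""
    title = title.strip()
    return "%s\n%s" % (title, char * len(title))


print(rst_heading("Subsection"))
print(rst_heading("Hi", char=":"))  # the underline is '::', the literal block marker
```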
Changed reviewer from Nicolas Thiéry to Nicolas Thiéry, ...
For the record, the following tests fail with the patches applied on 4.6.2 with Error: TAB character found
sage -t "4.6.2/devel/sagenb-main/sagenb/misc/worksheet2rst.py"
sage -t "4.6.2/devel/sagenb-main/sagenb/misc/comments2rst.py"
Hello: I've addressed those issues, but I can't pass a test with a message:
Expected:
'::\n\n sage: 2+2'
Got:
'::\n\n sage: 2+2'
They look the same. They both use spaces, not tabs.
If I run the test from a sage console, it works (with ==).
any ideas?
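When two strings print identically, one way to investigate is to locate the first position where they differ and look at the `repr` of the characters there. The helper below is a generic debugging sketch (the name `first_difference` is made up here); it does not claim to identify the actual cause of this failure.

```python
def first_difference(expected, got):
    """Return (index, expected_char, got_char) at the first mismatch, or None if equal."""
    for i, (a, b) in enumerate(zip(expected, got)):
        if a != b:
            return i, a, b
    if len(expected) != len(got):
        # One string is a prefix of the other; report where the shorter one ends.
        i = min(len(expected), len(got))
        return i, expected[i:i + 1], got[i:i + 1]
    return None


# A tab and four spaces can look alike in a terminal but compare unequal.
print(repr(first_difference('::\n\n    sage: 2+2', '::\n\n\tsage: 2+2')))
```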
add_sws2rst_3.patch (replaces earlier versions)
Attachment: add_sws2rst_3.patch.gz
tools_sws2rst_3.patch (replaces earlier versions)
Attachment: tools_sws2rst_3.patch.gz
Done. I also added some tests:
Description changed:
---
+++
@@ -30,3 +30,17 @@
- Manipulation on the obtained tree
Suggestions for better file layout or implementation welcome!
+
+Edit:
+## Install instructions
+
+* Install the beautifulsoup spkg
+
+* A first patch (add_sws2rst_3.patch) adds the sage-sws2rst script, and it must be imported on the local/bin dir
+
+* A second patch (tools_sws2rst_3.patch) sends three files to sagenb/misc and it must be imported on the devel/sagenb dir, and followed by a
+
+```
+ sage -python setup.py install && sage -python setup.py develop
+```
+
This is great! I notice that when this is run on old notebook files, one gets the following error:
Traceback (most recent call last):
File "/usr/local/src/sage/sage-4.7.rc0/local/bin/sage-sws2rst", line 66, in <module>
process_sws(file_name)
File "/usr/local/src/sage/sage-4.7.rc0/local/bin/sage-sws2rst", line 38, in process_sws
for cell in os.listdir(cells_path):
OSError: [Errno 2] No such file or directory: '/tmp/Iterators and backtracking.sws/sage_worksheet/cells'
I think this is because there is another directory level on these old-style worksheets:
'/tmp/Iterators and backtracking.sws/sage_worksheet/<some number>/cells'
It would be nice if this error could be caught and the user given a message to the effect of "Conversion failed. Try opening your notebook with a recent version of sage, resaving it, and running the script again".
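A sketch of how the script could look for the cells directory in both layouts and fall back to the suggested message. The names `find_cells_dir` and `WorksheetLayoutError` are hypothetical; the two layouts are the ones quoted in the traceback and in the path above.

```python
import os


class WorksheetLayoutError(Exception):
    """Raised when no 'cells' directory can be located in an unpacked worksheet."""


def find_cells_dir(worksheet_root):
    """Return the path of the 'cells' directory below worksheet_root."""
    # New-style layout: <root>/cells
    direct = os.path.join(worksheet_root, "cells")
    if os.path.isdir(direct):
        return direct
    # Old-style layout: <root>/<some number>/cells
    for name in sorted(os.listdir(worksheet_root)):
        nested = os.path.join(worksheet_root, name, "cells")
        if name.isdigit() and os.path.isdir(nested):
            return nested
    raise WorksheetLayoutError(
        "Conversion failed. Try opening your notebook with a recent version "
        "of sage, resaving it, and running the script again."
    )
```

Handling the nested case directly, as in this sketch, would make the error message a last resort rather than the normal outcome for old worksheets.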
''' This post is just an attempt to close the boldface.
I followed the directions in the description to install, but got this when I tried running it:
% sage -sws2rst mysheet.sws
/Users/grout/sage/local/bin/sage-sage: line 385: /Users/grout/sage/local/bin/sage-sws2rst: Permission denied
So perhaps it needs an executable bit, or the patch needs to be exported in git format to pick that up?
My worksheet name included spaces. A few images led to this being in the text:
.. image:: PREP Quickstart, NumAnalysis_media/cell_5_sage0.png
However, the html output from running sphinx on the rst file included a link to this image:
<img alt="PREPQuickstart,NumAnalysis_media/cell_5_sage0.png" class="align-center" src="PREPQuickstart,NumAnalysis_media/cell_5_sage0.png" />
(notice the missing spaces)
I looked around and there is a bug report about this here: https://bitbucket.org/birkenfeld/sphinx/issue/453/spaces-get-stripped-from-image-file-names
According to http://docutils.sourceforge.net/docs/howto/rst-directives.html#image, the first thing done in the image directive is to discard all whitespace, so this script should probably discard all whitespace when making the image links.
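A minimal sketch of that suggestion: derive the media directory name from the worksheet name with all whitespace removed, so that the path written in the directive and the directory on disk agree. The helper names are hypothetical; the `_media` suffix is the one visible in the directive above.

```python
import re


def media_dir_name(worksheet_name):
    """Build the media directory name with every whitespace character removed."""
    return re.sub(r"\s+", "", worksheet_name) + "_media"


def image_directive(worksheet_name, image_file):
    """Return an image directive that points into the whitespace-free directory."""
    return ".. image:: %s/%s" % (media_dir_name(worksheet_name), image_file)


print(image_directive("PREP Quickstart, NumAnalysis", "cell_5_sage0.png"))
```

For this to work, the script would also have to create the directory under the same stripped name when it copies the images.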
I just converted a worksheet, and the input/output of a cell got converted so that the output was indented with a tab:
sage: spline[1]
1/2*(x - 2)*((-14.3540674845*x + 37.3205750783)*(x - 2) + 25.741626739501953) + 20.0
It seems like the output should be indented 4 spaces instead.
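A sketch of indenting a cell's input and output uniformly with four spaces. The name `indent_block` is hypothetical, and expanding embedded tabs to a width of 4 is an assumption of this example, not something the patch is known to do.

```python
def indent_block(text, indent="    "):
    """Indent every non-empty line with spaces, expanding embedded tabs first."""
    lines = []
    for line in text.expandtabs(4).splitlines():
        # Blank lines stay empty so no trailing whitespace is produced.
        lines.append(indent + line if line.strip() else "")
    return "\n".join(lines)


cell = "sage: spline[1]\n20.0"  # shortened output, for illustration only
rst = "::\n\n" + indent_block(cell)
print(rst)
```

Because no tab character survives, output produced this way would also avoid the "TAB character found" failure reported earlier for the doctests.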
add_sws2rst_4.patch (replaces earlier versions)
Attachment: add_sws2rst_4.patch.gz
tools_sws2rst_4.patch (replaces earlier versions)
Description changed:
---
+++
@@ -36,9 +36,9 @@
* Install the beautifulsoup spkg
-* A first patch (add_sws2rst_3.patch) adds the sage-sws2rst script, and it must be imported on the local/bin dir
+* A first patch (add_sws2rst_4.patch) adds the sage-sws2rst script, and it must be imported on the local/bin dir
-* A second patch (tools_sws2rst_3.patch) sends three files to sagenb/misc and it must be imported on the devel/sagenb dir, and followed by a
+* A second patch (tools_sws2rst_4.patch) sends three files to sagenb/misc and it must be imported on the devel/sagenb dir, and followed by a
sage -python setup.py install && sage -python setup.py develop
Attachment: tools_sws2rst_4.patch.gz
Solved the problems mentioned by jason: thanks for testing it.
jbandlow, can you send me an old worksheet file? In the old worksheets that I keep, it's rather
/tmp_dir/<some number>/cells
than
/tmp_dir/sage_worksheet/<some number>/cells
It is possible to deal with the old style directly, unless you have a reason to force the user to pass the worksheet through a newer version...
Here is another related project - sws2tex, though it uses `HTMLParser` instead of `Beautiful Soup`.
Replying to @kcrisman:
> Here is another related project - sws2tex, though it uses `HTMLParser` instead of `Beautiful Soup`.
I know, someone told me about that project before. Though it may seem like duplicated effort, the approach is different: sws2tex converts html+css style into latex style, color by color, font by font, while sws2rst ignores all style. Still I might learn from the code, for sure.
Just checking, but has `tools_sws2rst_4.patch` been ported to the new notebook at http://github.com/sagemath/sagenb ?
Setting #11080 as a dependency because the old notebook is basically abandoned now.
Dependencies: #11080
Low-priority comment: Beautiful Soup is now at version 4, though that is under the MIT license. It works with Python 2.7 and 3. The 3.2.1 maintenance release is as of last February, also not too old.
Technical comment: see the latest spkg-install instructions for a slight update on the shell commands - in particular, we should have a nonempty exit check at the end, and the `SAGE_LOCAL` bit has changed slightly.
Also, you don't need to make this a `p0` package, because there are no patches! :)
Less trivially, and for this 'needs work': `sage-sage`! See e.g. the 5.0 sage-sage rump. Luckily, you can probably create nearly the same patch to the new script (which is the old script) at `$SAGE_ROOT/spkg/bin/sage`.
So might as well upgrade Beautiful Soup and fix the other stuff too. I would like to try this over the next few weeks to finally start migrating some documentation to ReST format, so I may do all this if I get a chance.
Upstream: Reported upstream. No feedback yet.
Changed reviewer from Nicolas Thiéry, ... to Nicolas Thiéry, Jason Grout, Karl-Dieter Crisman, Jason Bandlow
Thanks a lot for the careful comment on what needs to be updated. I'll take care soon.
Great; I would much rather review this functionality so that I can use it than do all of that! Looking forward to it.
Just to encourage you, I took a look and it looks like you can really practically take the identical patch, just on this new file! Since all the other files are new, and don't seem to interact with anything else, they should be fine.
I'm even wondering whether `sage-sws2rst` would need to live in Sage. This is nearly pure notebook functionality. But maybe that would be for another ticket; far more important to get this in. I may try at least doing this "by hand" next week to work on my project.
Description changed:
---
+++
@@ -34,7 +34,7 @@
Edit:
## Install instructions
-* Install the beautifulsoup spkg
+* Install the beautifulsoup spkg [here](http://sage.math.washington.edu/home/kcrisman/beautifulsoup-3.2.1.spkg)
* A first patch (add_sws2rst_4.patch) adds the sage-sws2rst script, and it must be imported on the local/bin dir
New Beautiful Soup spkg at [this location](http://sage.math.washington.edu/home/kcrisman/beautifulsoup-3.2.1.spkg). I'm going with the maintenance release for now because I don't want to have to think about licenses.
I'll try to figure out what to do with the other stuff soon; shouldn't be hard, but I always have trouble with the new notebook upstream business...
Rebased the patch for the sage script for the root directory. I put its advanced message in documentation because all the other sections were even worse, and it certainly can lead to documentation. I'm open to ideas about that, though.
Implement:
which given worksheets `bla.sws, bli.sws, ...` would create ReST files `bla.rst, bli.rst, ...` together with media directories:
The proposed implementation adds a script `local/bin/sage-sws2rst`, edits `spkg/bin/sage` to add the sws2rst option, and adds some libraries in `sagenb-main/sagenb/notebook/`. It further depends on the BeautifulSoup Python library (released under Python's license).
The script builds the ReST file from the worksheet.html file included in the .sws as follows:
Suggestions for better file layout or implementation welcome!
Install instructions
Release Manager Instructions
Depends on #14330
Upstream: Workaround found; Bug reported upstream.
CC: @nthiery @hivert @seblabbe @kcrisman @rbeezer @sagetrac-whuss @novoselt @kini @jdemeyer
Component: notebook
Keywords: ReST, worksheet
Author: Pablo Angulo, Karl-Dieter Crisman
Reviewer: Nicolas M. Thiéry, Jason Grout, Karl-Dieter Crisman, Jason Bandlow, John Palmieri, Simon King, Karl-Dieter Crisman, Pablo Angulo
Merged: sage-5.12.beta0
Issue created by migration from https://trac.sagemath.org/ticket/10637