sagemath / sage

Main repository of SageMath. Now open for Issues and Pull Requests.
https://www.sagemath.org
Other
1.09k stars 394 forks source link

Implement sage -sws2rst #10637

Closed nthiery closed 10 years ago

nthiery commented 13 years ago

Implement:

    sage -sws2rst bla.sws bli.sws ...

which given worksheets

bla.sws, bli.sws, ... would create ReST files bla.rst, bli.rst, ... together with media directories:

media/bla/
media/bla/data/
media/bla/7/sage0.png
...
media/bli/
...

The proposed implementation adds a script local/bin/sage-sws2rst, edits spkg/bin/sage to add the sws2rst option, and add some libraries in sagenb-main/sagenb/notebook/. It further depends on the BeautifulSoup Python library (released under Python's license).

The script builds the ReST file from the worksheet.html file included in the .sws as follow:

Suggestions for better file layout or implementation welcome!

Install instructions

Release Manager Instructions

Depends on #14330

Upstream: Workaround found; Bug reported upstream.

CC: @nthiery @hivert @seblabbe @kcrisman @rbeezer @sagetrac-whuss @novoselt @kini @jdemeyer

Component: notebook

Keywords: ReST, worksheet

Author: Pablo Angulo, Karl-Dieter Crisman

Reviewer: Nicolas M. Thiéry, Jason Grout, Karl-Dieter Crisman, Jason Bandlow, John Palmieri, Simon King, Karl-Dieter Crisman, Pablo Angulo

Merged: sage-5.12.beta0

Issue created by migration from https://trac.sagemath.org/ticket/10637

kcrisman commented 11 years ago

Apply to local/bin

kcrisman commented 11 years ago

Description changed:

--- 
+++ 
@@ -31,14 +31,13 @@

 Suggestions for better file layout or implementation welcome!

-Edit: 
 ## Install instructions

 * Install the beautifulsoup spkg [here](http://sage.math.washington.edu/home/kcrisman/beautifulsoup-3.2.1.spkg)
+* Apply [attachment: trac_10637-root.patch](https://github.com/sagemath/sage-prod/files/10651912/trac_10637-root.patch.gz) to the root repository
+* Apply [attachment: trac_10637-scripts.patc](https://github.com/sagemath/sage/files/ticket10637/trac_10637-scripts.patc.gz) to the scripts repository

-* A first patch (add_sws2rst_4.patch) adds the sage-sws2rst script, and it must be imported on the local/bin dir
-
-* A second patch (tools_sws2rst_4.patch) sends three files to sagenb/misc and it must be imported on the devel/sagenb dir, and followed by a
+* Apply [attachment: tools_sws2rst_4.patch](https://github.com/sagemath/sage-prod/files/10651905/tools_sws2rst_4.patch.gz) to sagenb, possibly followed by
 sage -python setup.py install && sage -python setup.py develop
kcrisman commented 11 years ago
comment:36

Attachment: trac_10637-scripts.patch.gz

kcrisman commented 11 years ago

Changed reviewer from Nicolas Thiéry, Jason Grout, Karl-Dieter Crisman, Jason Bandlow to Nicolas Thiéry, Jason Grout, Karl-Dieter Crisman, Jason Bandlow, Karl-Dieter Crisman

kcrisman commented 11 years ago
comment:37

Okay, at this point one could just review this as-is. Naturally, the sagenb stuff needs to be applied upstream, but these instructions would work. I'm not sure what the whole setup.py business is about, but I guess it's worth doing? I don't know if you'd have to be in the sagenb directory for that...

    sage -python setup.py install && sage -python setup.py develop
kcrisman commented 11 years ago

Description changed:

--- 
+++ 
@@ -35,7 +35,7 @@

 * Install the beautifulsoup spkg [here](http://sage.math.washington.edu/home/kcrisman/beautifulsoup-3.2.1.spkg)
 * Apply [attachment: trac_10637-root.patch](https://github.com/sagemath/sage-prod/files/10651912/trac_10637-root.patch.gz) to the root repository
-* Apply [attachment: trac_10637-scripts.patc](https://github.com/sagemath/sage/files/ticket10637/trac_10637-scripts.patc.gz) to the scripts repository
+* Apply [attachment: trac_10637-scripts.patch](https://github.com/sagemath/sage-prod/files/10651906/trac_10637-scripts.patch.gz) to the scripts repository

 * Apply [attachment: tools_sws2rst_4.patch](https://github.com/sagemath/sage-prod/files/10651905/tools_sws2rst_4.patch.gz) to sagenb, possibly followed by
kcrisman commented 11 years ago
comment:38

Pablo, why do these files need to be in the notebook? I'm a little confused about this. They don't seem to use anything in the notebook. Couldn't one just have them be in some other directory in the main Sage library? I do understand that they are about the notebook, so maybe it would be best to have them there, but then it's weird that the script isn't also in the notebook.

Anyway, that's a side issue, I guess.

kcrisman commented 11 years ago
comment:39

Okay, I tried this, but got

$ ../../sage -sws2rst ~/Downloads/Test4sws2rst.sws 
<snip>
  File "/Users/.../sage-5.1.beta1-flask/local/lib/python/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 20] Not a directory: '/Users/.../Test4sws2rst.sws/sage_worksheet'

which might make sense based on comment:23, except this was a brand-new worksheet. Maybe I'm doing something wrong in using it?

The specific place in sage-sws2rst that breaks is

sws_file.extractall(tempname)

and the rest the Python this calls.

Here is the problem, in pure Python

>>> tempname = os.path.join(tempfile.gettempdir(), "/Users/.../Test4sws2rst.sws")
>>> tempname
'/Users/.../Test4sws2rst.sws'
>>> tempfile.gettempdir()
'/var/folders/Yy/YytEJm5VEB0+pBRD7JNLe++++TQ/-Tmp-'

Why isn't this working right? Ah, looking at the help tells me...

If any component is an absolute path, all previous path components
    will be discarded.

So apparently one can only do an extraction from the same directory (../ gives a similar problem).

But then!


$ ./sage -sws2rst Test4sws2rst.sws 
Test4sws2rst.sws
Traceback (most recent call last):
  File "/Users/.../sage-5.1.beta1-flask/local/bin/sage-sws2rst", line 66, in <module>
    process_sws(file_name)
  File "/Users/.../sage-5.1.beta1-flask/local/bin/sage-sws2rst", line 39, in process_sws
    for cell in os.listdir(cells_path):
OSError: [Errno 2] No such file or directory: '/var/folders/Yy/YytEJm5VEB0+pBRD7JNLe++++TQ/-Tmp-/Test4sws2rst.sws/sage_worksheet/cells'

And now I nearly give up. Isn't the point exactly to create such things? I'm doing something wrong here, or maybe I rebased the patches wrong? I don't think so...

Here we go.

$ ls /var/folders/Yy/YytEJm5VEB0+pBRD7JNLe++++TQ/-Tmp-/Test4sws2rst.sws/sage_worksheet/
worksheet.html      worksheet.txt       worksheet_conf.pickle

No cells. I wonder why? worksheet.html does have stuff in it. Now I really am stumped.

kcrisman commented 11 years ago
comment:40

This needs work - the sage -advanced list now has a file conversion category, and I didn't base this on the most recent version of Sage. Patch coming.

kcrisman commented 11 years ago
comment:41

Okay, I put it there, and also removed it from the non-advanced list, since the rst2sws and rst2txt were not in the normal list.

kcrisman commented 11 years ago

Changed author from Pablo Angulo to Pablo Angulo, Karl-Dieter Crisman

kcrisman commented 11 years ago

Changed reviewer from Nicolas Thiéry, Jason Grout, Karl-Dieter Crisman, Jason Bandlow, Karl-Dieter Crisman to Nicolas Thiéry, Jason Grout, Karl-Dieter Crisman, Jason Bandlow

kcrisman commented 11 years ago
comment:42

See this pull request at sagenb for the latest upstream report.

kcrisman commented 11 years ago

Changed upstream from Reported upstream. No feedback yet. to Workaround found; Bug reported upstream.

kini commented 11 years ago
comment:43

I wonder why you are doing this conversion from scratch instead of using the docutils syntax tree. We already ship docutils - doesn't it have a way to convert HTML to rST? Why do we need BeautifulSoup?

kcrisman commented 11 years ago
comment:44

I wonder why you are doing this conversion from scratch instead of using the docutils syntax tree. We already ship docutils - doesn't it have a way to convert HTML to rST? Why do we need BeautifulSoup?

I'm sure that's a good question, but maybe that would be an improvement for later...

kcrisman commented 11 years ago
comment:45

WOW!!! This is really awesome! I did a moderately complicated worksheet just now, and had only one error message.

: WARNING: Bullet list ends without a blank line; unexpected unindent.

This was in a sublist. Here is the html source in sage_worksheet.

<p>This works because Sage is very collaborative. &nbsp;</p>
<ul>
<li>Anyone who has access to the server can be invited to share a document.</li>
<li>In this case, the teacher would post a lab, and the students could share a copy of it with each other to work on it.</li>
<li>Then, when they are done, they can:
<ul>
<li>Publish it to the server for the teacher to view, or&nbsp;</li>
<li>Share it with the teacher, or&nbsp;</li>
<li>Download it to be uploaded by the teacher...</li>
</ul>
</li>
</ul>

I don't know whether this is 'needs work' or for a future ticket, since it still looks right in the built Sphinx output.

Also, along the same priority lines, if you include jpg images in the data directory, I guess they don't always look quite right - that is, they don't scale with the size of the picture. I guess that is just because they are jpg files, not png - probably there is no real solution to this. The graphics from Sage do fine.


More importantly, I think that there we need to add some documentation as to how to use this somewhere. What do you think? I think that the Sphinx tutorial in conjunction with the following would do it.

Here's another thing. As far as I can tell, one can pretty much only run this on an .sws that is in the directory you live in. So we need to document this in the same place.


So here are the issues for 'needs work'.

kini commented 11 years ago
comment:46

I'm sure that's a good question, but maybe that would be an improvement for later...

So, we add BeautifulSoup as a dependency for sagenb in this ticket, and then deem removing it again a followup? That's a bit... odd...

  • Make sage -sws2rst fail gracefully if one hasn't installed the beautifulsoup spkg. Naturally, this requires beautifulsoup to be an optional or at least experimental spkg. Since it's just a Python thing, that shouldn't be a problem.

That's more like it :)

seblabbe commented 11 years ago
comment:47

Replying to @kini:

I wonder why you are doing this conversion from scratch instead of using the docutils syntax tree. We already ship docutils - doesn't it have a way to convert HTML to rST? Why do we need BeautifulSoup?

As far as I know, docutils reads rst only. It translates this into plenty of format (odt, .html, .tex, s5, etc.) but not the other way around. To implement sws2rst, you need some html reader like beautiful soup.

Sébastien

kcrisman commented 11 years ago
comment:48

With respect to this worksheet with no cells, it gets weirder. As soon as I downloaded it from sagenb.org and uploaded it locally, then downloaded again, it had everything and worked fine. Continuing to test.

kcrisman commented 11 years ago
comment:49

Okay, I can reproduce this.

Probably something like "delete all output" could also make this happen. I don't know why a worksheet wouldn't have cells in other cases, but at any rate this can happen under "natural" circumstances as well as my weird one.


By the way, the stuff generated is not perfect, mainly because many of us use TinyMCE header styles to do larger fonts - just easier than trying to figure out how to make larger text and center it. So semantics and actual use don't coincide. But I figure if someone wanted to turn this into a rst file then they would know that might happen.

kcrisman commented 11 years ago

Work Issues: fail nicely if no beautifulsoup, add doc in sage-sws2rst, case with no cells/ directory

kcrisman commented 11 years ago
comment:50

At #11459, slabbe points out that sage-rst2sws has pretty good docs right in the script itself that print out with sage -sws2rst -h. That would suffice for the purposes here.

kcrisman commented 11 years ago
comment:51

So I changed enough to add a second patch for clarity of what is what. I have now plenty of documentation with sage -sws2rst -h and then sage -sws2rst --sphinxify (which is mentioned in the help message.

Also, it turns out that this already does fail nicely if there is no BeautifulSoup, except it recommends to download from this Trac ticket. Wow, an entire spkg attached to a ticket! So that needs to be fixed, but that is only a tiny change. Patch coming up there too.

kcrisman commented 11 years ago

Attachment: trac_10637-sagenb-reviewer.patch.gz

Apply to sagenb

kcrisman commented 11 years ago

Attachment: trac_10637-root-docsandmore.patch.gz

Apply to root repository

kcrisman commented 11 years ago

Changed work issues from fail nicely if no beautifulsoup, add doc in sage-sws2rst, case with no cells/ directory to none

kcrisman commented 11 years ago
comment:52

To patchbot and others, instructions:

    sage -python setup.py install && sage -python setup.py develop

Okay, this should take care of all the work issues. I've made enough changes in the sage-sws2rst file that it definitely needs review, whether from pang or slabbe or someone else. Please try especially to break it with weird input; but everything else should really be okay, given that my changes to Pablo's great core work is very minimal.

Note that not only my review patches need review, but also the original sagenb patches as well. The sagenb stuff is up-to-date in the pull request 75 of sagenb.

I'm sure there are more elegant ways to do it, but the original scripts patch is fine.

I suppose someone should also review the spkg, though there is really almost nothing to review other than bringing it up to developer guide guidelines. See this sage-devel thread for a vote about whether this is allowed to be an optional spkg.

kcrisman commented 11 years ago

Description changed:

--- 
+++ 
@@ -34,10 +34,10 @@
 ## Install instructions

 * Install the beautifulsoup spkg [here](http://sage.math.washington.edu/home/kcrisman/beautifulsoup-3.2.1.spkg)
-* Apply [attachment: trac_10637-root.patch](https://github.com/sagemath/sage-prod/files/10651912/trac_10637-root.patch.gz) to the root repository
+* Apply [attachment: trac_10637-root.patch](https://github.com/sagemath/sage-prod/files/10651912/trac_10637-root.patch.gz) and [attachment: trac_10637-root-docsandmore.patch](https://github.com/sagemath/sage-prod/files/10651908/trac_10637-root-docsandmore.patch.gz) to the root repository
 * Apply [attachment: trac_10637-scripts.patch](https://github.com/sagemath/sage-prod/files/10651906/trac_10637-scripts.patch.gz) to the scripts repository

-* Apply [attachment: tools_sws2rst_4.patch](https://github.com/sagemath/sage-prod/files/10651905/tools_sws2rst_4.patch.gz) to sagenb, possibly followed by
+* Apply [attachment: tools_sws2rst_4.patch](https://github.com/sagemath/sage-prod/files/10651905/tools_sws2rst_4.patch.gz) and [attachment: trac_10637-sagenb-reviewer.patch](https://github.com/sagemath/sage-prod/files/10651907/trac_10637-sagenb-reviewer.patch.gz) to sagenb, possibly followed by
 sage -python setup.py install && sage -python setup.py develop
kcrisman commented 11 years ago
comment:53

I suppose someone should also review the spkg, though there is really almost nothing to review other than bringing it up to developer guide guidelines. See this sage-devel thread for a vote about whether this is allowed to be an optional spkg.

No response, but given that this is available via both pip and easy_install, I say it should be optional at the end of this.

Now we just need review of the review patches, and the sagenb patches.

dea4e0c2-f018-471f-92a6-9a4fde6d1a01 commented 11 years ago
comment:54

Thanks a lot for you care Karl.

I downloaded a source version of sage 5.0.1, and tried to:

"Apply trac_10637-root.patch to the root repository", with the result:

patching file spkg/bin/sage
Hunk #1 FAILED at 189
Hunk #2 succeeded at 397 (offset -23 lines).
1 out of 2 hunks FAILED -- saving rejects to file spkg/bin/sage.rej
patch failed to apply
spkg/bin/sage
patch failed, rejects left in working dir
errors during apply, please fix and refresh trac_10637-root.patch

where spkg/bin/sage.rej reads:

--- sage
+++ sage
@@ -190,6 +190,8 @@
     echo "                         reStructuredText source."
     echo "  -rst2sws [...]      -- Generates Sage worksheet (.sws) from standalone"
     echo "                         reStructuredText source."
+    echo "  -sws2rst <sws doc>  -- Generates a reStructuredText source file from"
+    echo "                         a Sage worksheet (.sws) document."

     echo
     ####  1.......................26..................................................78

What am I dong wrong?

kcrisman commented 11 years ago

Changed dependencies from #11080 to #11080, #11459

kcrisman commented 11 years ago
comment:55

I based it on a later beta because of the model from rst2sws - this depends on #11459, I guess, sorry for not making that more explicit.

dea4e0c2-f018-471f-92a6-9a4fde6d1a01 commented 11 years ago
comment:56

ok, now

applying trac_10637-root.patch
now at: trac_10637-root.patch

works, but the next patch fails

applying trac_10637-root-docsandmore.patch
unable to find 'sage-sws2rst' for patching
5 out of 5 hunks FAILED -- saving rejects to file sage-sws2rst.rej
patch failed, unable to continue (try -v)
patch failed, rejects left in working dir
errors during apply, please fix and refresh trac_10637-root-docsandmore.patch

I attach sage-sws2rst.rej...

kcrisman commented 11 years ago
comment:57

Attachment: sage-sws2rst.rej.gz

I'm stupid. That patch, despite the name, applies to the scripts (local/bin) repository. Sorry; updating instructions here and in the description.


To patchbot and others, instructions:

    sage -python setup.py install && sage -python setup.py develop
kcrisman commented 11 years ago

Description changed:

--- 
+++ 
@@ -33,9 +33,10 @@

 ## Install instructions

+* Be sure you have a recent beta, and/or that #11080 and #11459 are included.
 * Install the beautifulsoup spkg [here](http://sage.math.washington.edu/home/kcrisman/beautifulsoup-3.2.1.spkg)
-* Apply [attachment: trac_10637-root.patch](https://github.com/sagemath/sage-prod/files/10651912/trac_10637-root.patch.gz) and [attachment: trac_10637-root-docsandmore.patch](https://github.com/sagemath/sage-prod/files/10651908/trac_10637-root-docsandmore.patch.gz) to the root repository
-* Apply [attachment: trac_10637-scripts.patch](https://github.com/sagemath/sage-prod/files/10651906/trac_10637-scripts.patch.gz) to the scripts repository
+* Apply [attachment: trac_10637-root.patch](https://github.com/sagemath/sage-prod/files/10651912/trac_10637-root.patch.gz) to the root repository
+* Apply [attachment: trac_10637-scripts.patch](https://github.com/sagemath/sage-prod/files/10651906/trac_10637-scripts.patch.gz) and [attachment: trac_10637-root-docsandmore.patch](https://github.com/sagemath/sage-prod/files/10651908/trac_10637-root-docsandmore.patch.gz) to the scripts repository

 * Apply [attachment: tools_sws2rst_4.patch](https://github.com/sagemath/sage-prod/files/10651905/tools_sws2rst_4.patch.gz) and [attachment: trac_10637-sagenb-reviewer.patch](https://github.com/sagemath/sage-prod/files/10651907/trac_10637-sagenb-reviewer.patch.gz) to sagenb, possibly followed by
kcrisman commented 11 years ago
comment:58

pang commented elsewhere:

Hi, karl: It's working now, I just need to doctest the full sage library and stuff like that. Currently my hardware is limited, so I cannot do this at any time.

You may want to get an account on the sage.math cluster.

Anyway, I wouldn't worry too much about doctests outside of sagenb, since this doesn't actually change the Sage library at all! The changes are in the actual script, and so checking that it still works the same as you intended and has correct logic is the most important thing.

Once that was true, all that would remain would be for me or someone else other than you to review the actual sagenb stuff you originally included in attachment: tools_sws2rst_4.patch. I may eventually be able to do so, but such parsing is slow going for me. I wonder if one of the folks from sage-combinat might be recruited to do so? On the plus side, there is very little chance of bitrot on the sagenb end, and fixing it on the Sage end should be relatively straightforward.

kcrisman commented 11 years ago
comment:59

Okay, I've read through the code, and here is everything I have to say without actually trying it out on a variety of worksheets. Anyone have any comments on my patch to the actual script?


To me, the main issue is that the code needs to be fairly well-formed. Is worksheet.html really always that well-formed of HTML? I just don't know.

The reasons this doesn't concern me too much are

  1. The worst that could happen is that the rst doesn't look good, but nothing gets destroyed
  2. Presumably someone wouldn't bother to use this functionality in the first place without checking that the worksheet at least looked nice
  3. Presumably BeautifulSoup makes the html more well-formed

Places this could go wrong, maybe, with weird input, below. Keep in mind I'm not at all a regex wizard, so that could be part of my questions. I'd appreciate any responses to whether these could be problems; although most of them wouldn't be a big issue, I still feel like especially the first several really require (for me) explanation.

Well, that's a lot of dumb little questions. On the very positive plus side, it looks like this is at least as good, if not much better, than other html2rst things on the web - one even advertises "no support for tables, don't even try". I like the infrastructure for extending this should there be call for more to go, and the use of ints but via the names. Great work!

dea4e0c2-f018-471f-92a6-9a4fde6d1a01 commented 11 years ago
comment:60

Thanks again, Karl. I will address most comments later, after some experiment, but I'd love to hear your opinion on one of them. You said:

Replying to @kcrisman:

  • In results sections, is all html being removed? Is there a reason for this? I assume that this is to take care of a specific type of result that can occur that I can't think of right now. After all, a Sage cell is a Python cell, and html certainly could be legitimate output.

What should I do? I don't feel like parsing it into rst, because it's output, and parsing is lossy. I cannot be too literal either, because I have to leave something that works when the rst files are rendered into html, tex or whatever.

The question is related to others, like the one on tracebacks for example. I didn't like long tracebacks in the docs, which would scare my non-technical prospective readers. Should I rewrite sws2rst to incorporate several config options, like long_tracebacks or parse_html_in_results or replace_monospace_font_by_code_tags? Can you think of others?

kcrisman commented 11 years ago
comment:61
  • In results sections, is all html being removed? Is there a reason for this? I assume that this is to take care of a specific type of result that can occur that I can't think of right now. After all, a Sage cell is a Python cell, and html certainly could be legitimate output.

Here's an example - @interact output is html. In sws2tex, whuss and friends just completely ignore them - perhaps not the worst idea, given that it's supposed to be interactive and a tex (or rst) document isn't, by definition.

What should I do? I don't feel like parsing it into rst, because it's output, and parsing is lossy. I cannot be too literal either, because I have to leave something that works when the rst files are rendered into html, tex or whatever.

I suppose one could start the tree all over again? But that seems less than ideal. Would it be possible to place the html in code tags? I am wondering what some of the typical examples you ran across were. Certainly we can't (or at least don't want to, maybe rest supports this) support all the style stuff html would support.

The question is related to others, like the one on tracebacks for example. I didn't like long tracebacks in the docs, which would scare my non-technical prospective readers. Should I rewrite sws2rst to incorporate several config options, like long_tracebacks or parse_html_in_results or replace_monospace_font_by_code_tags? Can you think of others?

Maybe in another ticket, in general. The html one I was a little unsure, since html output could be legitimate. I do think that the monospace (Courier, I guess?) by code shouldn't be the default, just because that seems to be your own personal trick, but conceivably others might just want text, and might be surprised if all text that was in lots of TinyMCE fonts looks the same, except the text in that particular font. I wish we had a way to turn the different font sizes into different rest...

Anyway, it's clear that there are no major changes required here. I'm going to start testing the same set of worksheets Jason tried last year now, since I'm hoping to convert them anyway, and see what happens. Let's hope this can get in by Sage 5.3!

kcrisman commented 11 years ago

Apply to scripts

kcrisman commented 11 years ago
comment:62

Attachment: trac_10637-scripts-docsandmore.patch.gz

Made a very minor change to the sphinxify help message, so I renamed that patch so it makes more sense now.

kcrisman commented 11 years ago

Description changed:

--- 
+++ 
@@ -36,7 +36,7 @@
 * Be sure you have a recent beta, and/or that #11080 and #11459 are included.
 * Install the beautifulsoup spkg [here](http://sage.math.washington.edu/home/kcrisman/beautifulsoup-3.2.1.spkg)
 * Apply [attachment: trac_10637-root.patch](https://github.com/sagemath/sage-prod/files/10651912/trac_10637-root.patch.gz) to the root repository
-* Apply [attachment: trac_10637-scripts.patch](https://github.com/sagemath/sage-prod/files/10651906/trac_10637-scripts.patch.gz) and [attachment: trac_10637-root-docsandmore.patch](https://github.com/sagemath/sage-prod/files/10651908/trac_10637-root-docsandmore.patch.gz) to the scripts repository
+* Apply [attachment: trac_10637-scripts.patch](https://github.com/sagemath/sage-prod/files/10651906/trac_10637-scripts.patch.gz) and [attachment: trac_10637-scripts-docsandmore.patch](https://github.com/sagemath/sage-prod/files/10651910/trac_10637-scripts-docsandmore.patch.gz) to the scripts repository

 * Apply [attachment: tools_sws2rst_4.patch](https://github.com/sagemath/sage-prod/files/10651905/tools_sws2rst_4.patch.gz) and [attachment: trac_10637-sagenb-reviewer.patch](https://github.com/sagemath/sage-prod/files/10651907/trac_10637-sagenb-reviewer.patch.gz) to sagenb, possibly followed by
kcrisman commented 11 years ago
comment:63

Okay, here are some comments from a first try of a bunch of worksheets. It is a long list! So before I get to it, I want to say that in general, this is working great; everything that doesn't work is easily fixable in the rst source file by a very much non-expert in ReST like myself. If some of these things are fixable - particularly the extra newline needed in lists, and the issue with math formatting - that would be best before this makes it into Sage, but most of the rest should be in an upgrade. Great work!


<a href="#Main">below</a>

and then later

<p id="Main">The main part of ...

The ReST is

please skip  `below <#Main>`_ .

I don't know if that is easy to fix, but anyway it's not perfect. Again, given the goals of this, that could be in a second version, but I wanted to point it out. Maybe if the text is #foo it could be prepended somehow? But of course the ids are mostly gone, I guess.

Plot a green <span style="color: #008000;">$y=\sin(x)$</span> together with...

and then it made a new line there. I'm wondering why you chose to make \n instead of just a space - surely you encountered some "real-life" examples where that was the better option. I'm assuming we can't easily take the color info in, if that's the only info.

<p>For a typical mathematical function, it's pretty straightforward to define it. &nbsp;Below, we define the function $$f(x)=x^2\; .$$</p>

which became

For a typical mathematical function, it's pretty straightforward to define it.  Below, we define the function
.. MATH::

    f(x)=x^2\; .

which typeset with the .. MATH: as part of the paragraph, interpreting the double colon as making it an input cell (which it then looked like. I'm not sure what happened here, but I'm guessing it has something to do with the removal of the paragraphs with double dollar signs.

<ul>
<li>For instance, it isn't too hard to add things like $$\zeta(s)=\sum_{n=1}^{\infty}\frac{1}{n^s}=\prod_p \left(\frac{1}{1-p^{-s}}\right)\; .$$&nbsp;</li>
<li>One just types things like&nbsp;"\$\$\zeta(s)=\sum_{n=1}^{\infty}\frac{1}{n^s}=\prod_p \left(\frac{1}{1-p^{-s}}\right)\$\$" in the word processor. &nbsp;</li>
<li>Whether this shows up as nicely as possible depends on what fonts you have in your browser, but it should be legible. &nbsp;</li>
<li>More realistically, we might type "\$f(x)=x^2\$" so that we remember that $f(x)=x^2$ in this worksheet.</li>
</ul>

became all math mode, because the indentation

  - For instance, it isn't too hard to add things like
.. MATH::

    \zeta(s)=\sum_{n=1}^{\infty}\frac{1}{n^s}=\prod_p \left(\frac{1}{1-p^{-s}}\right)\; .

  - One just types things like "\$\$\zeta(s)=\sum_{n=1}^{\infty}\frac{1}{n^s}=\prod_p \left(\frac{1}{1\-p^{\-s}}\right)\$\$$

  - Whether this shows up as nicely as possible depends on what fonts you have in your browser, but it should be legible.

all was interpreted as being part of the math block! Instead, the math block should have been inside the list, I guess?

arrow $\mapsto$ as "|--&gt;").

becomes the same thing, so it doesn't turn back into |-->, the arrow! Not sure what to do about that; why didn't the greater than sign just become a greater than sign when translated from html back? Also, I thought that | was one of the characters you escaped, but maybe this one escaped being escaped?

WARNING: Inline substitution_reference start-string without end-string.

which I assume is related because of the open pipe | without a close pipe.

the  `Sage website <http://www.sagemath.org/help.html>`_ , which

Note the space before the comma, which shows up in the built doc as well.

 - Sage uses the Python programming language, which uses this syntax, 'under the hood', and

 - Because it makes it easier to distinguish among
  - The mathematical object,

yields

WARNING: Bullet list ends without a blank line; unexpected unindent.

pretty consistently. This should be fixed, and probably isn't too hard to do by adding an extra \n somewhere in the ul code. I would say that 90% of the Sphinx errors I got were this.

WARNING: Block quote ends without a blank line; unexpected unindent.

This comes from turning things like

<p>The most obvious one is simply turning $$\int f(x)dx$$ into $$\int_a^b f(x)dx\; ,$$ as indicated in the Calculus I section. &nbsp;</p>

into

The most obvious one is simply turning
.. MATH::

    \int f(x)dx

  into
.. MATH::

    \int_a^b f(x)dx\; ,

  as indicated in the Calculus I section.

Not only is there the issue with the reinterpretation of the math block as a literal block, but it interprets the extra words as part of the literal block - but only because they for some reason got extra spaces.

- item one

 - item 1a
 - item 1b

- item two

 - item 2a
 -item 2b

more correct than

 - item one
  - item 1a
  - item 1b

 - item two
  - item 2a
  -item 2b

which you currently have? If you look at rendered worksheets, you'll see what I mean - everything is just a little too far in.

kcrisman commented 11 years ago
comment:64

And as a followup, doing another group of simpler worksheets yielded only three bullet list errors and a few other things. I did have a couple MathJax issues, but I'm pretty sure that's what they are, nothing to do with the patches here.

jhpalmieri commented 11 years ago
comment:65

Maybe a patch like the following would allow passing absolute path names as arguments:

diff --git a/sage-sws2rst b/sage-sws2rst
--- a/sage-sws2rst
+++ b/sage-sws2rst
@@ -51,10 +51,10 @@ def process_sws(file_name):
     #TODO: python complains about using tempnam, but I don't
     #know hot to fix it or see any danger
 #    tempname = os.tempnam('.')
-    tempname = os.path.join(tempfile.gettempdir(), file_name)
-    sws_file.extractall(tempname)
     base_name = os.path.split(os.path.splitext(file_name)[0])[1]
     base_name_clean = base_name.replace(' ','_')
+    tempname = os.path.join(tempfile.gettempdir(), base_name_clean)
+    sws_file.extractall(tempname)

     #Images
     images_dir = base_name_clean + '_media'

This needs serious testing, but being unable to pass absolute paths is a show-stopper for me. By the way, you could also use os.path.basename instead of os.path.split to define base_name.

kcrisman commented 11 years ago

Changed reviewer from Nicolas Thiéry, Jason Grout, Karl-Dieter Crisman, Jason Bandlow to Nicolas Thiéry, Jason Grout, Karl-Dieter Crisman, Jason Bandlow, John Palmieri

kcrisman commented 11 years ago

Work Issues: answer questions, math formatting, lists, maybe absolute paths?

kcrisman commented 11 years ago
comment:66

Okay, after thinking about it some more, here is what would be absolutely required for positive review.

Humbly disagreeing with jhpalmieri about the absolute paths, since I documented it - but of course that would be great, what magic makes this work but not the other version? If this does in fact work, then I guess that would be good too.

The point is that we don't want to make the review process overly long here.

kcrisman commented 11 years ago

Description changed:

--- 
+++ 
@@ -18,7 +18,7 @@

The proposed implementation adds a script -local/bin/sage-sws2rst, edits local/bin/sage-sage to add +local/bin/sage-sws2rst, edits spkg/bin/sage to add the sws2rst option, and add some libraries in sagenb-main/sagenb/notebook/. It further depends on the BeautifulSoup Python library (released under Python's license).