schemedoc / cookbook

New Scheme Cookbook
https://cookbook.scheme.org

Code marks in md files are still in the published html files #26

Closed APIPLM closed 2 years ago

APIPLM commented 3 years ago

For example: https://cookbook.scheme.org/creating-subset-of-an-alist/ and https://cookbook.scheme.org/convert-any-value-to-string/

jcubic commented 2 years ago

Yes, I've noticed that. I could write a proper build script in NodeJS that supports the GFM version of Markdown, but I don't think I have the skills to modify the Scheme build scripts. Also, it uses a published Markdown library, and I'm not sure whether there is a GFM library that supports syntax highlighting. It would be too much work to create a GFM Markdown library in Scheme.

@lassik, if you want, I can write the build script in NodeJS that will support Scheme syntax highlighting, unless you know a Scheme library that supports GFM.

APIPLM commented 2 years ago

I tried two open source projects, markdown/dingus and pandao/editor.md, to preview the HTML output for the file creating-subset-of-an-alist.md. Both have the same issue as we have here. But the Preview & Export command in Markdown mode of the Emacs markdown-mode project works fine.

APIPLM commented 2 years ago

@jcubic yes. The issue is about GFM (GitHub Flavored Markdown). markdown-mode works on my machine because I have the pandoc command installed locally, and markdown-mode in Emacs uses it. The Lowdown egg we use in this project is a port of a C implementation of John Gruber's Markdown. Like markdown/dingus, which I mentioned, it does not work properly here. Maybe it only supports CommonMark, not GFM. For now we could do the conversion the way the schemedoc/surveys project does, with pandoc --from=gfm "surveys/$md" -o "www/$html". The Lowdown egg is not ready for our case because we have GFM files in the recipes folder.

pandao/editor.md seems to be a NodeJS project. The first time, rendering failed for the creating-subset-of-an-alist.md file, as I mentioned above. After that it worked, but the colors look strange for the Scheme code.

jcubic commented 2 years ago

Another idea is to keep the current Scheme code and use a regex to extract the ```scheme ``` markers from the output HTML, then apply some other modification to them. There are a lot of libraries that highlight code, and maybe there is already a library that highlights Scheme code. Unfortunately, I'm not that familiar with Scheme libraries for the implementation used in the code.

APIPLM commented 2 years ago

Yes, that is a good idea as well, but I prefer to transform the whole thing. Do not worry, it is in @lassik's hands, and it is easy for him to do. I do not know JS very well. But when I looked at the HTML file that the pandao/editor.md project generates in its preview page, the code block was not neat. That is why I guess that JS might not be a good solution. Actually, I would like to see a good solution for code block rendering in NodeJS.

APIPLM commented 2 years ago

@jcubic I am not familiar with Scheme libraries either. :) It is all trial and error. I just tried it in the Docker image scheme-containers/chicken. Even installing the eggs lowdown and ssax took me a while to figure out: on my machine I first had to install their dependencies manually. For instance, install the eggs clojurian and fancypants by running chicken-install clojurian and chicken-install fancypants. Otherwise the install stops with an error message like a time-out.

After installing all the eggs, I ran the script with csi -s www.scm. The top-level index.html and the index.html in each folder are generated. But things get more complex: none of the index.html files in the folders rendered the code blocks correctly. See the index.html in the folder format-a-unix-timestamp below.

Using SRFI-19 ```scheme (define (time-unix->time-utc seconds) (add-duration (date->time-utc (make-date 0 0 0 0 1 1 1970 0)) (make-time time-duration 0 seconds)))

(define (time-unix->string seconds . maybe-format) (apply date->string (time-utc->date (time-unix->time-utc seconds)) maybe-format)) ```

Credit Göran Weinholt ## Usage

scheme
; Loko
> (time-unix->string 946684800)
"Sat Jan 01 00:00:00Z 2000"
; Chez
> (time-unix->string 946684800)
"Sat Jan 01 02:00:00+0200 2000"
; Guile
> (time-unix->string 946684800)
$1 = "Sat Jan 01 01:00:00+0100 2000"

But this one works well on the web page, so it may be an issue we can ignore. Is it about the running environment? @lassik, do you have time to look at it?

APIPLM commented 2 years ago

I tried the markdown tool, which was ported to the lowdown egg. I converted the file convert-any-value-to-string.md by running markdown convert-any-value-to-string.md > convert-any-value-to-string.html. It has the same code block issue as when I run the script www.scm in the Docker container scheme-containers/chicken.

lassik commented 2 years ago

> I could write a proper build script in NodeJS that supports the GFM version of Markdown, but I don't think I have the skills to modify the Scheme build scripts. Also, it uses a published Markdown library, and I'm not sure whether there is a GFM library that supports syntax highlighting. It would be too much work to create a GFM Markdown library in Scheme.

I talked to the maintainers of the lowdown Chicken egg by email, and we have a fix for the Markdown ``` issue. I'll fix it.

> @lassik, if you want, I can write the build script in NodeJS that will support Scheme syntax highlighting, unless you know a Scheme library that supports GFM.

wasamasa from the Chicken community has some code to do Scheme HTML syntax highlighting in Chicken; I'll try it out.

> @jcubic I am not familiar with Scheme libraries either. :) It is all trial and error. I just tried it in the Docker image scheme-containers/chicken. Even installing the eggs lowdown and ssax took me a while to figure out: on my machine I first had to install their dependencies manually. For instance, install the eggs clojurian and fancypants by running chicken-install clojurian and chicken-install fancypants. Otherwise the install stops with an error message like a time-out.

Yes, the top of www.scm says:

;; You need Chicken 5 and
;; chicken-install lowdown r7rs srfi-1 srfi-13 srfi-132 ssax

./www.sh runs the script. All of this could be easier.

https://akkuscm.org/ is the best package manager for portable Scheme code, it supports many Scheme implementations. Unfortunately, Chicken eggs are not well integrated with it yet. Scheme is still very much a do-it-yourself (DIY) language at this point in time.

lassik commented 2 years ago

GFM + coloring gist here: https://gist.github.com/wasamasa/e49e66e050255a8973270e0a52d68818 I'll integrate this properly into the lowdown egg and our www.scm.

APIPLM commented 2 years ago

I tried https://github.com/lepture/mistune, a Python library, to convert one of the md files that the script www.scm cannot handle. lepture/mistune converts the code blocks in that file to HTML correctly.

Yes, you are right. wasamasa's approach is the right way to go for our case. In his GitHub account there is the project https://github.com/wasamasa/emacsninja.com, which generates a static blog. It fits our case and has colored code blocks. But one thing I noticed is that he organizes the content as rst files and converts those to HTML, not md files. The hyde egg, which his project uses to build the static blog, also mentions the lowdown egg or the markdown command. As we are using the latest lowdown egg or markdown command, I am not sure whether the code block rendering issue we have when converting md files to HTML has been fixed. Otherwise we need to adopt his solution and keep rst files in the recipes folder.

There are a few things left to try, like how the hyde egg converts md files to HTML and how https://github.com/wasamasa/emacsninja.com produces colored code blocks. But my laptop's hard drive crashed and the system cannot start up. I will catch up later.

APIPLM commented 2 years ago

lepture/mistune was inspired by markedjs/marked, which supports the extended GFM markup. Whether it is lepture/mistune, markedjs/marked, or the lowdown egg, they all follow the concept John Gruber described in markdown/dingus: convert a code block in the md file to HTML with the tags <pre> and <code>. Unfortunately, different implementations have different syntax. In markedjs/marked, a code block is fenced by three consecutive backtick characters (`). I call this syntax option one (see the linked details). In markdown/dingus, a code block simply starts with four spaces or a tab. Call this syntax option two (see the linked details). In our case, the md files in the recipes folder use option one, but we have chosen the lowdown egg, which is a port of jgm/peg-markdown, a parser for the option two syntax. That is why we need a workaround when we convert the md files in the recipes folder to HTML.

My solution is to edit the md files in the recipes folder to use the option two syntax: remove the three ` characters around the code block and the language flag scheme, then add four spaces or one tab to each line of the code block. (I suppose there is a way in Emacs to add a tab to all selected source lines.)
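For anyone who wants to try this conversion without editing every recipe by hand, here is a minimal Python sketch of the "option one to option two" rewrite described above. It is only an illustration: the function name fenced_to_indented is made up, it assumes fences are always exactly three backticks on their own line, and it is not part of www.scm.

```python
def fenced_to_indented(markdown: str) -> str:
    """Rewrite ``` fenced code blocks as four-space indented code blocks.

    Hypothetical helper: assumes every fence line starts with three
    backticks and that fences are properly paired.
    """
    out = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            # Opening or closing fence: drop the marker and any language flag.
            in_fence = not in_fence
            # An indented block needs a blank line between it and a paragraph.
            if out and out[-1] != "":
                out.append("")
            continue
        # Indent non-empty lines inside the fence; leave everything else alone.
        out.append("    " + line if in_fence and line else line)
    return "\n".join(out) + "\n"


sample = "Usage\n```scheme\n(display 1)\n\n(display 2)\n```\nDone\n"
print(fenced_to_indented(sample))
```

As noted later in the thread, the language flag is lost in this form, so a highlighter would have to assume that every block is Scheme.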

jcubic commented 2 years ago

@APIPLM the problem with four spaces is that you lose the information about the language. With GFM you can specify which language a block is in. Maybe that's not a problem in this case because everything is Scheme, but even then you need to specify the language somewhere. Sorry, but just <pre><code> is not very user-friendly. I would like to have Scheme syntax highlighting. It seems that with Scheme libraries you're not able to get syntax highlighting. That's why I suggested writing this in NodeJS, which has a lot more libraries that you can use.

APIPLM commented 2 years ago

@jcubic yes. For Scheme syntax highlighting, four spaces are not enough for rendering in the HTML file. The reason I suggest four spaces or a tab is that I am not yet fond of the GFM concept they proposed, even though it has been implemented in NodeJS and Python libraries. I have seen at least three versions of it. On the other hand, it is quite easy for developers to select the source code and indent it with four spaces or a tab. If an easy task can remove a complex concept, I would rather do the easy task.

In terms of Scheme syntax highlighting, I think that markedjs/marked and lepture/mistune can do it.

lassik commented 2 years ago

The Chicken Scheme egg for syntax highlighting is colorize. The code is ported from Common Lisp.

I haven't been able to reach the maintainer of lowdown, but I'll put the GFM fix into our codebase until he adds it to the egg.

APIPLM commented 2 years ago

One more point. For example, take this source code from the test case in colorize:

(define (fact n)
  (if (= n 0)
      1
      (* n (fact (- n 1)))))

The output HTML is below:

<span class="paren1">(
  <span class="default">
    <i>
      <span class=
            "symbol">define
      </span>
    </i> 
    <span class="paren2">(
      <span class=
            "default">fact n
      </span>)
    </span> 
    <span class="paren2">(
      <span class=
            "default">
        <i>
          <span class="symbol">if
          </span>
        </i> 
        <span class=
              "paren3">(
          <span class="default">= n 0
          </span>)
        </span> 1 
        <span class=
              "paren3">(
          <span class="default">* n 
            <span class=
                  "paren4">(
              <span class="default">fact 
                <span class=
                      "paren5">(
                  <span class="default">- n
                    1
                  </span>)
                </span>
              </span>)
            </span>
          </span>)
        </span>
      </span>)
    </span>
  </span>)
</span>

But in lowdown, the output HTML is below:

<p>
<pre>(define (fact n)
  (if (= n 0)
      1
      (* n (fact (- n 1)))))
</pre></p>

How can the two of them cooperate to generate the final output HTML? Or do we need an additional module?

jcubic commented 2 years ago

What if the output of a first pass using lowdown is transformed into the full output using a regex? You only need a regex like

<pre>.*?</pre>

or something similar to extract the code and replace it with the colorize output. Unfortunately, I'm not that familiar with Chicken Scheme and I'm not able to code it. I don't even know how to install a regex module.
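To make the regex idea concrete, here is a rough Python sketch rather than Chicken Scheme. It assumes the first pass emits plain <pre>...</pre> tags with no attributes, as in the lowdown output shown above. highlight_stub is a hypothetical stand-in for a real highlighter such as colorize, and recolor_pre_blocks is a made-up name. One detail that is easy to miss: "." does not match newlines by default, so the pattern needs the DOTALL flag for multi-line code blocks.

```python
import html
import re

# DOTALL lets "." match newlines, since code blocks span several lines.
PRE_BLOCK = re.compile(r"<pre>(.*?)</pre>", re.DOTALL)


def highlight_stub(code: str) -> str:
    """Hypothetical stand-in for a real highlighter such as colorize."""
    return '<span class="default">' + html.escape(code) + "</span>"


def recolor_pre_blocks(page: str, highlight=highlight_stub) -> str:
    """Replace the body of every <pre> block with highlighted markup."""

    def replace(match):
        # Undo entities such as &lt; so the highlighter sees raw source.
        code = html.unescape(match.group(1))
        return "<pre>" + highlight(code) + "</pre>"

    return PRE_BLOCK.sub(replace, page)


page = "<p>\n<pre>(if (&lt; n 1)\n    1)\n</pre></p>"
print(recolor_pre_blocks(page))
```

This only holds up while the Markdown library's output stays this regular. A <pre> tag with attributes, or a literal </pre> inside the code, would need a real HTML parser.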

lassik commented 2 years ago

Finally implemented in the above commit. Sorry about the delay. I uploaded the new pages to the server, e.g. https://cookbook.scheme.org/convert-any-value-to-string/

Parsing HTML using regexp is usually not a good idea; see e.g. https://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not

lassik commented 2 years ago

The code uses this line to convert the HTML from colorize back into SXML.

It comes from the html-parser Chicken egg.

jcubic commented 2 years ago

@lassik parsing random HTML pages with regex is not wise, but I don't see a problem in parsing the output of a Markdown library, which is pretty regular. In this case it is only a single tag.

APIPLM commented 2 years ago

@lassik Such excellent work.

lassik commented 2 years ago

Thanks, but wasamasa from Chicken did most of the hard work :)

APIPLM commented 2 years ago

That is fine, and this issue is fixed. One more thing about the colorizing of code blocks in the HTML: see the two HTML outputs below for this Scheme source code.

(define-values (displayed written)
  (let ((repr (lambda (fn)
                (lambda (object)
                  (call-with-port (open-output-string)
                                  (lambda (port)
                                    (fn object port)
                                    (get-output-string port)))))))
    (values (repr display) (repr write))))

The first one is the HTML output produced by the Python library pygments.

<pre style="line-height: 125%;"><span></span>(<span style="color: #0000FF">define-values</span> (<span style="color: #0000FF">displayed</span> <span style="color: #19177C">written</span>)
  (<span style="color: #008000; font-weight: bold">let </span>((<span style="color: #0000FF">repr</span> (<span style="color: #008000; font-weight: bold">lambda </span>(<span style="color: #0000FF">fn</span>)
                (<span style="color: #008000; font-weight: bold">lambda </span>(<span style="color: #0000FF">object</span>)
                  (<span style="color: #0000FF">call-with-port</span> (<span style="color: #0000FF">open-output-string</span>)
                                  (<span style="color: #008000; font-weight: bold">lambda </span>(<span style="color: #0000FF">port</span>)
                                    (<span style="color: #0000FF">fn</span> <span style="color: #19177C">object</span> <span style="color: #19177C">port</span>)
                                    (<span style="color: #0000FF">get-output-string</span> <span style="color: #19177C">port</span>)))))))
    (<span style="color: #008000">values </span>(<span style="color: #0000FF">repr</span> <span style="color: #19177C">display</span>) (<span style="color: #0000FF">repr</span> <span style="color: #19177C">write</span>))))
</pre>

It parses the parentheses as shown below. You can preview the HTML below with the command View Buffer Contents in html mode in Emacs.


<pre style="line-height: 125%;">( ()
  ((( (()
                (()
                  ()
                                  (()
                                    ()
                                    ()))))))
    (() ())))
</pre>

The second one is the HTML output produced by the egg colorize in Chicken.


<span class="paren1">(<span class="default"><i><span class="symbol">define-values</span></i> <span class="paren2">(<span class="default">displayed written</span>)</span>
  <span class="paren2">(<span class="default"><i><span class="symbol">let</span></i> <span class="paren3">(<span class="default"><span class="paren4">(<span class="default">repr <span class="paren5">(<span class="default"><i><span class="symbol">lambda</span></i> <span class="paren6">(<span class="default">fn</span>)</span>
                <span class="paren6">(<span class="default"><i><span class="symbol">lambda</span></i> <span class="paren1">(<span class="default">object</span>)</span>
                  <span class="paren1">(<span class="default">call-with-port <span class="paren2">(<span class="default">open-output-string</span>)</span>
                                  <span class="paren2">(<span class="default"><i><span class="symbol">lambda</span></i> <span class="paren3">(<span class="default">port</span>)</span>
                                    <span class="paren3">(<span class="default">fn object port</span>)</span>
                                    <span class="paren3">(<span class="default">get-output-string port</span>)</span></span>)</span></span>)</span></span>)</span></span>)</span></span>)</span></span>)</span>
    <span class="paren3">(<span class="default">values <span class="paren4">(<span class="default">repr display</span>)</span> <span class="paren4">(<span class="default">repr write</span>)</span></span>)</span></span>)</span></span>)</span>

It parses the parentheses as shown below. You can preview the HTML below with the command View Buffer Contents in html mode in Emacs.

<pre><span class="paren1">(<span class="default"> <span class="paren2">()</span>
  <span class="paren2">(<span class="default"> <span class="paren3">(<span class="default"><span class="paren4">(<span class="default"> <span class="paren5">(<span class="default"> <span class="paren6">(<span class="default"></span>)</span>
                <span class="paren6">(<span class="default"> <span class="paren1">()</span>
                  <span class="paren1">(<span class="default"> <span class="paren2">()</span>

                                  <span class="paren2">(<span class="default"> <span class="paren3">()</span>
                                    <span class="paren3">()</span>

                                    <span class="paren3">()</span></span>)</span></span>)</span></span>)</span></span>)</span></span>)</span></span>)</span>
    <span class="paren3">(<span class="default"> <span class="paren4">()</span> <span class="paren4">()</span></span>)</span></span>)</span></span>)</span></pre>

In terms of parsing the parentheses, it is fine. But it seems that the egg colorize stops at the content of the innermost parentheses. For instance, take the source code get-output-string port in the innermost parentheses above. The egg colorize outputs <span class="default">get-output-string port</span>. The Python library pygments outputs <span style="color: #0000FF">get-output-string</span> <span style="color: #19177C">port</span>. On this point, pygments renders more detail. If you think this is an issue in colorize, we should raise it with the Chicken community. pygments looks quite actively maintained, and the egg colorize should be compared with it.
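To show what "rendering more content" means here, below is a small Python sketch of per-token highlighting. It is not how pygments or colorize is implemented. It is a toy tokenizer with made-up class names (head for the symbol right after an opening parenthesis, arg for every other atom), and it ignores strings and comments entirely.

```python
import html
import re

# A token is a single parenthesis, a run of whitespace, or an atom.
TOKEN = re.compile(r"[()]|\s+|[^\s()]+")


def highlight_tokens(code: str) -> str:
    """Wrap every atom in its own span instead of one span per group.

    Toy sketch: the class names "head" and "arg" are hypothetical, and
    string literals and comments are not treated specially.
    """
    out = []
    after_open = False  # True when the previous non-space token was "("
    for tok in TOKEN.findall(code):
        if tok in ("(", ")"):
            out.append(tok)
            after_open = tok == "("
        elif tok.isspace():
            out.append(tok)  # whitespace is kept so the layout survives
        else:
            cls = "head" if after_open else "arg"
            out.append(f'<span class="{cls}">{html.escape(tok)}</span>')
            after_open = False
    return "".join(out)


print(highlight_tokens("(get-output-string port)"))
# (<span class="head">get-output-string</span> <span class="arg">port</span>)
```

The output has the same shape as the pygments markup quoted above: the operator and its argument each get their own span, whereas the colorize output wraps both in a single default span.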