jashkenas / coffeescript

Unfancy JavaScript
https://coffeescript.org/
MIT License

Literate CoffeeScript #1786

Closed revence27 closed 11 years ago

revence27 commented 12 years ago

This file, as it is, has been taken from tests/literate.literatecoffee in my fork. Without making any changes to it whatsoever, it is valid literate CoffeeScript, if you just copy and paste it into an editor, and make my fork of CoffeeScript run on it. Make the file end in .literatecoffee to make it go into literate mode. I wonder if Mr. Jeremy Ashkenas thinks it is a good idea. It would be good to know, before I send a pull request. (Oops! Earlier pull request for binary literals was pending, and it subsumed this one. Oh, well.) The change is only one line in only one file, and it is bound to be faster than fast in use. It is in my commit revence27/coffee-script@132d306e3391182dc2c410b9c46251d476e74c3f with an accompanying modification (different commit) of the Cakefile, that it may look for .literatecoffee files in the tests.

This, of course, is inspired by the Haskell programming language, but it is neater than the Haskell version of literate programming.

Beautiful Literate Programming

If this file parses at all, that is the test, and it has passed.

Originally, literate programming did not mean heavily-commented code. However, owing to the System, that is what it evolved to mean. Literate programming is supposed to be where code invades the commentary, not lots of comments invading the code.

  test "If this parses, literate coffee works.", ->
    eq 'So beautiful, I want to cry.', 'So beautiful, I want to cry.'

Under this scheme, everything is a comment. Except the bits that are indented. If a line does not start with whitespace, it is a comment. Everything else is code.

  test "I parse, therefore I work.", ->
    eq 4, 100 - 96

This is how we shall proceed with writing a string reverse in CoffeeScript.

  reverse = (str) ->
    rez = ''
    for chr in str
      rez = chr + rez
    rez

And then we will use it.

  test "Using a function defined within literate CoffeeScript.", ->
    eq 'So beautiful, I want to cry.', reverse '.yrc ot tnaw I ,lufituaeb oS'

See? Wasn't ugly, was it? Work on a syntax mode should not be difficult, in my opinion. (Although the only syntax things I know how to do are Vim, and even those, not too perfectly.) That will be all.
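The rule described in the file above (a line that does not start with whitespace is commentary; everything else is code) can be sketched roughly as follows. This is only an illustration, not the one-line change from the commit, and `stripProse` is a hypothetical name.

```coffeescript
# Hypothetical helper (not from the commit): blank out every line that
# does not start with whitespace, so only the indented code lines remain.
# Prose lines become empty lines, which keeps line numbers unchanged.
stripProse = (source) ->
  kept = for line in source.split '\n'
    if /^\s/.test(line) then line else ''
  kept.join '\n'

sample = "Some prose.\n\n  x = 1\nMore prose."
console.log stripProse sample
```

Keeping the blanked lines in place, rather than dropping them, means a compiler error in the extracted code would still point at the right line of the literate file.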

jashkenas commented 12 years ago

Yes -- I think this is very interesting. Please separate out the pull requests into two separate branches so we can look at them in isolation.

jashkenas commented 12 years ago

A couple things:

1. If this goes into the core coffee command -- it would be great to not have to deal with a separate file extension. Ideally, it would be able to tell literate CoffeeScript files apart from ordinary ones.

2. Are you planning to tie this directly to Markdown? Or do you want to have the ability to use other markup languages?

3. How will multiple files be organized together? Are you planning to generate an index page, with some sort of navigation for browsing around?

4. Is the output format of the final document just HTML? Do you want to make it possible to generate a PDF of all your source code?

5. Would it be possible to add syntax highlighting for all the CoffeeScript snippets, in all output formats?

revence27 commented 12 years ago

Hello; I am trying to fight with git reset to de-couple these two commits. In the meantime, though, to answer your questions …

This thing has been inspired almost entirely by Haskell, which happens to be my mother tongue in programming. So most things are going to be just like in Literate Haskell, enabling what it does, and really little more than that.

Regarding the first one (magic combination): it is doable, and I think the idea is great because it thinks outside the Literate Haskell box in which I am. We would just have to agree on how literate programs should start. I guess using

Literate CoffeeScript

at the top of the file should help us detect magically.

I am not planning to tie this directly to markdown. They just accidentally happen to be perfect fits.

Regarding file concatenation, the files joined together had better have the same syntax. Mixing literate with illiterate and parsing it as one file would not work. If each file is treated on its own, even if they are sent into coffee at the same time, it would work, since the magic words or the file name would be preserved.

The output of the final format is what your Docco produces. The text transformations do not usurp the literate programming that has already been used to great (beautiful!) effect in your code; they just provide another way to write literate CoffeeScript. They just make it so that it reads as beautifully in source code as it does in your generated docs.

I like to think that literate CoffeeScript is what you have not. But when it is capitalised, it is what is in the commit: Literate CoffeeScript. So Literate CoffeeScript transforms to literate CoffeeScript, which your tools can go to work on.

On syntax highlighting, since it doesn’t usurp your Docco system, it will do what Docco does.

quackingduck commented 12 years ago

Awesome work @revence27!

@jashkenas one nice thing about having a distinct extension (.lcoffee?) is that it's probably the way a lot of syntax highlighting programs decide how to lex a file. Something like pygments would need to use a different lexer for small L literate coffee files and capital L literate coffee files.

revence27 commented 12 years ago

@quackingduck We could have both; it is, after all, still one extra line of code. (Long line.) :-)

steveklabnik commented 12 years ago

This is pretty cool.

geraldalewis commented 12 years ago

I love the idea. Extensions don't strike me as the most elegant solution. A "use literate" compiler directive could work (like ES5's "use strict"). "Crossing the streams" by concatenating lit and non-lit code is a valid concern (as it is with intermixing strict and non-strict; global and non-global). However, I think solving the broader issue of script concat'ing is not within the scope of this issue.

michaelficarra commented 12 years ago

I like @geraldalewis's idea of using a "use literate" directive. It may make it more difficult to integrate with syntax highlighters, but shouldn't be impossible.

jashkenas commented 12 years ago

I was actually hoping for auto-detection ... purely automatic > file extension > magic comments > string directives, in my book.

michaelficarra commented 12 years ago

string directives > file extension > magic comments > automagic

quackingduck commented 12 years ago

@jashkenas agree that purely automatic is nice, just concerned that this code might not show up highlighted on github pages. Is there a precedent where github starts using a lexer based on a file extension and then switches to a different one based on auto-detection?

geraldalewis commented 12 years ago

hoping for auto-detection

Hmm... Generating an AST for a snippet, and seeing if it generates an error? Though I don't know how genuinely buggy code could be distinguished... Some kind of scoring system like @josh uses for language detection on GitHub?

revence27 commented 12 years ago

@jashkenas Purely-automatic cannot be done. Turing says so. (Actually, I think von Neumann is the one who says so.) By purely-automatic, what do you mean? Does my suggestion of having “Literate CoffeeScript” at the top count? After all, since it is meant to be read, that is a very good preamble.

revence27 commented 12 years ago

@geraldalewis you say “Some kind of scoring system like @josh uses for language detection on GitHub?” No. This cannot be heuristic at all. It must be boolean and without any false negatives or false positives. And let us try for the simplest thing that is clean. Scoring systems are ugly.

michaelficarra commented 12 years ago

Totally agree with @revence27 here.

jashkenas commented 12 years ago

No worries -- let's continue to use .literatecoffee for the time being -- there's a ton of more important stuff to get figured out. The index page, navigation, and HTML generation + highlighting are much more pressing.

I don't think it should use the Docco format, as you'll have a much higher ratio of prose : code.

autotelicum commented 12 years ago

What is the syntax of the proposed input format? Standard markdown?


I have been using sort-of-literate coffeescript in markdown with pandoc for some upcoming stuff. Pandoc can render to an epub or pdf ebook, to html, rtf, odt etc. Pandoc accepts both standard markdown code blocks indented with at least 4 spaces or code blocks delimited by bird tracks:

~~~~~~~~~~~~~~~~~~~~~~~{.coffeescript}
add = (a, b) -> a + b
show add 7, 8
# … 15
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I like the pandoc birdtracks because they make it possible to mix languages and to see that a document is literate without using a file extension; assuming that ~ is not valid as code. Pandoc has built-in support for coffeescript syntax highlighting for HTML. Extracting the code sections is simple in an editor with regex support (or possibly with sed/awk). I have been using it without any special support from the coffeescript compiler. In my editor (acme) it is `Edit ,x/^+[ ]{.coffeescript.}$/+,/^~~+$/-p` to extract all code sections from a document, which can then be piped into `coffee -s`. I would guess the regex parts are similar in other editors.

It would be nice (but not required for me) if this was built into the compiler. For example: if a file contains birdtracks then it is literate and everything outside of the birdtrack sections is discarded before lexing/parsing.
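As a rough sketch of what "discard everything outside the birdtrack sections" could look like, here is a small extractor. It is not part of any compiler; `extractFenced` is a hypothetical name, and it assumes fences of at least three tildes that are never nested.

```coffeescript
# Hypothetical sketch: keep only the lines inside ~~~ {.coffeescript} fences.
# Assumes an opening fence is three or more tildes followed by
# {.coffeescript}, and a closing fence is a line of tildes only.
extractFenced = (source) ->
  code = []
  inside = no
  for line in source.split '\n'
    if inside
      if /^~{3,}\s*$/.test line
        inside = no
      else
        code.push line
    else if /^~{3,}\s*\{\.coffeescript\}\s*$/.test line
      inside = yes
  code.join '\n'

doc = "Prose.\n~~~~{.coffeescript}\nadd = (a, b) -> a + b\n~~~~~~\nMore prose."
console.log extractFenced doc
```

Unlike the indentation-based scheme, this keeps the code at its original indentation, at the cost of losing the one-to-one mapping between source and output line numbers.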


As an example, here is a rendered version of Mark Hahn's A Beginner's Introduction to CoffeeKup. It isn't a literate document, but it shows what one can look like when augmented by a TeX template.

revence27 commented 12 years ago

@jashkenas Well, then, it will use a modified Docco format that takes into consideration the potential for a prose:code ratio that is steeped hard in the direction of prose. But these I believe can be solved in due time, without requiring that we do guesswork. The doc problem can be a thing of the future, since it is not hard to solve at all, even within the current Docco (perhaps with a few more lines).

I am leaning towards adding a test for “Literate CoffeeScript” at the top of the code to start the mode; I think it is excellent, considering that it is literate programming we are talking about. It is a fine preamble. Also, let us remember that it doesn’t have to be this or that. Within reason, we can have both and a boolean or in the code.

revence27 commented 12 years ago

@autotelicum It could be any syntax you are going to be using for your output. In the case of this issue, it was Markdown. But it could even be the syntax you speak of above. The best bet is to keep it simple and not over-specify, so that there is room for innovation later on.

revence27 commented 12 years ago

Literate CoffeeScript

I have added an alternative way to express that a script is Literate CoffeeScript. If the script starts with the line: “Literate CoffeeScript”, as does this comment, it is considered to be Literate CoffeeScript, regardless of the file extension. The commit is revence27/coffee-script@53078ff7d525f48d6f185e82c0ef56feeb8dfebb The meaning of this is that, as @jashkenas wants, one can write Literate CoffeeScript and, with the same file extension .coffee, still get it to be parsed as Literate CoffeeScript. The .literatecoffee extension should probably remain, because it helps other tools that do not check the content, and yet it does not seem to add very much of a cognitive burden. @jashkenas will decide here.

I did this because there is no chance that these two tokens will start a valid CoffeeScript program, yet they add a lot of beauty and rhetorical fluidity (as preamble) to the code. In fact, I think that even if that particular test is rejected from the code, the convention should be that Literate CoffeeScript start out as did this comment. (And we all know that code conventions are just missing compiler features.) It also helps tools like file(1) to make positive identifications of Literate CoffeeScript files.

So, there it is.

  console.log 'Merge me!'

I just had to put that there because I can, and you cannot stop me, and this comment will still compile. :-p
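The detection described above (the .literatecoffee extension, or a first line reading “Literate CoffeeScript”) could be sketched roughly like this. It is an illustration only, not the code from the commit; `isLiterate` is a hypothetical name, and requiring an exact match on the first line is an assumption.

```coffeescript
# Hypothetical check combining both signals with a boolean "or":
# the .literatecoffee extension, or "Literate CoffeeScript" as the
# first line of the file (exact match, which is an assumption here).
isLiterate = (filename, source) ->
  byExtension = /\.literatecoffee$/.test filename
  byPreamble  = source.split('\n')[0] is 'Literate CoffeeScript'
  byExtension or byPreamble

console.log isLiterate 'notes.coffee', "Literate CoffeeScript\n\n  console.log 'Merge me!'"
console.log isLiterate 'plain.coffee', "console.log 'hi'"
```

Because the check is a plain boolean on the file name and the first line, it involves no scoring or guessing, which matches the requirement stated earlier in the thread.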

ninjacato commented 12 years ago

Hmm, 3 months since there was any activity on this issue. Is this still up for debate or has this idea died with the issue? Would love to see this in the next version of CoffeeScript. :-)

michaelficarra commented 12 years ago

It's not dead, there's just a lot of things to get done, and this isn't at the top of the list (or at least not mine).

jashkenas commented 11 years ago

Hey folks. I've pushed an initial implementation of literate CoffeeScript to the literate branch, here:

https://github.com/jashkenas/coffee-script/compare/master...literate

... I've initially tested it by formatting the src/scope.coffee source file as markdown, and it compiles beautifully.

Before merging it, there are a number of things that need to be done, the most important of which is ... It would be lovely if we can figure out a way to not have to add an additional file extension (.litcoffee, currently), in order to get proper compiles. In a perfect world, we would be able to compile both styles without an extension flag, or a special marker present somewhere in the file -- just by being able to detect either Markdown or CoffeeScript.

Any ideas?

vendethiel commented 11 years ago

Try first to parse it through coffee and fall back to litcoffee? Add an annotation? (#LITERATE?) I don't think there's really a "good" way, except trying to detect markdown in unindented lines.

osuushi commented 11 years ago

What about a compromise solution, with a required marker that is also functional? For example, the literate.litcoffee example has a header at the top using the ------- markdown syntax. Requiring such a header (maybe it should be a ====== H1 header) seems like a reasonable convention for literate files, and it makes detection easy. More importantly, it would make accidental misinterpretation nearly impossible.

Given that a large number of lines of CoffeeScript are also valid lines of Markdown, it seems unlikely that unaided autodetection is going to work without significant and confusing edge cases. It would be a bad situation if, for example, a typo like a missing -> caused the compiler to suddenly think a non-literate file was supposed to be literate. In the worst case, such an error might even cause the file to compile without a complaint, leaving the programmer to dig through and figure out what caused the misinterpretation.

jashkenas commented 11 years ago

... you'd think it wouldn't be too hard to detect CoffeeScript, and fall back to markdown ... but unfortunately code like this:

    This is valid coffeescript code. Does it compile?

Compiles: http://coffeescript.org/#try:%20%20%20%20This%20is%20valid%20coffeescript%20code.%20Does%20it%20compile%3F

vendethiel commented 11 years ago

I don't think so http://is.gd/NuBxgb I agree with a marker tho

jashkenas commented 11 years ago

Whoa -- nice!

andrewrk commented 11 years ago

Can someone explain why this is a good idea? I don't understand how this is useful?

osuushi commented 11 years ago

@superjoe30 literate CoffeeScript in general, or reusing the same extension and autodetecting?

andrewrk commented 11 years ago

Literate CoffeeScript in general.

osuushi commented 11 years ago

@superjoe30 The Wikipedia article on literate programming is decent reading for understanding the basic purpose, but the main point is better code documentation.

bergie commented 11 years ago

Really glad to see this happening. If it is of any help, here is how I implemented literate programming for PHP: http://bergie.iki.fi/blog/literate_programming_with_php/

epidemian commented 11 years ago

Why would it be undesirable to add a new file extension for Literate CS files? Isn't that the standard mechanism that text editors and syntax highlighting tools (like github/linguist) use to detect the language? It's my understanding that Haskell also uses this strategy to differentiate normal .hs files from literate .lhs files.

If a separate extension is indeed undesirable, the proposal of adding a Literate CoffeeScript preamble for Literate CS seems fine to me (kinda like adding a hashbang to a script file :)

bergie commented 11 years ago

From the doctest.js discussion on HN:

My main concern with it is that it forces you to write the document in the same order you want the code to be extracted, which may not be the best order for explaining things. That is why classic Literate Programming tools like noweb allow you to name the chunks of code and then arrange them into the generated files as you wish. You can see an example of this in action when I'm assembling noweb.php.

One way to work with named chunks of code in Literate CoffeeScript would be to use the fenced code blocks syntax from Github-flavored Markdown. This way we could do stuff like:

```coffeescript;somechunk
# Contents of the chunk here
```

We would only need to figure out an appropriate chunk inclusion syntax. Noweb uses `<<chunkname>>`, so you could call in that previous named chunk into an arbitrary location of your document with:

```
# Some code, then the chunk:
<<somechunk>>
# Code continues
```

jashkenas commented 11 years ago

... and replied on HN: http://news.ycombinator.com/item?id=4608260

bergie commented 11 years ago

@jashkenas well argued. I guess the way proposed in this issue makes Literate Programming a lot more approachable.

The next question with this approach is editor support, as then you'd have to understand both the Markdown syntax and CoffeeScript, and the relations between the two. Emacs probably makes this easy with major and minor modes, but with other editors this can be trickier.

jashkenas commented 11 years ago

Yes -- I think that's actually not so hard. I'm going to attempt it for textmate / sublime. Those editors are already able to combine parse modes ... for example, JavaScript is correctly highlighted within an HTML document. Here it's easier -- any block that's indented more than four whitespace characters should be passed to the CoffeeScript highlighter.

bergie commented 11 years ago

For a different take on editors, I've been toying with the idea of combining collection support in Create.js, Hallo's Markdown mode and something like CodeMirror to make a web-based WYSIWYG editor for literate programming.

In the longer term you could then utilize features like image insertion, or even connect it with a web-based graph editor so you could produce really nice documents while you code.

The less separation there is between your documentation and the code, the more likely they'll both remain up-to-date.

niclashoyer commented 11 years ago

is there any support in gedit / gtksourceview for this? That would be great!

niclashoyer commented 11 years ago

I quickly wrote my own Literate Coffeescript syntax definition for gedit, see my gist. It is far from perfect, but at least Markdown / CoffeeScript highlighting works.

You need gedit-markdown (included by default) and gedit-coffeescript.

Place the file in ~/.local/share/gtksourceview-3.0/language-specs

supersym commented 11 years ago

Cool. I actually had the same idea earlier and wanted to actually start work on it but now I don't have to. Thanks!

semperos commented 11 years ago

This looks like a great addition. However, after trying a few syntaxes, it appears that there isn't support for "fenced code blocks." Is this the case?

In an already white-space sensitive language like CoffeeScript, adding an extra layer of indentation for Markdown code blocks is problematic. Fenced code blocks, for me, would be a must before using this.

jashkenas commented 11 years ago

You're correct. Fenced code blocks aren't a syntax of ordinary Markdown that I'm aware of.

http://daringfireball.net/projects/markdown/syntax#precode

Aren't they a special GitHub thing?

semperos commented 11 years ago

They are not part of standard Markdown, that's correct, but they are a common addition by many Markdown libraries, including Showdown.js, Github-flavored markdown via Sundown, Ruby's Kramdown, etc.

For the purposes of literate CoffeeScript, adding more indentation makes things tougher to read and edit, especially if one is using something like RequireJS wherein most of the file is already indented to live inside the callback to define.

satyr commented 11 years ago

Right now the literate syntax is bare minimum--far from full-blown Markdown.

-   Lists
    like
    this
-   fail, for example.

supersym commented 11 years ago

First of all, to see a little experimentation of mine with Literate CoffeeScript and the result rendered by GitHub using GitHub Flavored Markdown (GFM), go here. The raw source of this file can be found here; it is a .litcoffee file that I did a simple cp a.litcoffee b.md on. The text inside that last file can be copied and pasted from the operating system clipboard into an editor like this live preview editor here, where you can easily see how it renders to HTML. There are browser extensions that do these things too, and there are some downloadable viewers/readers/editors (mostly Mac) that have some GUI interaction.

I too have been toying with some ideas I had on these, and related fields, around literate programming and UI design, as a response to bergie. I sent you an email actually to see if we can do something with it. Note that there are, although few, several tools that can take very different approaches. Someone interested in LP should also check out [Codnar](https://github.com/orenbenkiki/codnar). There are

supersym commented 11 years ago

I'm not sure if this was considered, but it is possible to append a file name to the fenced code block triple backticks, ```coffeescript (filename), so that you can easily parse the blocks sequentially and concatenate them into separate files.

I could imagine that explaining why you have the file/directory structure could partially be together in a 'story' and then, moving back to the main idea, you keep writing chunks of files as a coherent red line throughout, and branch off as the flow of thoughts comes along.

Donald Knuth and others usually also have an outline that goes from 'require' to the 'execution' etc., which is still a bit too much towards computers and not humans, but does make a lot more sense when they get explained.

```coffeescript
# file1
    # (1A) code goes here
```

Story line continues lorem ipsum...

```coffeescript
# path/to/file2

     # code goes here

```

Story line continues

```coffeescript
# file1

    # (1B) execute some code goes concat under (A)

```

And indeed, with functions and requirements being scanned before execution, we have all we need to write an entire book in 1 file. Just don't forget the module.exports.

Updated: an example that does get parsed properly, having the file name on the first line after coffeescript. It could even be easily done with an additional tab for aesthetics.
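A minimal sketch of how such blocks might be gathered per file, assuming each block opens with a line of three backticks followed by coffeescript, and that the first line inside the block is a comment holding the target path. `collectFiles` and `FENCE` are hypothetical names, not an existing tool.

```coffeescript
FENCE = '```'

# Hypothetical sketch: collect fenced coffeescript blocks whose first line
# is a "# path" comment, concatenating bodies per path in document order.
collectFiles = (source) ->
  files = {}
  path = null          # null while outside a fence
  awaitingName = no    # true on the line right after an opening fence
  for line in source.split '\n'
    if awaitingName
      path = line.replace(/^#\s*/, '').trim()
      files[path] ?= []
      awaitingName = no
    else if path?
      if line.trim() is FENCE
        path = null
      else
        files[path].push line
    else if line.trim() is FENCE + 'coffeescript'
      awaitingName = yes
  result = {}
  for name, lines of files
    result[name] = lines.join '\n'
  result
```

Blocks that name the same path are appended in the order they appear, so a later chunk for file1 ends up under the earlier one, as in the (1A)/(1B) example.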

bergie commented 11 years ago

@supersym I suggested something similar in this Hacker News thread: http://news.ycombinator.com/item?id=4608165

Here is the response from @jashkenas back then:

this is probably the main complaint raised when talking about tools like Docco or Markdown-as-source-code as "literate programming". But, it's a concern that I believe is entirely outdated. Modern dynamic languages make it easy to sequence your code as you like. The methods in a class may be listed in any logical order, helper functions can be listed in an appendix after the functions that make use of them, and so on. I find it hard to imagine an example where changing the order of the codebase would make the prose version more readable -- and wouldn't also make the code version more readable as well.

supersym commented 11 years ago

Right. And he's probably right :) But we're still talking 1 file? I didn't see that pass in the discussion of 1 litcoffee or 10 or 20 .litcoffee files. One file makes sense from a literary standpoint, and for easy handling / maintenance (debatable; plenty of authors make a separate file for each chapter because 200 pages ain't fun), but e.g. what they did with literate clojure and such is packing it all in 1 file. And after that, the classes get extracted etc. I should really look deeper into what function he used for that and apply it to coffeescript so I can extract files if I want to, or not.

Using the above suggestion I made, and the regex ^(`{3}coffeescript( \r|\r)()(.*)), I have enough to do this. Updated because a file name on the same line doesn't get parsed.

http://www.gridlinked.info/oop-with-coffeescript-javascript/ -- some kind of namespaces is something I should investigate a bit better too.