Closed revence27 closed 11 years ago
Yes -- I think this is very interesting. Please separate out the pull requests into two separate branches so we can look at them in isolation.
A couple things:
If this goes into the core `coffee` command -- it would be great to not have to deal with a separate file extension. Ideally, it would be able to tell literate CoffeeScript files apart from ordinary ones.
Are you planning to tie this directly to Markdown? Or do you want to have the ability to use other markup languages?
How will multiple files be organized together? Are you planning to generate an index page, with some sort of navigation for browsing around?
Is the output format of the final document just HTML? Do you want to make it possible to generate a PDF of all your source code?
Would it be possible to add syntax highlighting for all the CoffeeScript snippets, in all output formats?
Hello;
I am trying to fight with `git reset` to de-couple these two commits. In the meantime, though, to answer your questions …
This thing has been inspired almost entirely by Haskell, which happens to be my mother tongue in programming. So most things are going to be just like in Literate Haskell, enabling what it does, and really little more than that.
Regarding the first one (magic combination): it is doable, and I think the idea is great because it thinks outside the Literate Haskell box in which I am. We would just have to agree on how literate programs should start. I guess using `Literate CoffeeScript` at the top of the file should help us detect magically.
I am not planning to tie this directly to markdown. They just accidentally happen to be perfect fits.
Regarding file concatenation, the files joined together had better have the same syntax. Mixing literate with illiterate and parsing it as one file would not work. If each file is treated on its own, even if sent into `coffee` at the same time, it would work, since the magic words or the file name would be preserved.
The output of the final format is what your Docco produces. The text transformations do not usurp the literate programming that has already been used to great (beautiful!) effect in your code; they just provide another way to write literate CoffeeScript. They just make it so that it reads as beautifully in source code as it does in your generated docs.
I like to think that literate CoffeeScript is what you have not. But when it is capitalised, it is what is in the commit: Literate CoffeeScript. So Literate CoffeeScript transforms to literate CoffeeScript, which your tools can go to work on.
On syntax highlighting, since it doesn’t usurp your Docco system, it will do what Docco does.
Awesome work @revence27!
@jashkenas one nice thing about having a distinct extension (`.lcoffee`?) is that it's probably the way a lot of syntax highlighting programs decide how to lex a file. Something like pygments would need to use a different lexer for small L literate coffee files and capital L literate coffee files.
@quackingduck We could have both; it is, after all, still one extra line of code. (Long line.) :-)
This is pretty cool.
I love the idea.
Extensions don't strike me as the most elegant solution.
A `"use literate"` compiler directive could work (like ES5's `"use strict"`).
"Crossing the streams" by concatenating `lit` and `non-lit` code is a valid concern (as it is with intermixing `strict` and `non-strict`; global and non-global). However, I think solving the broader issue of script concat'ing is not within the scope of this issue.
I like @geraldalewis's idea of using a `"use literate"` directive. It may make it more difficult to integrate with syntax highlighters, but shouldn't be impossible.
I was actually hoping for auto-detection ... purely automatic > file extension > magic comments > string directives, in my book.
string directives > file extension > magic comments > automagic
@jashkenas agree that purely automatic is nice, just concerned that this code might not show up highlighted on github pages. Is there a precedent where github starts using a lexer based on a file extension and then switches to a different one based on auto-detection?
hoping for auto-detection
Hmm... Generating an AST for a snippet, and seeing if it generates an error? Though I don't know how genuinely buggy code could be distinguished... Some kind of scoring system like @josh uses for language detection on GitHub?
@jashkenas Purely-automatic cannot be done. Turing says so. (Actually, I think von Neumann is the one who says so.) By purely-automatic, what do you mean? Does my suggestion of having “Literate CoffeeScript” at the top count? After all, since it is meant to be read, that is a very good preamble.
@geraldalewis you say “Some kind of scoring system like @josh uses for language detection on GitHub?” No. This cannot be heuristic at all. It must be boolean and without any false negatives or false positives. And let us try for the simplest thing that is clean. Scoring systems are ugly.
Totally agree with @revence27 here.
No worries -- let's continue to use `.literatecoffee` for the time being -- there's a ton of more important stuff to get figured out. The index page, navigation, and HTML generation + highlighting are much more pressing.
I don't think it should use the Docco format, as you'll have a much higher ratio of prose : code.
What is the syntax of the proposed input format? Standard markdown?
I have been using sort-of-literate coffeescript in markdown with pandoc for some upcoming stuff. Pandoc can render to an epub or pdf ebook, to html, rtf, odt etc. Pandoc accepts both standard markdown code blocks indented with at least 4 spaces or code blocks delimited by bird tracks:
~~~~~~~~~~~~~~~~~~~~~~~{.coffeescript}
add = (a, b) -> a + b
show add 7, 8
# … 15
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I like the pandoc birdtracks because they make it possible to mix languages and to see that a document is literate without using a file extension; assuming that ~ is not valid as code. Pandoc has built-in support for coffeescript syntax highlighting for HTML. Extracting the code sections is simple in an editor with regex support (or possibly with sed/awk). I have been using it without any special support from the coffeescript compiler. In my editor (acme) it is `Edit ,x/^+[ ]{.coffeescript.}$/+,/^~~+$/-p` to extract all code sections from a document, which can then be piped into `coffee -s`. I would guess the regex parts are similar in other editors.
It would be nice (but not required for me) if this was built into the compiler. For example: if a file contains birdtracks then it is literate and everything outside of the birdtrack sections is discarded before lexing/parsing.
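The extraction step described above can be sketched outside any particular editor. This is a minimal illustration in Python, not the acme command or anything the compiler does: the function name `extract_coffeescript` is hypothetical, and it assumes every opening fence is a run of at least three tildes followed by a `{.coffeescript}` attribute block and every closing fence is a bare run of tildes.

```python
import re

# Assumed fence shapes (a simplification of pandoc's rules): three or more
# tildes plus a {.coffeescript ...} attribute block open a section, and a
# bare run of three or more tildes closes it.
OPEN_FENCE = re.compile(r"^~{3,}\s*\{\.coffeescript[^}]*\}\s*$")
CLOSE_FENCE = re.compile(r"^~{3,}\s*$")


def extract_coffeescript(document: str) -> str:
    """Return only the lines found inside {.coffeescript} tilde fences."""
    code_lines = []
    inside = False
    for line in document.splitlines():
        if not inside and OPEN_FENCE.match(line):
            inside = True
        elif inside and CLOSE_FENCE.match(line):
            inside = False
            code_lines.append("")  # blank line between extracted sections
        elif inside:
            code_lines.append(line)
        # Anything else is prose and is discarded.
    code = "\n".join(code_lines).strip("\n")
    return code + "\n" if code else ""
```

Everything outside the fences is dropped, so the returned text could be handed to `coffee -s` in the same way as the editor output.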
As an example, here is a rendered version of Mark Hahn's A Beginner's Introduction to CoffeeKup. It isn't a literate document but shows what it can look like augmented by a TeX template.
@jashkenas Well, then, it will use a modified Docco format that takes into consideration the potential for a prose:code ratio that is steeped hard in the direction of prose. But these I believe can be solved in due time, without requiring that we do guesswork. The doc problem can be a thing of the future, since it is not hard to solve at all, even within the current Docco (perhaps with a few more lines).
I am leaning towards adding a test for “Literate CoffeeScript” at the top of the code to start the mode; I think it is excellent, considering that it is literate programming we are talking about. It is a fine preamble. Also, let us remember that it doesn’t have to be this or that. Within reason, we can have both and a boolean or in the code.
@autotelicum It could be any syntax you are going to be using for your output. In the case of this issue, it was Markdown. But it could even be the syntax you speak of above. The best bet is to keep it simple and not over-specify, so that there is room for innovation later on.
Literate CoffeeScript
I have added an alternative way to express that a script is Literate CoffeeScript. If the script starts with the line: “Literate CoffeeScript”, as does this comment, it is considered to be Literate CoffeeScript, regardless of the file extension.
The commit is revence27/coffee-script@53078ff7d525f48d6f185e82c0ef56feeb8dfebb
The meaning of this is that, as @jashkenas wants, one can write Literate CoffeeScript and, with the same file extension `.coffee`, still get it to be parsed as Literate CoffeeScript. The `.literatecoffee` extension should probably remain, because it helps other tools that do not check the content, and yet it does not seem to add very much of a cognitive burden.
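The "either marker" rule described here comes down to a boolean or. The following Python sketch only illustrates that rule; it is not the code in the linked commit, and the function name `is_literate` and the exact comparison (first line equal to the preamble after trimming trailing whitespace) are assumptions.

```python
import os

# Both markers come from the discussion; how the actual patch checks them
# is not shown in the thread, so treat this as an illustration only.
LITERATE_EXTENSION = ".literatecoffee"
LITERATE_PREAMBLE = "Literate CoffeeScript"


def is_literate(filename: str, source: str) -> bool:
    """True if the extension or the first line marks the file as literate."""
    has_extension = os.path.splitext(filename)[1] == LITERATE_EXTENSION
    first_line = source.split("\n", 1)[0].rstrip()
    return has_extension or first_line == LITERATE_PREAMBLE
```

Because the check is a plain equality test on one line plus an extension comparison, it stays boolean rather than heuristic, which is the property asked for earlier in the thread.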
@jashkenas will decide here.
I did this because there is no chance that these two tokens will start a valid CoffeeScript program, yet they add a lot of beauty and rhetorical fluidity (as preamble) to the code. In fact, I think that even if that particular test is rejected from the code, the convention should be that Literate CoffeeScript start out as did this comment. (And we all know that code conventions are just missing compiler features.) It also helps tools like file(1) to make positive identifications of Literate CoffeeScript files.
So, there it is.
console.log 'Merge me!'
I just had to put that there because I can, and you cannot stop me, and this comment will still compile. :-p
Hmm, 3 months since there was any activity on this issue. Is this still up for debate or has this idea died with the issue? Would love to see this in the next version of CoffeeScript. :-)
It's not dead, there's just a lot of things to get done, and this isn't at the top of the list (or at least not mine).
Hey folks. I've pushed an initial implementation of literate CoffeeScript to the literate branch, here:
https://github.com/jashkenas/coffee-script/compare/master...literate
... I've initially tested it by formatting the `src/scope.coffee` source file as markdown, and it compiles beautifully.
Before merging it, there are a number of things that need to be done, the most important of which is ... It would be lovely if we can figure out a way to not have to add an additional file extension (`.litcoffee`, currently), in order to get proper compiles. In a perfect world, we would be able to compile both styles without an extension flag, or a special marker present somewhere in the file -- just by being able to detect either Markdown or CoffeeScript.
Any ideas?
Try first to parse it through coffee and fall back to litcoffee? Add an annotation? (#LITERATE?) I don't think there's really a "good" way, except trying to detect markdown in unindented lines.
What about a compromise solution, with a required marker that is also functional? For example, the literate.litcoffee example has a header at the top using the `-------` markdown syntax. Requiring such a header (maybe it should be a `======` H1 header) seems like a reasonable convention for literate files, and it makes detection easy. More importantly, it would make accidental misinterpretation nearly impossible.
Given that a large number of lines of CoffeeScript are also valid lines of Markdown, it seems unlikely that unaided autodetection is going to work without significant and confusing edge cases. It would be a bad situation if, for example, a typo like a missing `->` caused the compiler to suddenly think a non-literate file was supposed to be literate. In the worst case, such an error might even cause the file to compile without a complaint, leaving the programmer to dig through and figure out what caused the misinterpretation.
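The required-header convention suggested in this comment is cheap to check. Here is a rough Python sketch, assuming the marker is a Markdown setext header (a title line underlined with `=` or `-` characters) on the first non-blank lines of the file; the function name `starts_with_setext_header` is made up for illustration, and nothing here reflects what the compiler does.

```python
import re

# A setext underline: a line made only of '=' (H1) or '-' (H2) characters.
UNDERLINE = re.compile(r"^(=+|-+)\s*$")


def starts_with_setext_header(source: str) -> bool:
    """True if the first non-blank line is a title followed by an underline."""
    lines = source.splitlines()
    index = 0
    # Skip leading blank lines.
    while index < len(lines) and not lines[index].strip():
        index += 1
    if index + 1 >= len(lines):
        return False  # no room for a title plus its underline
    title, underline = lines[index], lines[index + 1]
    # The title must be unindented text and must not itself be an underline.
    if title[0].isspace() or UNDERLINE.match(title):
        return False
    return bool(UNDERLINE.match(underline))
```

A file that fails this check would simply be compiled as ordinary CoffeeScript, so a typo elsewhere in the file could not flip it into literate mode.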
... you'd think it wouldn't be too hard to detect CoffeeScript, and fall back to markdown ... but unfortunately code like this:
This is valid coffeescript code. Does it compile?
I don't think so http://is.gd/NuBxgb I agree with a marker tho
Whoa -- nice!
Can someone explain why this is a good idea? I don't understand how this is useful?
@superjoe30 literate CoffeeScript in general, or reusing the same extension and autodetecting?
Literate CoffeeScript in general.
@superjoe30 The Wikipedia article on literate programming is decent reading for understanding the basic purpose, but the main point is better code documentation.
Really glad to see this happening. If it is of any help, here is how I implemented literate programming for PHP: http://bergie.iki.fi/blog/literate_programming_with_php/
Why would it be undesirable to add a new file extension for Literate CS files? Isn't that the standard mechanism that text editors and syntax highlighting tools (like github/linguist) use to detect the language? It's my understanding that Haskell also uses this strategy to differentiate normal `.hs` files from literate `.lhs` files.
If a separate extension is indeed undesirable, the proposal of adding a `Literate CoffeeScript` preamble for Literate CS seems fine to me (kinda like adding a hashbang to a script file :)
From the doctest.js discussion on HN:
My main concern with it is that it forces you to write the document in the same order you want the code to be extracted, which may not be the best order for explaining things. That is why classic Literate Programming tools like noweb allow you to name the chunks of code and then arrange them into the generated files as you wish. You can see an example of this in action when I'm assembling noweb.php.
One way to work with named chunks of code in Literate CoffeeScript would be to use the fenced code blocks syntax from Github-flavored Markdown. This way we could do stuff like:
```coffeescript;somechunk
# Contents of the chunk here
```
We would only need to figure out an appropriate chunk inclusion syntax. Noweb uses `<<chunkname>>`, so you could call in that previous named chunk into an arbitrary location of your document with:
# Some code, then the chunk:
<<somechunk>>
# Code continues
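To make the named-chunk idea concrete, here is a small Python sketch of noweb-style expansion. It assumes the `coffeescript;name` info string proposed above and a `<<name>>` reference that sits alone on its line; `collect_chunks` and `expand` are hypothetical helpers, not part of CoffeeScript, noweb, or any Markdown library.

```python
import re

# Assumed syntax: an opening fence carries "coffeescript;<name>", and a
# reference is "<<name>>" alone on a line, optionally indented.
FENCE_OPEN = re.compile(r"^```coffeescript;(\w+)\s*$")
FENCE_CLOSE = re.compile(r"^```\s*$")
REFERENCE = re.compile(r"^(\s*)<<(\w+)>>\s*$")


def collect_chunks(document: str) -> dict:
    """Map each chunk name to the list of code lines inside its fence."""
    chunks = {}
    current = None  # name of the chunk being read, or None when in prose
    for line in document.splitlines():
        opening = FENCE_OPEN.match(line)
        if current is None and opening:
            current = opening.group(1)
            chunks.setdefault(current, [])
        elif current is not None and FENCE_CLOSE.match(line):
            current = None
        elif current is not None:
            chunks[current].append(line)
    return chunks


def expand(name: str, chunks: dict, seen: tuple = ()) -> list:
    """Return a chunk's lines with every <<reference>> replaced recursively."""
    if name in seen:
        raise ValueError(f"circular chunk reference: {name}")
    expanded = []
    for line in chunks[name]:
        reference = REFERENCE.match(line)
        if reference:
            indent, target = reference.groups()
            # Re-indent the included chunk to match the reference line.
            inner = expand(target, chunks, seen + (name,))
            expanded.extend(indent + inner_line for inner_line in inner)
        else:
            expanded.append(line)
    return expanded
```

Re-indenting the included lines matters in a whitespace-sensitive language: a chunk pulled into a function body has to land at that body's indentation level. A reference to an undefined chunk raises `KeyError` in this sketch.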
... and replied on HN: http://news.ycombinator.com/item?id=4608260
@jashkenas well argued. I guess the way proposed in this issue makes Literate Programming a lot more approachable.
The next question with this approach is editor support, as then you'd have to understand both the Markdown syntax and CoffeeScript, and the relations between the two. Emacs probably makes this easy with major and minor modes, but with other editors this can be trickier.
Yes -- I think that's actually not so hard. I'm going to attempt it for textmate / sublime. Those editors are already able to combine parse modes ... for example, JavaScript is correctly highlighted within an HTML document. Here it's easier -- any block that's indented more than four whitespace characters should be passed to the CoffeeScript highlighter.
For a different take on editors, I've been toying with the idea of combining collection support in Create.js, Hallo's Markdown mode and something like CodeMirror to make a web-based WYSIWYG editor for literate programming.
In the longer term you could then utilize features like image insertion, or even connect it with a web-based graph editor so you could produce really nice documents while you code.
The less separation there is between your documentation and the code, the more likely they'll both remain up-to-date.
Is there any support in gedit / gtksourceview for this? That would be great!
I quickly wrote my own Literate Coffeescript syntax definition for gedit, see my gist. It is far from perfect, but at least Markdown / CoffeeScript highlighting works.
You need gedit-markdown (included by default) and gedit-coffeescript.
Place the file in ~/.local/share/gtksourceview-3.0/language-specs
Cool. I actually had the same idea earlier and wanted to actually start work on it but now I don't have to. Thanks!
This looks like a great addition. However, after trying a few syntaxes, it appears that there isn't support for "fenced code blocks." Is this the case?
In an already white-space sensitive language like CoffeeScript, adding an extra layer of indentation for Markdown code blocks is problematic. Fenced code blocks, for me, would be a must before using this.
You're correct. Fenced code blocks aren't a syntax of ordinary Markdown that I'm aware of.
http://daringfireball.net/projects/markdown/syntax#precode
Aren't they a special GitHub thing?
They are not part of standard Markdown, that's correct, but they are a common addition by many Markdown libraries, including Showdown.js, Github-flavored markdown via Sundown, Ruby's Kramdown, etc.
For the purposes of literate CoffeeScript, adding more indentation makes things tougher to read and edit, especially if one is using something like RequireJS wherein most of the file is already indented to live inside the callback to `define`.
Right now the literate syntax is bare minimum--far from full-blown Markdown.
- Lists
like
this
- fail, for example.
First of all, to see a little experimentation of mine with Literate CoffeeScript and the result rendered by GitHub using GitHub Flavored Markdown (GFM), go here. The raw source of this file can be found here; it is a file with the `.litcoffee` extension that I ran a simple `cp a.litcoffee b.md` on. The text inside that last link can be copied and pasted from the operating system clipboard into an editor like this live preview editor here. You can easily see how it renders to HTML. There are browser extensions that do these things too, as well as some downloadable viewers/readers/editors (mostly Mac) that have some GUI interaction.
I too have been toying with some ideas I had on these, and related fields, around literate programming and UI design, as a response to bergie. I sent you an email actually to see if we can do something with it. Note that there are, although few, several tools that can take very different approaches. Someone interested in LP should also check out [Codnar](https://github.com/orenbenkiki/codnar). There are
I'm not sure if this was considered, but it is possible to append a file name to the fenced code block's triple backticks, as in ```coffeescript (filename), so that you can easily parse the blocks sequentially and concatenate them into separate files.
I could imagine that the explanation of why you have a given file/directory structure could partially live together in a 'story'. Then, moving back to the main idea, you keep writing chunks of files as a coherent thread throughout, and branch off as the flow of thoughts comes along.
Donald Knuth and others usually also have an outline that goes from 'require' to the 'execution' etc., which is still a bit too much geared towards computers and not humans, but does make a lot more sense once it gets explained.
``coffeescript
# file1
# (1A) code goes here
``
Story line continues lorem ipsum...
``coffeescript
# path/to/file2
# code goes here
``
Story line continues
``coffeescript
# file1
# (1B) execute some code goes concat under (A)
``
And indeed, with functions and requirements being scanned before execution, we have all we need to write an entire book in 1 file. Just don't forget the `module.exports`.
Updated: example that does get parsed properly, having the file name on the first carriage return after coffeescript. It could even be easily done with additional tab for aesthetics
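The file-per-block idea in this comment can be sketched as a small splitter. This Python illustration assumes, following the example above, that each fenced `coffeescript` block opens with a comment line naming its target file and that blocks sharing a name are concatenated in document order; `split_into_files` is a hypothetical helper, not an existing tool.

```python
import re

FENCE = "`" * 3  # built this way to keep a literal fence out of this listing
PATH_COMMENT = re.compile(r"^#\s*(\S+)\s*$")


def split_into_files(document: str) -> dict:
    """Group fenced coffeescript blocks by the path in their first comment line."""
    files = {}
    lines = iter(document.splitlines())
    for line in lines:
        if line.strip() != FENCE + "coffeescript":
            continue  # prose between blocks is ignored
        # The first line inside the fence is expected to be "# path/to/file".
        header = next(lines, "")
        match = PATH_COMMENT.match(header)
        if not match:
            raise ValueError(f"block does not start with a file comment: {header!r}")
        body = files.setdefault(match.group(1), [])
        # Consume the rest of the block from the same iterator.
        for code_line in lines:
            if code_line.strip() == FENCE:
                break
            body.append(code_line)
    return files
```

Each key in the returned dictionary is a path and each value is that file's code lines, so writing the files out becomes a separate, trivial step.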
@supersym I suggested something similar in this Hacker News thread: http://news.ycombinator.com/item?id=4608165
Here is the response from @jashkenas back then:
this is probably the main complaint raised when talking about tools like Docco or Markdown-as-source-code as "literate programming". But, it's a concern that I believe is entirely outdated. Modern dynamic languages make it easy to sequence your code as you like. The methods in a class may be listed in any logical order, helper functions can be listed in an appendix after the functions that make use of them, and so on. I find it hard to imagine an example where changing the order of the codebase would make the prose version more readable -- and wouldn't also make the code version more readable as well.
Right. And he's probably right :) But we're still talking 1 file? I didn't see that come up in the discussion: 1 .litcoffee file, or 10, or 20. One file makes sense from a literary standpoint, and for easy handling / maintenance (debatable; plenty of authors make a separate file for each chapter because 200 pages ain't fun), but e.g. what they did with literate clojure and such is packing it all in 1 file. And after that, the classes get extracted etc. I should really look deeper at what function he used for that and apply it to coffeescript so I can extract files if I want to, or not.
Using the above suggestion I made, and the regex ^(`{3}coffeescript( \r|\r)()(.*)), I have enough to do this. Updated because a file name on the same line doesn't get parsed.
http://www.gridlinked.info/oop-with-coffeescript-javascript/ some kind of namespaces is something I should investigate a bit better too
This file, as it is, has been taken from `tests/literate.literatecoffee` in my fork. Without making any changes to it whatsoever, it is valid literate CoffeeScript, if you just copy and paste it into an editor, and make my fork of CoffeeScript run on it. Make the file end in `.literatecoffee` to make it go into literate mode. I wonder if Mr. Jeremy Ashkenas thinks it is a good idea. It would be good to know, before I send a pull request. (Oops! Earlier pull request for binary literals was pending, and it subsumed this one. Oh, well.) The change is only one line in only one file, and it is bound to be faster than fast in use. It is in my commit revence27/coffee-script@132d306e3391182dc2c410b9c46251d476e74c3f with an accompanying modification (different commit) of the `Cakefile`, that it may look for `.literatecoffee` files in the tests.
This, of course, is inspired by the Haskell programming language, but it is neater than the Haskell version of literate programming.
Beautiful Literate Programming
If this file parses at all, that is the test, and it has passed.
Originally, literate programming did not mean heavily-commented code. However, owing to the System, that is what it evolved to mean. Literate programming is supposed to be where code invades the commentary, not lots of comments invading the code.
Under this scheme, everything is a comment. Except the bits that are indented. If a line does not start with whitespace, it is a comment. Everything else is code.
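The rule in the paragraph above is simple enough to express as a line-by-line transformation. The Python sketch below is only an illustration of that rule, not the one-line change in the commit: the name `literate_to_plain` is made up, blank lines are assumed to pass through unchanged, and indented code is left exactly as written (whether the real patch strips the indentation is not stated here).

```python
def literate_to_plain(source: str) -> str:
    """Turn unindented prose into '#' comments and keep indented lines as code."""
    output = []
    for line in source.splitlines():
        if not line.strip():
            output.append("")            # blank lines stay as separators
        elif line[0] in " \t":
            output.append(line)          # starts with whitespace: code, untouched
        else:
            output.append("# " + line)   # anything else: prose becomes a comment
    return "\n".join(output) + "\n"
```

The result is ordinary commented source, which is why the existing documentation tooling can keep working on it.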
This is how we shall proceed with writing a string reverse in CoffeeScript.
And then we will use it.
See? Wasn't ugly, was it? Work on a syntax mode should not be difficult, in my opinion. (Although the only syntax things I know how to do are Vim, and even those, not too perfectly.) That will be all.