racket / rhombus

Rhombus programming language
Other
332 stars 59 forks source link

Language workbench for syntax experiments #150

Open stereobooster opened 3 years ago

stereobooster commented 3 years ago

Imagine:

  1. we have some kind of language workbench (like spoofax). it means that we can write BNF-like syntax description (in spoofax they use SDF) and we will get parser, syntax highlighter and pretty printer for free.

  2. but what if instead of AST it would return S-Expressions, then we can write some number of nanopass-like transformations which would turn those S-Expressions into actual Racket code. So now we can write BNF-definition of syntax and small amount transformation and we will get "reader" for our language which would automatically transpiled to Racket.

  3. If we can automatically reverse those transformations (think of relational programming, like what you can do with eval implemented in MiniKanren) it means that we can take any existing Racket program and transform it (pretty-print) to a new language.

Watch how fast you can prototype new syntax in Spoofax.

Plus we can reuse transformations and ideas from different proposals to see if new syntax makes sense or not.

Example: assume we have a language with a function call like this (C-like syntax)

f(a, b)

it can be parsed as:

(f (a b))

then we apply transformation and get:

(f a b)

Which is a valid Racket code. And we can transform it back. And pretty print after.

PS. I wrote an article about this, later I discovered Spoofax and original idea changed a bit...

Related: #3

jackfirth commented 3 years ago

Have you read Beautiful Racket? The #lang brag language it uses gets pretty close to this idea, or it's at least related.

stereobooster commented 3 years ago

@jackfirth I'm aware of it, but haven't played with it.

If the grammar is ambiguous, brag will choose one of the possible parse results, though it doesn’t guarantee which.

I understand it as that you would need to change grammar to make it work as you expect, where is SDF has special operators to deal with ambiguity (associativity, precedence and other). So you don't change the grammar, but declare how to resolve ambiguities. When you parse it will show all parse trees (so called parse forest), and ambiguous branches would be marked with amb node.

For example, traditional approach. Grammar:

Exp → Exp Op Exp | ( Exp ) | Num
Op  → + | - | * | /

Expression 2 * 5 - 10 can be parsed like this, which is not what you expect

To make it work "right" you need to change it to

AddOp  → + | -
AddExp → AddExp AddOp MulExp | MulExp
MulOp  → * | /
MulExp → MulExp MulOp NumOrParen | NumOrParen
NumOrParen → (AddExp) | Num

In SDF to fix the situation you can do something like this (it declares left associativity):

AddExp → AddExp AddOp AddExp {left}

But I must admit that I like idea of cut and splice rules in brag (and again I haven't played with it much, I need to experiment more to draw conclusions).

Also it is possible to generate pretty-printer from SDF grammar.

slaymaker1907 commented 3 years ago

I think it would be a good idea to provide a sublanguage like #lang brag in the standard library. Ideally it should also work nicely for mini-langs which have complex internal parsing that they need to do. For example, adding in a macro which lets you do infix math to an s-expr based language.

While something like #lang brag is probably close to ideal for writing the parser, I'm not convinced that the Beautiful Racket approach for lexing. There probably should be a conventional way to do this in Rhombus that is simple and flexible, anyone have any ideas for what that could be?

stereobooster commented 3 years ago

Not sure what exactly the issue with lexing in Beautiful Racket. But there are algorithms that don't have separate lexing step ("scanerless"), for example PEG and SGRL.

UPD: here is a big list of parsing algorithms https://stereobooster.com/posts/an-overview-of-parsing-algorithms/

slaymaker1907 commented 3 years ago

@stereobooster thanks for the link, it might be a good idea to make the default combine both lexing and parsing using a unified #lang to make custom langs easy to write. https://en.wikipedia.org/wiki/Scannerless_parsing also seems to have a good overview of scannerless parsers.