tolmasky / language

A fast PEG parser written in JavaScript with first class errors
languagejs.com
MIT License

Support for indentation-based syntax #8

Status: Open. seflless opened this issue 13 years ago.

seflless commented 13 years ago

Hey Francisco. We met and had a brief conversation about elegantly supporting whitespace-based syntax à la Python/CoffeeScript.

Are you still thinking of doing that, or have you concluded that it's either not to your preference or not architecturally sound?

tolmasky commented 13 years ago

I would certainly still like to do it, but I don't think it's possible in the way we discussed that night. If I recall correctly, essentially what we wanted was something like:

SignificantWhitespaceReplacement = "{"
SignificantWhitespaceReplacementClosing = "}"

Such that in an initial pass, something like

if blah
    do_something

would be changed to:

if blah
{
    do_something
}

And then you could write your grammar using { and }. Unfortunately, there's no easy way to do this initial pass, because at that point we have no knowledge of the rest of the language. For example, we may well enter a string and wrongly replace the whitespace inside it. This might be feasible with something like LALR, where you have a set of tokens you may be able to navigate safely, but even then you'd be doing significant language-aware work, because the whitespace may be a harmless delimiter (like the whitespace between the if keyword and the condition, which you wouldn't want to replace).
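To make the problem concrete, here is a minimal Python sketch of the naive initial pass. `insert_braces` is a hypothetical helper, not part of language.js: it looks only at leading spaces and emits a `{` line on each indent and a `}` line on each dedent. Because it knows nothing about the language, it also fires inside a multi-line string literal, which is exactly the kind of misreplacement described here.

```python
def insert_braces(source):
    """Naive pre-pass: turn indentation changes into "{" and "}" lines.

    Hypothetical helper for illustration. It inspects only leading spaces,
    so it cannot tell code from the inside of a string literal, and it does
    not report inconsistent dedents.
    """
    out = []
    stack = [0]  # indentation widths of the currently open blocks
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines are dropped
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            out.append("{")
        while width < stack[-1]:
            stack.pop()
            out.append("}")
        out.append(line)
    while len(stack) > 1:  # close blocks still open at end of input
        stack.pop()
        out.append("}")
    return "\n".join(out)


# Works on the example from this thread.
print(insert_braces("if blah\n    do_something"))

# Goes wrong inside a triple-quoted string: the braces become part of its text.
program = 'text = """\nfirst line\n    indented line\n"""'
print(insert_braces(program))
```

The second call shows why a language-unaware pass is not enough: the `{` and `}` lines land between the string's opening and closing quotes.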

As such, I am still very much interested in doing this, and in a purely declarative way on top of that, but I simply haven't found a good way to do it yet. The best approach I've seen requires code predicates, which I really dislike because they need to hold on to global state (the current "indentation level"). Any ideas?
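For comparison, a hand-written recursive-descent parser can at least avoid *global* indentation state by passing the current level down as an argument. The Python sketch below uses hypothetical `parse`/`parse_block` helpers; it is not language.js code and not a PEG grammar, and it only builds a tree of lines (a real grammar would also parse each line's contents). It does not solve the declarative-grammar problem; it just shows where the state lives in that style.

```python
def indent_of(line):
    """Number of leading spaces (tabs are not handled in this sketch)."""
    return len(line) - len(line.lstrip(" "))


def parse_block(lines, start, indent):
    """Parse lines sitting at exactly `indent` spaces, starting at `start`.

    Returns (nodes, next_index), where each node is (text, children).
    The current indentation level is an argument, not global state.
    """
    nodes = []
    i = start
    while i < len(lines):
        width = indent_of(lines[i])
        if width < indent:
            break  # dedent: this block is over, the caller resumes here
        if width > indent:
            raise IndentationError(f"unexpected indentation at non-blank line {i + 1}")
        text = lines[i].strip()
        i += 1
        children = []
        if i < len(lines) and indent_of(lines[i]) > indent:
            # The next line is deeper, so it opens a nested block.
            children, i = parse_block(lines, i, indent_of(lines[i]))
        nodes.append((text, children))
    return nodes, i


def parse(source):
    lines = [line for line in source.splitlines() if line.strip()]
    nodes, _ = parse_block(lines, 0, 0)
    return nodes


print(parse("if blah\n    do_something\nelse\n    other"))
```

An inconsistent dedent such as `"a\n    b\n  c"` raises `IndentationError`, because `c` matches neither the inner block's level nor the outer one.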

seflless commented 13 years ago

I'll post some thoughts tomorrow; my answer was starting to get pretty long. Before I do, I'll look at how you're actually implementing things so I can line up with your mental model. I've saved my work in progress to a text file and will come back to it after porting part of my language's syntax to language.js.

seflless commented 13 years ago

I'm having a hard time figuring out exactly how to use the project: how do you build it, use the Language Visualizer, run the command line tool, and so on? When you have some time, could you write up some documentation? I started reading through the code, but it was taking longer to get my head around it than it would with something to play with.

No rush, just when you get a chance. I'd like to take a crack at experimenting with some ideas. I'm starting to think it has to be built into the runtime, either as a special set of characters fed into the production-matching code or simply as built-in behaviour. But until I get my hands dirty, I'll never know.

tolmasky commented 13 years ago

I'm just about to head out, but I can give you these quick steps now and expand on them later if they're not enough:

  1. I've made it so that you can just do:
         $ cd path/to/language
         $ npm link
         $ language -g yourgrammar.language > parser.js
  2. To use the Language Visualizer, you need to build a browser version, so:
         $ language -g yourgrammar.language --browser=Parser > parser.js

Then copy parser.js into LanguageVisualizer/ and open LanguageVisualizer/index.html in your browser.

That should be it, hope that helps!

Thanks,

Francisco

(Sent by email, quoting francoislaberge's comment of Jul 9, 2011, at 7:04 PM, which appears above.)

tolmasky commented 13 years ago

BTW, did you mean to close this?

seflless commented 13 years ago

That's weird: I wrote a comment and hit Comment & Close, but I'm not seeing my comment. I was saying that I'm working on generating a parser that tracks indentation and then emits a special character that is fed into the production-matching logic. It's getting messy quickly.
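The approach described here can be sketched in Python as a pre-pass that keeps a stack of open indentation widths and prefixes each line with one sentinel character per indent or dedent, so that a PEG grammar could then match those characters like any other literal. The sentinel code points `\ue000`/`\ue001` and the `mark_indentation` name are assumptions for illustration (chosen from Unicode's private-use area on the assumption that they never appear in real source). Like any pre-pass, this sketch is still unaware of strings and brackets.

```python
INDENT = "\ue000"  # hypothetical sentinel, assumed never to appear in real source
DEDENT = "\ue001"  # hypothetical sentinel, assumed never to appear in real source


def mark_indentation(source):
    """Strip leading spaces and prefix lines with INDENT/DEDENT sentinels.

    A stack of open indentation widths means a line that closes several
    blocks at once receives one DEDENT per closed block.
    """
    stack = [0]
    out = []
    for line in source.splitlines():
        if not line.strip():
            out.append(line)  # blank lines carry no indentation information
            continue
        width = len(line) - len(line.lstrip(" "))
        prefix = ""
        if width > stack[-1]:
            stack.append(width)
            prefix = INDENT
        else:
            while width < stack[-1]:
                stack.pop()
                prefix += DEDENT
            if width != stack[-1]:
                raise IndentationError(f"dedent to column {width} matches no open block")
        out.append(prefix + line.lstrip(" "))
    # Close every block still open at end of input.
    return "\n".join(out) + DEDENT * (len(stack) - 1)


print(repr(mark_indentation("if a:\n    if b:\n        x\ny")))
```

A grammar rule for a block could then be written in terms of the two sentinel literals instead of raw whitespace.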

I'm using Python as a test case. There are a bunch of edge cases where whitespace may or may not matter depending on the surrounding code, for example if statements.

Valid:

if 1==1:
    print "This be truth."

Invalid:

if x==10
:
    print "Ten it is"

Valid:

if ( 1==1
):
    print "This be harsh truth."
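The third example is valid because Python performs implicit line joining: a newline inside (), [] or {} does not end the logical line, so indentation is only measured at the start of a logical line. A minimal Python sketch of that rule, using a hypothetical `logical_lines` helper, shows why the third example's header joins into one logical line ending in a colon, while the second example's colon is left on a line of its own. It ignores brackets inside strings and comments, as well as explicit backslash continuation.

```python
def logical_lines(source):
    """Join physical lines while brackets are open, as Python's lexer does.

    Hypothetical helper: brackets inside strings or comments and backslash
    continuations are not handled, so this is only a sketch of the rule.
    """
    depth = 0
    pending = []  # physical lines belonging to the current logical line
    result = []
    for line in source.splitlines():
        # Keep the first physical line's indentation; it is the only one that counts.
        pending.append(line.strip() if pending else line)
        for ch in line:
            if ch in "([{":
                depth += 1
            elif ch in ")]}":
                depth = max(depth - 1, 0)
        if depth == 0:
            result.append(" ".join(pending))
            pending = []
    if pending:  # unbalanced brackets at end of input
        result.append(" ".join(pending))
    return result


valid = 'if ( 1==1\n):\n    print "This be harsh truth."'
invalid = 'if x==10\n:\n    print "Ten it is"'
print(logical_lines(valid))
print(logical_lines(invalid))
```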

And then there's detecting double indentation, which definitely rules out bracket insertion as a quick hack.

There are many more cases, but it was a good exploration of what supporting this would take, and I'll keep dabbling in it. I'm back on work work today.