Simplify the grammar - Githubissues

uk-ar commented 9 years ago

I'm sorry for this large patch. But it is hard to fix some defects with current grammar. So, I modified the grammar and related lexer. And this PR fixes #85 #59 #72 #35. How about this?

ap4y commented 9 years ago

Hey @uk-ar I will need some time to check this changes a bit more carefully and evaluate this "grammar-less" approach. Unfortunately, I won't have much time in next several days, but I will get back to this PR asap.

uk-ar commented 9 years ago

Hi @ap4y! Thanks for your quick reply.

Unfortunately, I won't have much time in next several days, but I will get back to this PR asap.

No problem! It's not hurry. Whenever you have enough time.

ap4y commented 9 years ago

Hey @uk-ar I reviewed this PR. From code perspective it looks ok apart from small things like some tokens (class, func) should not be a part of prec grammars (which is operator based afaik) imo. That's being said, I don't think this approach is semantically correct. I'm not a smie master, but not having grammars (i.e. relaying on simplistic default rules) doesn't sound like a future proof approach. While I admit that it works, I think it will result in heavier indentation rules (which is slightly visible in this PR).

I want to try to fix issues you mentioned in this PR based on formal grammars. I hope to come up with something in next several days, so we could compare different approaches a bit more carefully.

ap4y commented 9 years ago

Hey @uk-ar, sorry for the delay. It took a bit more time to finish, luckly I fixed some other bugs and simplified smie implementation. My changes are in the simplify-smie branch. Keen to what you think about it. I tried going with your implementation, but it results in invalid sexp movements for most of the statements (due to the lack of grammars).

The only thing I couldn't resolve completely is a blink parens. There are still some cases in my implementation that fail blinking, specifically trailing closure blocks and simple code blocks. I can see possible solutions for this cases (most likely those block will need a separate tokenization rules in lexer), but it will take some time.

uk-ar commented 9 years ago

Hi @ap4y! Thank you for trying my implementation.

But, the simplify-smie branch have another degradation in backward-sexp(#97). For example, following code signals scan-error.

(progn
  (save-excursion (insert "let a\nlet b"))
  (forward-sexp 2)
  (forward-sexp -2)
)

I think this error is checked in test code, but I don't know why the test passed.

uk-ar commented 9 years ago

Please let me add more information for the sexp movements problem you pointed out. There is two types of sexp movements, in Emacs major modes.

1. Skips whole declalation(e.g. ruby-mode)

Forward-sexp skips whole declalation. If you execute forward-sexp beginning of "class" keyword, the cursor moves to end of class declalation.

class foo
end|

1. Skips only brackets(e.g. c++-mode)

Forward-sexp skips only brakets. If you execute forward-sexp beginning of "class" keyword, the cursor moves to end of class keyword.

class| foo {
}

Because I think swift syntax is similar to c++, so I implemented type 2. But you think type 1 is better for swift-mode, right?

ap4y commented 9 years ago

Thanks for checking my code, I'll check the degradation you pointed out. Your description of the situation with sexp is absolutly correct. I'm not even sure what is semantically correct way of defining sexp for swift (although as it obvious I lean towards ruby-like definition), some things are bothering me with brackets-only approach:

With this approach whole language can be reduced to something like: any language construction is an expression followed by optional code block. This is roughly what is expressed in your smie grammars. In reality it doesn't represent actual language, both swift and c++ have a formal BNF grammars defined for the language. There is a problem of course with those grammars, they are not easily expressed in smie because of the consecutive non-terminals limitation;
With this approach since language is reduced to exp + code-block semantics we are loosing access to the parent tokens in grammars. I think this is can be extremely inconvinient for some of the indentation rules, because it's hard to get information about outer scope. I think this will result in a large code in indentation rules. On the other hand ruby-like approach may result in a more complex lexer implementation;
I wanted to bring smie implementation of the pascal-like language from the docs as a counter example. begin..end blocks in this case are not much different from the brackets, example:

if x > 0 then begin
  doSomething();
end else if x < 0 then begin
  doSomethingElse();
end else begin
  foo();
end

As it seems from the docs semantically correct way of defining sexp for it will be ruby-like way, and I don't think that moving by begin..end blocks is the best way. Grammar for statement in docs for such language is defined like this:

("begin" insts "end")
("if" exp "then" inst "else" inst)

Similar grammar for swift will be ideal scenario imo, but I couldn't apply this idea into swift after several attempts.

I see a different problem with my implementation:

Not handling brackets manually in lexer and relying on syntax table breaks grammar indentation (although preserve grammar semantics like parent tokens). I didn't get too deep, tried to debug smie implementation, but sexp movements seem to be broken in this case;
When handling brackets manually in lexer (as it is right now), smie still fallback to syntax table in some cases and this cases break paren blinking. For example if in lexer instead of {} we will provide begin/end tokens, smie in indentation rules will be still asking for :close-all . "}". Seems like a bug to me to be honest.

At the end it seems both ways are not ideal, I wanted to know what others think about this situation. I think it makes sense to contact maintaner of the smie to know which way is semantically correct.

uk-ar commented 9 years ago

I think it makes sense to contact maintaner of the smie to know which way is semantically correct.

I agree to contact maintaner of the smie.

But I found the cause of the paren blinking problem. Smie treats "{" as neither token when grammar is defined like this: ("class" exps "{" insts "}") To enable paren blinking, we should define "{" as opener like this: ("{" insts "}")

I think the ruby-mode's "do" handling approach is very useful. I implemented this roughly in 799ee33. Please tell me your impression.

ap4y commented 9 years ago

I checked it, it works similar to ruby-mode (or what I have in my branch). The only concern I have is:

If you enable debugging output and try to indent this:

class Foo {
    |bar
}

you will notice that smie sees this code in a weird way:

forward: 23 -> 26 = bar
backward: 11 -> 7 = Foo
backward: 7 -> 1 = class
backward: 23 -> 7 = Foo
backward: 11 -> 7 = Foo
backward: 7 -> 1 = class
backward: 23 -> 7 = Foo
backward: 7 -> 1 = class [3 times]
:list-intro 'class'; sibling-p:nil parent:nil hanging:nil == nil
forward: 7 -> 10 = Foo
:elem 'args'; sibling-p:nil parent:nil hanging:nil == nil
forward: 7 -> 10 = Foo
:elem 'basic'; sibling-p:nil parent:nil hanging:nil == 4

And it will be indented incorrectly, although it should be possible to fix this. I also don't like grammar in this form: ("class" inst "}"). I think it will be confusing for other developers to figure out why we've done that, since it's a not valid swift grammar.

I also spent some time with c++ mode, and potential definition of the sexp for the swift. My conclusion was that c++ mode doesn't define sexp in a valid way, for example:

int| main() {
    return 0;
}

If you do mark-sexp for this example, only main will be selected, which is not what will be technically a minimal expression in this situation imo. It doesn't work well with something like easy-kill because of the short definition of the sexp. On the other hand I like sexp movements, they seem to be more precise. Wonder if there is middle ground somewhere. I will try to do more research and will try to mail to the smie maintainer.

uk-ar commented 9 years ago

Thanks for checking my code.

I also don't like grammar in this form: ("class" inst "}"). I think it will be confusing for other developers to figure out why we've done that, since it's a not valid swift grammar.

I agree that it looks strange for swift-mode developers.But ruby-mode defined "while exp do insts end" statement as ("while" insts "end") even if it is incorrect for ruby's BNF. I think it is limitation of smie. https://github.com/emacs-mirror/emacs/blob/master/lisp/progmodes/ruby-mode.el#L379

I don't know other way to resolve this. And I think we should contact maintaner of the smie. If you post this problem to smie maintaner (via emacs-devel?), please add the thread url information in this PR.

ap4y commented 9 years ago

Yep, I noticed this in ruby-mode, will think about this idea a bit more. I started a new thread in emacs-devel. I tried to describe our situation in details, but you might want to add something there.

uk-ar commented 9 years ago

Thank you for telling me the thread information. I'm not good at English, but I will follow you as much as I can.

ap4y commented 9 years ago

I think your english is quite good! But yeah, already good amount of information in that thread and I plan to write a response with some examples today.

ap4y commented 9 years ago

I think discussion in mailing list is more or less over, so I wanted to make an overview of what I have learned from it. First of all there is no semantically correct way of applying smie to the C-like language at this point (which I should have realised earlier). This means we have to try and there will be drawbacks between different approaches (lexer heavy or indentation rules heavy). I opened 2 tickets:

21875 - seems like a lexer issue, I suspect might be related to the paren blinking issue we have;
21896 - enhancement rather than a bug, but there is still an ambigious behavior for braces.

I think most logical step right now is to stop treating { as neither token in grammar, this should solve paren blinking issue. This might result in slightly weird grammar forms. I see there are 2 ways to do that:

Like in smc-mode by Stefan Monnier;
The ruby-mode way, as @uk-ar did.

I'm going to try first approach this week.

taku0 commented 8 years ago

Closing in favor of the new indentation logic.

swift-emacs / swift-mode

Simplify the grammar #117