hvesalai / emacs-scala-mode

The definitive scala-mode for emacs
http://ensime.org
GNU General Public License v3.0

Syntax highlighting of multi-line string literals breaks with nested ${} for string interpolation #131

Open kocubinski opened 7 years ago

kocubinski commented 7 years ago

Should be pretty obvious from the images what's going on:

What should happen (IntelliJ): [image]

What happens (Emacs): [image]

When a ${ } block is used inside an interpolated """ string literal, syntax highlighting breaks.

I looked at the string literal regexes in scala-mode-syntax.el around line 93, and they were too much for me. Hoping some regex wizard can help.

fommil commented 7 years ago

Writing long expressions like this in strings seems like a terrible idea to me and I'm surprised the compiler tolerates it. I'm not sure I'm particularly keen on complicating the scala-mode code to support it.

hvesalai commented 7 years ago

In this case the problem is really the nested string, i.e. anything of the form """${"""x"""}""".

As you might know, regular expressions (i.e. regular languages, the Type-3 languages of the Chomsky hierarchy) cannot express the recursion that would be needed to model nested strings. This can be understood from the fact that regular expressions are implemented as finite-state machines. As these machines have only a finite number of states, they have no way of keeping track of the (possibly infinite) recursion.

If we wanted to support nested strings, we would need at least a Type-2 (context-free) language to model them. An implementation of this would be an LL parser. As these languages are recognized by at least a pushdown automaton, they have the ability to keep track of the recursion.
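To make the pushdown idea concrete, here is a minimal sketch in Python (not scala-mode's actual code, which is Emacs Lisp). The function name `find_string_end` and its simplifications are assumptions made for illustration: every """ string is treated as interpolated, ordinary "..." strings and comments inside ${ } are ignored, and a run of more than three closing quotes is not handled. The explicit stack is the part a single regular expression has no equivalent for.

```python
import re

TQ = '"""'

def find_string_end(src: str, start: int = 0) -> int:
    """Return the index just past the closing triple quote of the string
    opening at `start`, or -1 if it is unterminated (hypothetical helper)."""
    if not src.startswith(TQ, start):
        raise ValueError("expected a triple-quoted string at `start`")
    # Stack entries: "str" = inside a """ string, "code" = inside ${ ... } or { ... }.
    stack = ["str"]
    i = start + 3
    while i < len(src):
        if stack[-1] == "str":
            if src.startswith("$$", i):        # escaped dollar sign, not an interpolation
                i += 2
            elif src.startswith("${", i):      # enter an interpolated block
                stack.append("code")
                i += 2
            elif src.startswith(TQ, i):        # close the current string
                stack.pop()
                i += 3
                if not stack:
                    return i
            else:
                i += 1
        else:
            if src.startswith(TQ, i):          # a nested string opens inside the block
                stack.append("str")
                i += 3
            elif src[i] == "{":                # nested braces in ordinary code
                stack.append("code")
                i += 1
            elif src[i] == "}":                # leave the current block
                stack.pop()
                i += 1
            else:
                i += 1
    return -1

example = '"""${"""x"""}"""'
flat_end = re.match(r'""".*?"""', example).end()   # a flat regex stops too early
full_end = find_string_end(example)                # the stack sees the real end
```

On the example from this thread, the flat regex stops at the opening quotes of the inner string, while the stack-based scan reaches the end of the outer literal.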

aij commented 7 years ago

@hvesalai In practice, it's almost certainly sufficient to support finite recursion though, as emacs doesn't support infinitely sized files. Of course, no specific finite level of recursion can be shown to be enough, and I'll admit I'm surprised it needs to be > 0.

Of course, complicating the regexes may very well break other things, especially given that scala-mode is expected to provide reasonable syntax highlighting even in the presence of syntax errors.
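As a rough illustration of the bounded-recursion idea, the following Python sketch unrolls the nesting to a fixed depth. The function name `nested_string_pattern` is hypothetical, Python's `re` syntax stands in for Emacs regexps (which lack lookahead, so this would not port directly), and braces or ordinary "..." strings inside ${ } are deliberately not handled.

```python
import re

def nested_string_pattern(depth: int) -> str:
    """Build a regex for a triple-quoted string whose ${...} blocks may contain
    triple-quoted strings nested up to `depth` levels (illustrative only)."""
    # One string character: anything that starts neither """ nor ${.
    plain = r'(?:(?!"""|\$\{).)'
    if depth == 0:
        return '"""' + plain + '*"""'
    inner = nested_string_pattern(depth - 1)
    # Code inside ${...}: non-quote, non-brace characters or a shallower nested string.
    code = r'(?:[^"{}]|' + inner + r')*'
    return '"""(?:' + plain + r'|\$\{' + code + r'\})*"""'

depth0 = re.compile(nested_string_pattern(0), re.DOTALL)
depth1 = re.compile(nested_string_pattern(1), re.DOTALL)
depth2 = re.compile(nested_string_pattern(2), re.DOTALL)

one_level = '"""${"""x"""}"""'
two_levels = '"""${"""${"""x"""}"""}"""'
```

Each pattern embeds one copy of the previous level, so the regex grows with the chosen depth, and any input nested one level deeper than that depth still fails to match. That is the trade-off described above: a fixed depth may be enough in practice, but every extra level makes the expression harder to maintain.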

fommil commented 7 years ago

(I also hope everybody is using parboiled2 or fastparse for their scala parsing needs instead of rolling their own!)