microsoft / monaco-editor

A browser based code editor
https://microsoft.github.io/monaco-editor/
MIT License
40.68k stars 3.61k forks

Monarch indented block end #1564

Open Tronic opened 5 years ago

Tronic commented 5 years ago

A few languages use indented blocks for structure and therefore don't always have end markers that can be matched directly. In particular, YAML represents multi-line string literals (block scalars, in YAML terminology) this way. The problem is that neither TextMate nor Monarch appears to support matching a change of indentation, and consequently YAML highlighting is quite broken in most editors (although GitHub, at least, gets it right).

parent:
  textNode: |
    this is
    *** supposed to
    be:
    raw text
  anotherNode: foo

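This could be solved primarily by being able to use a previously captured indentation whitespace string in the end regex, or otherwise to determine whether the block needs to be popped.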

Unless I've overlooked something, the current best solution is generating about 40 duplicate rules with all typically used indentation widths in their begin/end regular expressions.
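To illustrate the workaround described above, here is a minimal sketch that generates one begin/end pair per indentation width. The `BeginEndRule` shape, the `rulesForWidths` helper, and the patterns themselves are hypothetical and only show the idea; they are not taken from any existing grammar.

```typescript
// Hypothetical rule shape: a begin/end pair of regular-expression source strings.
interface BeginEndRule {
  begin: string;
  end: string;
}

// Build one rule per indentation width. The width is baked into both patterns
// because the end pattern cannot refer to what the begin pattern captured.
function rulesForWidths(maxWidth: number): BeginEndRule[] {
  const rules: BeginEndRule[] = [];
  for (let width = 1; width <= maxWidth; width++) {
    rules.push({
      // First content line: exactly `width` leading spaces.
      begin: "^ {" + width + "}(?! )",
      // End at the first line that is neither indented by at least `width`
      // spaces nor blank.
      end: "^(?! {" + width + "}|\\s*$)",
    });
  }
  return rules;
}

// Roughly the "about 40 duplicate rules" mentioned above.
const duplicatedRules = rulesForWidths(40);
```

The duplication is the cost of this approach: every rule is identical except for the number inside the braces.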

alexdima commented 4 years ago

Monarch was initially contributed by Daan Leijen, and I am just maintaining it at this point, but it looks like our YAML Monarch grammar manages to handle the indentation change with this kind of trick:

https://github.com/microsoft/monaco-languages/blob/0ed9a6c3e90a24375fab54f7205fb76ce992f117/src/yaml/yaml.ts#L159-L173

The multiStringContinued.$1 target means that the multiStringContinued state is entered with the text matched by capture group 1 passed in as an argument to that state, where it is then available as $S2.
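To make the state-argument mechanism concrete, here is a minimal sketch of a Monarch tokenizer fragment in the same spirit as the linked lines. It is written as a plain object so that it stands alone. The constant name `blockScalarSketch` and the simplified `Action` and `Rule` types are made up for illustration, and the rules are a paraphrase of the approach, not a verbatim copy of yaml.ts.

```typescript
// Hypothetical, simplified types for this sketch only.
type Action =
  | string
  | { token?: string; next?: string; cases?: Record<string, Action> };
type Rule = [RegExp, Action] | [RegExp, string, string];

const blockScalarSketch: { tokenizer: Record<string, Rule[]> } = {
  tokenizer: {
    // First line of the block: capture its indentation as $1 and enter
    // "multiStringContinued" with that text as a state argument.
    multiString: [[/^( +).+$/, "string", "@multiStringContinued.$1"]],

    // Following lines: inside this state, $S2 holds the indentation captured
    // above ($S0 is the full state name, $S1 is "multiStringContinued").
    multiStringContinued: [
      [
        /^( *).+$/,
        {
          cases: {
            // Same indentation as the first line: still part of the string.
            "$1==$S2": "string",
            // Anything else: leave the block and re-scan this line.
            "@default": { token: "@rematch", next: "@popall" },
          },
        },
      ],
    ],
  },
};
```

Note that the `$1==$S2` guard is an exact comparison, so a line indented more deeply than the first line of the block falls through to `@default`.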

Tronic commented 4 years ago

The TextMate format recently added something similar that is now actually used for YAML and Nim in VSCode. Still, a solution without such a workaround would be preferred.

alexdima commented 4 years ago

@Tronic I also maintain vscode-textmate, the TextMate grammar engine that runs in VS Code, and I'm not 100% sure: what has TextMate recently added that is now used for YAML in VS Code?

VS Code uses the YAML grammar from https://github.com/textmate/yaml.tmbundle and that hasn't had any commits since 2017.

Tronic commented 4 years ago

Around the time when I reported this there was a discussion about it that I am no longer able to find.

IIRC, VSCode did not correctly highlight YAML text blocks, but there was already a working implementation elsewhere, and some time later this year VSCode was updated to handle it properly. Possibly VSCode had until then used a module maintained by some other project that has since been deleted.

In any case, the feature I was referring to was matching indentation with regex groups captured from the initial line, which is not mentioned in the TextMate specs but appears to work in the current implementation.
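As a rough illustration of that feature, the sketch below shows a begin/end pair in which the end pattern refers back to the indentation captured by the begin pattern, together with a small simulation of how an engine might resolve the back-reference. The `blockScalarRule` object, the `resolveEnd` and `blockLines` helpers, and the substitution step are assumptions made for this example; they do not reproduce the actual internals of any TextMate engine.

```typescript
// Hypothetical rule in the style of a TextMate begin/end pair (pattern strings only).
const blockScalarRule = {
  // Capture the indentation of the first content line as group 1.
  begin: "^([ ]+)(?! )",
  // End at the first line that neither starts with that indentation nor is blank.
  end: "^(?!\\1|\\s*$)",
};

// Escape regex metacharacters so the captured text is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Assumed engine behaviour: replace \1 in the end pattern with the begin capture.
function resolveEnd(endPattern: string, captured: string): RegExp {
  return new RegExp(endPattern.replace(/\\1/g, () => escapeRegExp(captured)));
}

// Collect the lines of the block that starts at index `start`, stopping at
// (and excluding) the first line on which the resolved end pattern matches.
function blockLines(lines: string[], start: number): string[] {
  const beginMatch = new RegExp(blockScalarRule.begin).exec(lines[start] ?? "");
  if (!beginMatch) return [];
  const end = resolveEnd(blockScalarRule.end, beginMatch[1] ?? "");
  const collected: string[] = [];
  for (const line of lines.slice(start)) {
    if (end.test(line)) break;
    collected.push(line);
  }
  return collected;
}
```

Because the end pattern only requires the captured indentation as a prefix, more deeply indented lines and blank lines stay inside the block, and the block ends at the first line that is indented less.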

henriquetmm commented 4 years ago

> being able to use a previously captured indentation whitespace string in the end regex or otherwise to determine whether the block needs to be popped

There is a way to do this: you can take the state argument that carries the indentation level you want (as @alexdima suggested) and use it in a case guard on a match for leading whitespace. Something like this:

    [
      /[ \t\r\n]+/,
      {
        cases: {
          '~$S2 *': { token: 'white' },
          '@default': { token: '@rematch', next: '@popall' }
        }
      }
    ]

This still doesn't fix it but provides you with more flexibility to play around with indentation levels than directly comparing the previous indentation, like the current yaml implementation does. I hope this might help!

swathi545 commented 8 months ago

I am working on regular expressions to identify a multi-line string that is not enclosed in backticks or quotes. The multi-line string needs to be identified based on indentation.

Ex: object objectname property1: propertyvalue1 source = **let Source = Sql.Database(S dbo_FactCurrencyR

"Removed Coluvhv

            in
                #"Renamedghvhv C
                    #"Renamed Columns"**
           property3: propertyvalue3

I referred to https://github.com/microsoft/monaco-languages/blob/0ed9a6c3e90a24375fab54f7205fb76ce992f117/src/yaml/yaml.ts#L159-L173. With that approach, lines are treated as part of the multi-line string only if they all have the same indentation, but in my example, lines with greater indentation are also part of the multi-line string. How can this be achieved?


@alexdima, @henriquetmm