tajmone / Sublime-PML

Sublime Text 4 syntax for PML (Practical Markup Language)
https://tajmone.github.io/Sublime-PML/
MIT License

Attributes Lines Continuation after Node Tag #26

Closed tajmone closed 2 years ago

tajmone commented 3 years ago

Currently, the attributes context allows a line-continuation \ before any attribute has been found, e.g.:

[ch \
    title = Inline Fonts

which is incorrect, since in real usage the \ can only follow an attribute, never the node tag itself.
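
A valid continuation would instead place the \ after an attribute, e.g. (the specific attributes here are purely illustrative):

[ch id = fonts \
    title = Inline Fonts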

I should find a way to fix this. The problem is that the current solution makes it easier to track line-continuation backslashes \ between one attribute and the next:

  attributes:
    - include: line_continuation
    - include: pop-at-TagEnd
    - include: pop-at-EOL
    - include: attr_date
    - include: attr_time
    - include: attr_id

because the individual attribute contexts, when they encounter the non-consuming lookahead (?=\s*\\$), pop back to their attributes parent context, which is the one that actually matches and scopes the \.
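
For reference, the interaction works roughly as follows. This is only a minimal sketch: the patterns, scope names, and the context name attr_id-value are illustrative, not the package's actual definitions:

  attr_id:
    # Matching the attribute name pushes a dedicated context for its value.
    - match: '\bid\b'
      scope: entity.other.attribute-name.pml
      push: attr_id-value

  attr_id-value:
    # Non-consuming lookahead: a trailing \ pops back to the parent
    # `attributes` context, which then matches and scopes the backslash
    # via its `line_continuation` include. (The pop-at-EOL and
    # pop-at-TagEnd rules are omitted here for brevity.)
    - match: '(?=\s*\\$)'
      pop: true
    - match: '=\s*[^\s\]\\]*'
      scope: meta.attribute-value.pml

  line_continuation:
    - match: '\\$'
      scope: punctuation.separator.continuation.pml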

pml-lang commented 3 years ago

In PML 2.0 (based on the pXML syntax) attributes are surrounded by parentheses, e.g. ( a1=v1 a2=v2 )

Whitespace (spaces, tabs, and newlines) between attributes is allowed. Therefore the line-continuation character \ is no longer necessary, and will probably no longer be supported. Attributes spanning multiple lines can be written like this:

[tag (
    a1 = v1
    a2 = v2
    a3 = v3 )
    ...
]

However, in lenient parsing mode, the parentheses will sometimes be optional (e.g. for a node with only attributes and no child nodes).

The exact parsing rules will be documented later, once the 2.0 syntax is 'final'. So maybe it's best to wait for version 2.0 before making any changes.

tajmone commented 3 years ago

The new system is probably going to be easier to handle in editor syntaxes.

Unlike syntax highlighters' language definitions, stack- and RegEx-based editor syntaxes can be trickier to implement, because they have to account for all sorts of small edge cases and quirks. Code highlighters are fed static documents, so they can carry out bulk operations all at once, whereas editors are more complicated: on the one hand the source is being actively edited by the user, and on the other hand they need to provide semantic scoping in order to allow context-aware completions, etc.

Code highlighter syntaxes are probably closer to the actual converter's parser, which is also fed a static document.

Syntaxes like this one for ST can quickly and easily become entangled; all it takes is one bad implementation choice. The reason why the package started off rapidly but its development has since slowed down is that the pending nodes pose some challenges which I haven't yet made up my mind on how to handle.

E.g. lists: these are theoretically simple, since an [el node is only valid inside a [list node context, and lists can be nested. But in practice there's always the risk that an [insert file node might be found inside a list, pointing to an external file where the root [list node is closed. If this happens, the ST syntax will never exit from the list context, and the rest of the document might break (even though it is valid PML). The same applies to many other nodes, which could all have their closing tag in an external file.
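
To make the risk concrete, a naive implementation of the list context might look roughly like this (a purely illustrative sketch; the names, patterns, and scopes are not the package's actual definitions):

  list:
    # The context is only left when the closing ] of the [list node is found.
    - match: '\]'
      scope: punctuation.section.node.end.pml
      pop: true
    # Nested lists simply push another copy of this context.
    - match: '\[list\b'
      scope: entity.name.tag.pml
      push: list
    # [el items are only recognised while this context is on the stack.
    - match: '\[el\b'
      scope: entity.name.tag.pml
      push: element

  element:
    - match: '\]'
      scope: punctuation.section.node.end.pml
      pop: true

If the closing ] lives in the document pulled in by [insert file, the pop rule never fires and the list context stays on the stack until the end of the file.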

Although one would assume that best practices dictate against such use of the [insert file node, there could be many contexts in which chunking up long documents makes sense, e.g. alternative versions of a document being created conditionally (a sample edition vs. a full edition), or different documents sharing some common chunks of text (notes, appendices, bibliographies, tables, charts, etc.).

It's always a delicate balance deciding how to implement features in this respect: deeper semantics offer smarter context-aware dynamic features (auto-completions, and even more via custom plug-ins), but the price is the risk of entanglement. Simpler semantics are safer, but offer a flat user experience (the same completions suggested everywhere, etc.), which is not really cool.

In PML 2.0 (based on the pXML syntax) attributes are surrounded by parentheses

Although parentheses will make it easier to parse multiple attributes, if the user forgets to close them (or accidentally deletes the closing parenthesis) the syntax stack will end up looping in the attributes context until the first ) it encounters, which is most likely going to break the rest of the document.

Unfortunately, without a real parser it's impossible to enforce error tolerance in the syntax. And even if we could, how would we handle recovery? ST syntaxes don't expose variables, functions, or the like, so the most we could do is pop back to the parent context, blindly.
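
Concretely, the most such blind recovery could amount to is a hypothetical catch-all rule appended at the bottom of the attributes context, along these lines (illustrative only, not something the package currently does):

    # Hypothetical last-resort rule: anything that no attribute pattern
    # recognises is flagged as invalid and the context is blindly abandoned.
    - match: '\S+'
      scope: invalid.illegal.attribute.pml
      pop: true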

However, in lenient parsing mode, the parentheses will sometimes be optional (e.g. for a node with only attributes and no child nodes).

Let's hope it won't be hard to handle.

I'm afraid that for editors with simpler language syntaxes (e.g. Vim, Notepad++, etc., which boil down to a few RegEx definitions and/or lists of predefined elements) PML is going to be hard to support. The same applies to code highlighters without a stack, e.g. those found in applications like diffing tools.

I also suspect that creating a PML syntax for VSCode is going to prove challenging, partly due to intrinsic limits in its syntax definitions (compared to ST4), and partly due to the lack of an automated test system (ST's syntax tester is easy to use and very precise).

pml-lang commented 3 years ago

Syntaxes like this one for ST can quickly and easily become entangled

Yes. Some PML syntax rules can only be implemented correctly with a real parser. A good example is the insert node you mentioned. Maybe the best approach is to keep it simple and provide a plugin that is reasonably simple to implement and maintain, and works well in most (but not all) cases. The challenge is to find the right balance.

Unfortunately, without a real parser, it's impossible to enforce error tolerance on the syntax.

Even with a parser it is very challenging to provide error tolerance. IMO for some errors it is best to just report the error and abandon further parsing, because of the high risk of false positives.

tajmone commented 3 years ago

IMO for some errors it is best to just report the error and abandon further parsing, because of the high risk of false positives.

The problem with this approach is that the whole highlighting of the document breaks down from that point on. This has been a major problem with the AsciiDoc package for Sublime Text, for example. Once the syntax breaks, many built-in editor features (as well as third-party plug-ins) stop working, since they rely on scope semantics to work, e.g. symbol navigation (including chapter/section titles), completions, etc.

Scoped semantics are a double-edged sword. If on the one hand they offer fine-grained control over editing features (like showing completions depending on the context surrounding the caret/selection), on the other hand the more granular they are, the worse they break when syntax parsing goes wrong.

Obviously, if syntax parsing goes wrong due to a bad implementation of the syntax definition, things are really bad, since well-formed documents could break. But there's also the case of a document being malformed because editing is still in progress, which is not all that bad, since the document will recover once editing is done.