kstenerud / concise-encoding

The secure data format for a modern world
https://concise-encoding.org

TODO "Figure out how to do verbatim sequences in ANTLR" is impossible to implement with ANTLR #39

Open ST92 opened 2 years ago

ST92 commented 2 years ago

I spent the last few days digging into why such an innocent-looking feature would be left as a TODO. I was looking for something to get my hands on, and it looked promising enough.

Turns out ANTLR doesn't support back-references or forward-references at all. There is no good way to do this with ANTLR alone.

A known workaround is to embed actions that verify whether the two tokens match (analogous to how an XML opening/closing tag pair must share the same tag name). But those actions mean putting code inside the lexer grammar definition, written in the same programming language as the generated lexer.

That would mean putting Java code in the concise-encoding grammar definitions, and thus tying them tightly to Java.

I want to write a 100% Rust implementation. AFAIK, at this moment I need to write a custom lexer and parser to make verbatim escape sequences work.

TL;DR: ANTLR alone is insufficient, because the verbatim escape sequence (VES) grammar is context-sensitive.

ST92 commented 2 years ago

On a positive note, the spec is detailed enough that wrong grammar files don't really impact anything. Honestly, I'm a bit disappointed that ANTLR seems to be the best tool for the job and yet is so lacking here.

kstenerud commented 2 years ago

Yeah, I was hoping to rig something up with a templating engine (in Python or whatever) to generate a finalized grammar file with stub code for whatever language is being built. In theory the actual verbatim sequence code itself is simple, since you're just reading termination token data until the next whitespace, then reading content data until you encounter the termination token again.
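A minimal Rust sketch of that two-step read, for illustration only. `read_verbatim` is a hypothetical helper, not part of any existing Concise Encoding library. It assumes the caller has already consumed whatever introduces the verbatim sequence, and that exactly one whitespace character separates the termination token from the content; the spec is the authority on which whitespace is actually allowed. It also shows why a plain grammar struggles here: the termination token is only known at runtime.

```rust
/// Hypothetical helper: reads one verbatim sequence from the start of `input`.
/// Returns `(content, remainder)`, or `None` if the sequence is malformed
/// (no whitespace, an empty termination token, or no closing token).
fn read_verbatim(input: &str) -> Option<(&str, &str)> {
    // Step 1: the termination token is everything up to the first whitespace.
    let (ws_index, ws_char) = input.char_indices().find(|(_, c)| c.is_whitespace())?;
    let token = &input[..ws_index];
    if token.is_empty() {
        return None;
    }

    // Assumption: a single whitespace character ends the termination token.
    let body = &input[ws_index + ws_char.len_utf8()..];

    // Step 2: the content is everything up to the next occurrence of the token.
    let end = body.find(token)?;
    let content = &body[..end];
    let remainder = &body[end + token.len()..];
    Some((content, remainder))
}
```

With these assumptions, `read_verbatim("END a b END tail")` would yield the content `"a b "` and the remainder `" tail"`. Nothing inside the content is interpreted, so escape-like text such as `\n` passes through untouched.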

kstenerud commented 2 years ago

BTW please do write up anything that you find confusing or weird in the spec. If it's confusing, it's badly written!