google / mangle

Apache License 2.0
1.07k stars 38 forks source link

Escape-handling in string literals #1

Closed burakemir closed 1 year ago

burakemir commented 1 year ago

Once string literals are parsed (by the ANTLR generated parser,) we need to process the string a bit before turning it into a string constant. This happens in the parse/parse.go VisitStr

This currently only does some basic processing of escaped quote characters.

When we turn strings constants into a string representation again in ast/at.go String(), it would be good if there was an String() method that produced output which could be parsed again.

String support was originally added for basic strings (think: words), but for a stable release we'd want to have proper support for escapes, and also support for multi-line strings (already in grammar).

cpendery commented 1 year ago

@burakemir Happy to try to contribute support for this

rkrishnasanka commented 1 year ago

I'm wondering there's a reason why this project is using the Vistor vs the Listener pattern. I've mainly used the listener pattern to date, and it was very easy to process inner-loops.

burakemir commented 1 year ago

Thanks for engaging on our first issue! There is a spec part and an implementation part.

I think we'd want escapes for the whitespace codes and unicode escapes, as in the CEL spec https://github.com/google/cel-spec/blob/master/doc/langdef.md#string-and-bytes-values

Multi-line strings come with their own twist. The Bigquery (GoogleSQL) spec does offer multi-line literals https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical

With proper support for \n and unicode escapes we already have enough to turn multi-line strings back into literals.

Regarding the use of visitors in ANTLR, I had replaced my hand-written parser with ANTLR in a hurry and the visitor way of using ANTLR was what I was familiar with. It may be worth to revisit, but I believe it would not make a difference for this particular issue, as we would want string literals to be tokens.

burakemir commented 1 year ago

Let me try convert this issue to a (freshly enabled) GitHub discussion.