icerpc / slicec

The Slice compiler library
Apache License 2.0
13 stars 5 forks source link

Improve Escape Sequences in String Literals #659

Closed InsertCreativityHere closed 11 months ago

InsertCreativityHere commented 11 months ago

This PR fixes #658 and our escaping logic in general. Now, whenever you type \X in a string literal, it will always be mapped to X. \\ -> \ \" -> " \n -> n \8 -> 8 Slice performs no additional processing, so we don't support special escape sequences like \n or \uxxxx... But if a user wants to pass these through to their mapping, they still can with something like \\n.

pepone commented 11 months ago

How would I pass the string/path c:\new-folder? I understand this should be c:\\new-folder but then how can I pass a \n new line character?

InsertCreativityHere commented 11 months ago

All that Slice does is map c:\\new-folder to c:\new-folder It's up the receiver of the string to decide what the meaning of \n is.

If you're writing an argument for cs::type, the string will end up in C#, which would see this string as being:

c:
ew-folder

If you're writing an argument that gets passed to a windows command prompt (somehow), it would see C:\new-folder, since it attaches no special meaning to \n


Note that newline characters are explicitly disallowed in string literals. Since string literals are limited to attribute arguments, I don't think the need for Slice to handle special characters will be high. And if you really want a form-feed character or something, you can always directly paste the codepoint into your string.

InsertCreativityHere commented 11 months ago

@externl @ReeceHumphreys

I wrote some other implementations. Let me know which you prefer I guess.

(1) Using an accumulated string instead of filtering in place:

s.chars().fold((String::new(), false), |(mut string, is_escaped), c| {
    if is_escaped || c != '\\' {
        // If this isn't an escape character (it's already escaped, or not a backslash), add it to the string,
        // and ensure the `is_escaped` flag is cleared to false.
        string.push(c);
        (string, false)
    } else {
        // Otherwise, this is an escape character. We don't add it to the string,
        // and set `is_escaped` to true, so we know the next character is being escaped.
        (string, true)
    }
})

(2) Extra 'is_escape_character' variable

// Flag that stores whether the next character we read is being escaped.
let mut is_escaped = false;
s.chars()
    .filter(|c| {
        // If `c` is a backslash, and it isn't already escaped (ie: "\\"), then it is an escape character.
        let is_escape_character = *c == '\\' && !is_escaped;
        // Set `is_escaped` accordingly, so we know if the next character is being escaped.
        is_escaped = is_escape_character;

        // Return false for escape characters to filter them out of the string.
        !is_escape_character
    })
    .collect()

(3) Extra 'is_backslash_variable':

// Flag that stores whether the next character we read is being escaped.
let mut is_escaped = false;
s.chars()
    .filter(|c| {
        // If `c` is a backslash, and it isn't already escaped (ie two backslashes "\\"), ...
        let is_backslash = *c == '\\';
        // set that the next character is being escaped.
        is_escaped = is_backslash && !is_escaped;

        // Return false to filter out escape characters.
        !is_escaped
    })
    .collect()
externl commented 11 months ago

Let's just keep the original.

bernardnormier commented 11 months ago

Is there an agreement on the behavior? Unless there is, opening a PR seems premature.

InsertCreativityHere commented 11 months ago

Is there an agreement on the behavior?

I had thought there was, but I guess we'll find out now.

New Proposal

1) We keep the behavior and logic as-is. 2) But when we encounter an escape sequence other than \\ or \", we emit a warning saying:

warning: unknown escape sequence '<THE SEQUENCE>'
note: To type a literal backslash character, you must escape it using '\\'
externl commented 11 months ago

For the sake of time, I'd rather just merge this PR like it is. It has "Node.js" behavior. It's unlikely anyone is every going to run into this.

InsertCreativityHere commented 11 months ago

Okay, I agree, I'd really rather not spend too much time on an edge case in a niche. Honestly, the main point of this was to fix the bug: #658.

I'll open an issue where we can discuss further changing how Slice handles escape sequences.