titzer / virgil

A fast and lightweight native programming language
1.22k stars 42 forks source link

Escaping apostrophes in string literals #87

Closed srackham closed 2 years ago

srackham commented 2 years ago

Why do apostrophes have to be escaped in double-quoted string literals? It seems unnecessary and it's very surprising.

For example, this literal: "It's a beautiful day" generates this compiler error: ParseError: invalid string literal.

titzer commented 2 years ago

I'd be OK lifting that restriction. It was just a little simpler to share that nasty little routine in the parser.

If you want to take a crack at that, the parser is in aeneas/src/vst/Parser.v3 and there is similar code in lib/util/Chars.v3. Tests will have to be adjusted. For that you just do cd test; ./configure; aeneas test and see what starts failing.

srackham commented 2 years ago

I've lifted the restriction and updated the tests. The updated and new lib tests pass running test/lib/test.bash stand-alone, but running them via aeneas test fails because aeneas uses the stable compiler and not the current compiler. I worked around this by setting the V3C_STABLE environment variable to the current compiler:

V3C_STABLE=$VIRGIL_LOC/bin/current/x86-linux/Aeneas aeneas test

As it stands the changes can't be committed because aeneas test will fail. Is there a recommended workflow for bootstrapping in changes that break existing tests? The Bootstrapping.md document is pretty terse :-)

titzer commented 2 years ago

Thanks for doing this. Working in a language change like this is indeed a careful bootstrapping process, which is why I was planning on filling this documentation out on how it's done :-).

Normally an additive change (i.e. accepting more programs with a new feature) just requires waiting a stable cycle (i.e. a checkin of new stable binaries) before using the feature in either the compiler or lib/util.

A breaking change (e.g. disallowing some currently-legal programs or changing runtime behavior of a feature) is normally done with a feature flag that enables the new functionality, then adding test/ subdirectories that explicit set/unset the flag (for example, -legacy-cast. Then multiple stable cycles are required; first to add the flag and behavior and get it into stable, then to update the compiler and libs to work with either setting of the flag, then flip the default value of the flag, then later remove the flag. It's possible to combine some of those steps, but can be dangerous, as a screwup can break the chain of bootstrapping stable binaries. Also, the longer the flag is supported, the farther back in time the current compiler can compile old versions of itself.

I think for this one, as long as your modifications don't introduce previously illegal strings into either the compiler or lib source, you'll be fine. I.e. the changes should only add support in parsing escapes (and probably not yet in rendering them).

srackham commented 2 years ago

Ok, there are two ways to do this (neither break existing source compatibility):

  1. Delete this line:

https://github.com/titzer/virgil/blob/7404b8924aac3d423412cf8a9705586ab391c655/lib/util/Strings.v3#L118

This invalidates one negative test:

https://github.com/titzer/virgil/blob/7404b8924aac3d423412cf8a9705586ab391c655/test/lib/StringsTest.v3#L65

which has to be rewritten err(-2, "\"\'");

  1. Change the line to:
    if (ch == '\'' && i == max-1) return (pos - i, null);

This is not as clean since the only purpose of the line is to maintain backward test compatibility.

What's your preference?

titzer commented 2 years ago

Option 1 is definitely cleaner.

titzer commented 2 years ago

This was fixed with #89.