hsutter / cppfront

A personal experimental C++ Syntax 2 -> Syntax 1 compiler

[SUGGESTION] Treat a keyword as an identifier #392

Closed msadeqhe closed 1 year ago

msadeqhe commented 1 year ago

Preface

I suggest using the "keyword"$ syntax to treat keywords as identifiers in Cpp2.

Suggestion Detail

Some identifiers in Cpp2, such as and, or, etc., are keywords in Cpp1, so they aren't valid identifiers there. Therefore Cpp2 adds the cpp2_ prefix to these identifiers when generating Cpp1 code, as discussed in this issue. For example, the Cpp2 identifiers and and or become cpp2_and and cpp2_or in the generated Cpp1 code.
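To make the prefixing rule concrete, here is a minimal Cpp1 sketch. The helper name lower_to_cpp1 and the short keyword table are assumptions made for illustration; this is not cppfront's actual implementation or its full list.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <string_view>

// Illustrative subset of Cpp1 reserved words that are ordinary identifiers
// in Cpp2. Assumption: the list cppfront really uses may differ.
constexpr std::array<std::string_view, 6> cpp1_reserved = {
    "and", "or", "not", "xor", "bitand", "bitor"};

// Hypothetical helper: returns the spelling to emit in generated Cpp1 code.
std::string lower_to_cpp1(std::string_view name) {
    assert(!name.empty());  // an identifier is never empty
    for (std::string_view reserved : cpp1_reserved) {
        if (name == reserved) {
            return "cpp2_" + std::string(name);  // e.g. and -> cpp2_and
        }
    }
    return std::string(name);  // everything else is emitted unchanged
}
```

Names that are not reserved in Cpp1 pass through untouched, so only the clashing identifiers change spelling.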

On the other hand, some Cpp2 keywords, such as type and next, aren't keywords in Cpp1; they are valid Cpp1 identifiers. Therefore it's not possible to use such identifiers in Cpp2 when dealing with Cpp1 API:

// ERROR! `type` is a keyword, whereas it's an identifier in Cpp1.
v0: = type.name();

// ERROR! `next` is a keyword, whereas it's an identifier in Cpp1.
while next < 10 next next++ {
    //: statements...
}

I suggest using a syntax such as "keyword"$ to access Cpp1 API identifiers that are keywords in Cpp2:

// OK. `type` is an identifier.
v0: = "type"$.name();

// OK. `next` is an identifier.
while "next"$ < 10 next "next"$++ {
    //: statements...
}
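One way to picture the lowering this syntax would need is a textual pass that turns "name"$ into name. The sketch below is only an assumption about how such a pass could look: lower_quoted_identifiers is a hypothetical function, and it ignores comments and the contents of real string literals.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical lowering pass: replaces every "name"$ with name and copies
// all other characters unchanged.
std::string lower_quoted_identifiers(std::string_view src) {
    std::string out;
    std::size_t i = 0;
    while (i < src.size()) {
        if (src[i] == '"') {
            // Scan an identifier-like run after the opening quote.
            std::size_t j = i + 1;
            while (j < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[j])) || src[j] == '_')) {
                ++j;
            }
            // Match only a non-empty name that is closed by `"$`.
            if (j > i + 1 && j + 1 < src.size() && src[j] == '"' && src[j + 1] == '$') {
                out.append(src.substr(i + 1, j - i - 1));
                i = j + 2;
                continue;
            }
        }
        out.push_back(src[i]);
        ++i;
    }
    assert(out.size() <= src.size());  // the pass only ever removes characters
    return out;
}
```

An ordinary string literal such as "hello" is left alone because it is not followed by $.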

I think the "keyword"$ syntax is good enough because Cpp2 will have similar syntax for reflection and code generation, as described in this page of the Wiki, and semantically this feature is related to them.

Your Questions

Will your feature suggestion eliminate X% of security vulnerabilities of a given kind in current C++ code?

No.

Will your feature suggestion automate or eliminate X% of current C++ guidance literature?

No.

Considered Alternatives

I considered the syntax @keyword, but it would resemble meta-class functions. Another possible syntax is @"keyword", which doesn't resemble meta-class functions and also differs from capture syntax.

Also we can use a prefix such as cpp1_. For example:

// OK. `cpp1_type` will become `type` in Cpp1.
v0: = cpp1_type.name();

// OK. `cpp1_next` will become `next` in Cpp1.
while cpp1_next < 10 next cpp1_next++ {
    //: statements...
}

In this way, the cpp1_ prefix, in addition to the cpp2_ prefix, should be reserved for the Cpp2 compiler, and user-defined identifiers with those prefixes shouldn't be allowed.
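A rough sketch of the two rules in this alternative follows: strip cpp1_ when lowering, and reject user declarations that start with either reserved prefix. All function names here are hypothetical and only illustrate the idea.

```cpp
#include <string_view>

constexpr std::string_view cpp1_prefix = "cpp1_";
constexpr std::string_view cpp2_prefix = "cpp2_";

// True if `name` starts with `prefix` (substr clamps when name is shorter).
constexpr bool has_prefix(std::string_view name, std::string_view prefix) {
    return name.substr(0, prefix.size()) == prefix;
}

// Hypothetical check: under the proposed rule, users may not declare names
// that start with either compiler-reserved prefix.
constexpr bool is_reserved_for_compiler(std::string_view name) {
    return has_prefix(name, cpp1_prefix) || has_prefix(name, cpp2_prefix);
}

// Hypothetical lowering: cpp1_type is emitted as type; other names are
// emitted unchanged. The result is a view into the argument.
constexpr std::string_view lower_cpp1_escape(std::string_view name) {
    if (has_prefix(name, cpp1_prefix) && name.size() > cpp1_prefix.size()) {
        return name.substr(cpp1_prefix.size());
    }
    return name;
}
```

The cost of this design is that two whole families of names become unavailable to users, which the "keyword"$ spelling avoids.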

By the way, treating a keyword as an identifier isn't needed frequently enough to dedicate a new symbol to it.

EDIT 1: keyword<> is another alternative solution as described in this comment.

JohelEGP commented 1 year ago

Therefore it's not possible to use such identifiers in Cpp2 when dealing with Cpp1 API

I've actually found the Cpp2-specific keywords to be contextual, except for the <cstdint> shorthands (e.g., i8), and inspect. So this works: https://godbolt.org/z/vYPf6MG6h.

int next = 0; // Cpp1.
f: () = {
    while next < 10 next next++ {
        //: statements...
    }
    while 10 < next next next++ {
        //: statements...
    }
}
main: () = {
    :(is) = { [[assert: (is)is(is)]] }(:() = {});
}

Including the :(is) = { [[assert: (is)is(is)]] }(:() = {});.

msadeqhe commented 1 year ago

You're right :grimacing:, that's very nice but it doesn't work in every situation, for example:

type: type = {}

main: () = {
    variable: type = 0;
}

JohelEGP commented 1 year ago

I prefer cpp2_type over "type"$ to lower to Cpp1 as type.

msadeqhe commented 1 year ago

Also, Cpp2 may gain more keywords as the language evolves, and new keywords such as result in this issue are already valid identifiers in Cpp1.

JohelEGP commented 1 year ago

For variable: type = (), you want type to be a Cpp1 identifier. In this case, it clashes with Cpp2 syntax itself. It can be solved with an indirection: std::type_identity_t<type>: https://godbolt.org/z/TGoqsebrW.
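On the Cpp1 side, the indirection works because the identity alias names exactly the same type. The sketch below spells out a local equivalent in a namespace called sketch (an assumption, so that it also builds as C++17); C++20's std::type_identity_t has the same shape.

```cpp
#include <type_traits>

// A Cpp1 type whose name happens to be the Cpp2 keyword `type`.
struct type {
    int value = 0;
};

namespace sketch {
// Same shape as C++20's std::type_identity and std::type_identity_t.
template <class T>
struct type_identity {
    using type = T;
};
template <class T>
using type_identity_t = typename type_identity<T>::type;
}  // namespace sketch

// The wrapped spelling names the very same type as the bare one.
static_assert(std::is_same_v<sketch::type_identity_t<type>, type>);

// Declaring a variable through the indirection behaves like `type variable{}`.
constexpr int read_value() {
    sketch::type_identity_t<type> variable{};
    return variable.value;
}
```

Inside the angle brackets the name is in a template-argument position, which is presumably why the Cpp2 parser does not read it as the start of a type declaration.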

JohelEGP commented 1 year ago

These are the sets of Cpp2-specific keywords:

{i8,       i16,       i32,       i64,       u8,
 u16,      u32,       u64,       i8_fast,   i16_fast,
 i32_fast, i64_fast,  u8_fast,   u16_fast,  u32_fast,
 u64_fast, i8_least,  i16_least, i32_least, i64_least,
 u8_least, u16_least, u32_least, u64_least, inspect});
{next, copy, move, forward, pre, post, final,
 in, as, is, type, assert, throws,
 implicit, out, inout});

I can't come up with another example like https://github.com/hsutter/cppfront/issues/392#issuecomment-1519860597's. A simpler alternative for variable: type = () is qualifying type: variable: ::type = () https://godbolt.org/z/W3v48se5K.

msadeqhe commented 1 year ago

Thanks. It's a good idea to use the helper type std::type_identity_t<type> or to write the qualified name ::type. But ::type doesn't work for local types:

main: () = {
    type: type = {}
    variable: ::type = ();
}
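The same limitation can be seen in plain Cpp1, where the unary :: always starts lookup at global scope. This is a small sketch with made-up function names; the id members exist only to tell the two types apart.

```cpp
// Global type whose name is the Cpp2 keyword `type`.
struct type {
    int id = 1;
};

constexpr int id_of_global_type() {
    ::type variable{};  // qualified: always the global type
    return variable.id;
}

constexpr int id_of_local_type() {
    struct type {  // local type that hides the global one
        int id = 2;
    };
    type variable{};  // unqualified: finds the local type
    return variable.id;
}

constexpr int id_of_qualified_name_next_to_local_type() {
    struct type {
        int id = 2;
    };
    ::type variable{};  // still the global type; no :: spelling reaches the local one
    return variable.id;
}
```

So a qualified spelling sidesteps the keyword clash only for names that live at namespace scope.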
JohelEGP commented 1 year ago

Well, the only reason to name it type in that case would be for some Cpp1 library that reflects the identifier, or a wicked Cpp2 one.

MichaelCook commented 1 year ago

Perhaps a backslash prefix would be a little more readable.


v0: = \type.name();
while \next < 10 next \next++ {
    //: statements...
}

JohelEGP commented 1 year ago

Makes me wonder whether raw string literals could just be backslash-escaped (see #302).

msadeqhe commented 1 year ago

@MichaelCook, I like the idea to use backslash syntax \keyword.

Escape sequences in string literals start with a backslash to change the meaning of a character, why not just use it to change the meaning of a keyword and treat it as an identifier?

@JohelEGP, good idea, but in my opinion raw (non-interpolated) string literals would break the generality of the capture syntax (thing)$, which otherwise works everywhere in the language. Also, keeping string literals free of prefixes and suffixes leaves room for operator'' and operator"", Tagged Template Strings, or any other versatile syntax in Cpp2.

msadeqhe commented 1 year ago

@MichaelCook, I like the idea to use backslash syntax \keyword.

Because it only applies to keywords, it wouldn't conflict with a binary operator \ if Cpp2 ever had one:

// Treats `type` keyword as identifier.
x: \type = ();

// binary operator \
y: = var0 \ var1;

Also unary prefix and postfix operators \ wouldn't conflict with \keyword because they would be applied to identifiers.
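The disambiguation claimed above could be sketched as a lexer decision: a backslash is an escape only when a keyword follows it immediately. The keyword table and the classify_backslash function are hypothetical, and a \ operator does not exist in Cpp2 today.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

enum class BackslashUse { EscapedKeyword, Operator };

// Illustrative subset of Cpp2 keywords (assumption, not the full list).
constexpr std::array<std::string_view, 4> keywords = {"type", "next", "is", "as"};

constexpr bool is_ident_char(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '_';
}

// Precondition: pos < src.size() and src[pos] is a backslash.
// A backslash directly followed by a keyword escapes that keyword; in every
// other position it is treated as the (hypothetical) operator.
constexpr BackslashUse classify_backslash(std::string_view src, std::size_t pos) {
    std::size_t end = pos + 1;
    while (end < src.size() && is_ident_char(src[end])) {
        ++end;
    }
    const std::string_view word = src.substr(pos + 1, end - pos - 1);
    for (std::string_view kw : keywords) {
        if (word == kw) {
            return BackslashUse::EscapedKeyword;
        }
    }
    return BackslashUse::Operator;
}
```

A backslash in front of a non-keyword name, as in \var1, is classified as an operator, which matches the remark about unary operators.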

msadeqhe commented 1 year ago

These are other use cases and reasons why it does matter in addition to C and Cpp1 language interop:

jcanizales commented 1 year ago

Therefore it's not possible to use such identifiers in Cpp2 when dealing with Cpp1 API

Could you give an example C++ API that would not be usable from Cpp2? Naming variables is not such a case.

MichaelCook commented 1 year ago

How's this:

struct myapi_t {
    int type;
};

JohelEGP commented 1 year ago

See https://github.com/hsutter/cppfront/issues/392#issuecomment-1519838852. https://godbolt.org/z/GoK86hn53:

struct myapi_t {
    int type;
};
main: () -> int = myapi_t().type;

msadeqhe commented 1 year ago

Also, Cpp1 type traits in the standard library have a member type named type. For example:

std::is_integral<T>::type
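For reference, this is how that member is spelled from Cpp1 today. The checks use only standard <type_traits> facilities; the variable template is_integral_via_type is a made-up name for illustration.

```cpp
#include <type_traits>

// For a predicate trait, the member `type` is the integral_constant the
// trait derives from.
static_assert(std::is_same_v<std::is_integral<int>::type, std::true_type>);
static_assert(std::is_same_v<std::is_integral<float>::type, std::false_type>);

// For a transformation trait, the member `type` is the resulting type.
static_assert(std::is_same_v<std::remove_const<const int>::type, int>);

// Hypothetical helper that reaches the value through the `type` member.
template <class T>
constexpr bool is_integral_via_type = std::is_integral<T>::type::value;
```

In each case type appears after ::, so it is already part of a qualified name.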
msadeqhe commented 1 year ago

Also, const doesn't work as a type name, although both Cpp1 and Cpp2 have the const keyword:

const: type = {}

main: () = {
    x: const = "";
}

But it works with a qualified name, although Cpp2 then generates invalid Cpp1 code:

const: type = {}

main: () = {
    x: ::const = "";
}

msadeqhe commented 1 year ago

According to @JohelEGP's helpful comment:

... I can't come up with another example like #392 (comment)'s. A simpler alternative for variable: type = () is qualifying type: variable: ::type = () ...

If Cpp2 accepted all keywords as identifiers in qualified names, offered a way to qualify local declarations (e.g. _::identifier), and treated keywords as identifiers in expressions where an operator is adjacent, I think that would solve the issue, because it's a general rule that is easy to understand and follow:

// `while` is not a keyword, because it's left-hand-side of `:`.
while: type = {
    // `return` is not a keyword, because it's left-hand-side of `:`.
    return: () = {}
}

main: () = {
    // `do` is not a keyword, because it's left-hand-side of `:`.
    // `do` is a local type.
    do: type = { /* definition */ }
    // `_::do` is a qualified name.
    variable: _::do = /* definition */

    // `for` is not a keyword, because it's left-hand-side of `:`.
    // And `while` is an identifier.
    for: ::while = ();
    // `for` is an identifier because there is an operator dot.
    // And `return` is an identifier too.
    for.return();

    // Alternatively, `for` could be required to be qualified, but that is a little restrictive.
    _::for.return();

    // `for` is a keyword, because there isn't any operator between `for` and `args`.
    for args do (arg) { /* statements */ }

    // `return` is not a keyword, because it's left-hand-side of `:`.
    return: (param) -> ::while = ();

    // AMBIGUOUS!
    // When there is an ambiguity, they are keywords.
    // So `return` is a keyword here.
    return(/* something */);

    // This `return` is not a keyword, because its name is qualified.
    _::return(/* something */);

    // `if` and `for` are not keywords, because there is an operator between them.
    value: = if * for;

    // `do` is not a keyword, because it's already a qualified name.
    my_namespace::do();
}

In this way, keywords would be fully contextual (except for operator(), because it means both function call and grouping), and no new syntax would be required to treat keywords as identifiers.
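The rule can be approximated by looking only at the neighbouring tokens. The following is a minimal sketch under that assumption: classify, the token tables, and the three-token window are all hypothetical, and unary operators are ignored.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

enum class Role { Keyword, Identifier };

// Illustrative subsets only; these are not cppfront's real tables.
constexpr std::array<std::string_view, 5> keywords = {"while", "return", "do", "for", "if"};
constexpr std::array<std::string_view, 6> binary_ops = {"*", "/", "<", ">", "==", "!="};

template <std::size_t N>
constexpr bool contains(const std::array<std::string_view, N>& set, std::string_view s) {
    for (std::string_view item : set) {
        if (item == s) {
            return true;
        }
    }
    return false;
}

// Decides the role of `tok` from the tokens on either side of it.
constexpr Role classify(std::string_view prev, std::string_view tok, std::string_view next) {
    if (!contains(keywords, tok)) return Role::Identifier;
    if (next == ":") return Role::Identifier;                  // name being declared
    if (prev == "::" || prev == ".") return Role::Identifier;  // qualified or member name
    if (next == "::" || next == ".") return Role::Identifier;  // left operand of :: or .
    if (contains(binary_ops, prev) || contains(binary_ops, next)) {
        return Role::Identifier;                               // operand of a binary operator
    }
    return Role::Keyword;  // everything else, including the ambiguous `(`, stays a keyword
}
```

With these rules, return followed by ( stays a keyword unless it is preceded by ::, which mirrors the `_::return(...)` case in the listing.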

EDIT

This approach has a problem if Cpp2 supports operator overloading for '' and "" in the future (perhaps Cpp2 will never support them, but it's a possibility to consider):

// This is AMBIGUOUS:
//   Is `return` an object with `operator""`?
//   Or is `return` a keyword and returns string `"something"`?
return "something";

// This is OK. `return` is an object with `operator""`.
_::return "something";

It is similar to how operator() is ambiguous.

msadeqhe commented 1 year ago

To simplify the rule, maybe Cpp2 should require every identifier whose name is the same as a keyword to be qualified:

// `_::` must be omitted for naming declarations.
while: type = /* definition */
do: _::while = /* definition */

// `_::` must be omitted for accessing members with :: or dot.
// Because these are already qualified names.
_::do::type.as.forward;

_::return(args);

// OK. Keywords cannot be qualified names.
value: = _::if * _::for;

// `inspect` is not a keyword, because it's already a qualified name.
my_namespace::inspect();

So they must always be qualified: namespace::keyword for identifiers inside a namespace, ::keyword for global identifiers, or _::keyword for local identifiers (if Cpp2 supports something similar).

Instead of "keyword"$ or @"keyword" or \keyword, it's going to be namespace::keyword or ::keyword or _::keyword. Those are alternative ways to achieve the same thing...

msadeqhe commented 1 year ago

Instead of "keyword"$ or @"keyword" or \keyword, it's going to be namespace::keyword or ::keyword or _::keyword. Those are alternative ways to achieve the same thing...

Another option to consider is keyword<>.

Types, functions and variables can be templates already. So keyword<> would mean a type, function or variable.

if: (condition: bool, forward yes_value: int, forward no_value: int) -> forward int = {
/* Or force the template notation <> within declaration:
if: <> (condition: bool, forward yes_value: int, forward no_value: int) -> forward int = {
*/
    if condition {
        return yes_value;
    }
    else {
        return no_value;
    }
}

if: <T> (condition: bool, forward yes_value: T, forward no_value: T) -> forward T = {
    if condition {
        return yes_value;
    }
    else {
        return no_value;
    }
}

main: () = {
    // if<> is not a keyword here.
    x: = if<>(2 * 2 == 4, 1, 0);

    // if<bool> is not a keyword here.
    y: = if<bool>(2 * 2 == 4, true, false);

    // if<> is not a keyword here.
    z: = (2 * 2 == 4).if<>(1, 0);
}

But keywords cannot be templates, so template arguments after keywords, mean they are not keywords.

This requires an empty template argument list, as described in this bug.
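Cpp1 already gives an empty template argument list a meaning at call sites: it restricts overload resolution to templates. The sketch below shows that behaviour; if cannot be declared in Cpp1, so the names if_ and which are hypothetical stand-ins.

```cpp
#include <string_view>

// Non-template overload, standing in for the first `if` declaration.
constexpr int if_(bool condition, int yes_value, int no_value) {
    return condition ? yes_value : no_value;
}

// Template overload, standing in for the `if: <T>` declaration.
template <class T>
constexpr T if_(bool condition, T yes_value, T no_value) {
    return condition ? yes_value : no_value;
}

// Hypothetical pair that reports which kind of overload a call selects.
constexpr std::string_view which(int) { return "non-template"; }

template <class T>
constexpr std::string_view which(T) { return "template"; }
```

A plain call which(1) prefers the non-template overload, while which<>(1) considers only the template, so name<> is already a distinct spelling in Cpp1.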

msadeqhe commented 1 year ago

But keywords cannot be templates, so template arguments after keywords, mean they are not keywords.

It also depends on whether Cpp2 will have Cpp1-style static_cast<X>(arg), dynamic_cast<X>(arg), etc. (keywords which look like templates).

Or would Cpp2 have stand-alone <something> as an expression (without identifier before it)?

if <something> { ... }

That's suggested in this issue.