hsutter / cppfront

A personal experimental C++ Syntax 2 -> Syntax 1 compiler

[SUGGESTION] Literal suffixes are constructors. #455

Closed msadeqhe closed 1 year ago

msadeqhe commented 1 year ago

1. Preface

Literal suffixes are syntactic sugar for constructors. Just as Unified Function Call Syntax unifies member functions and non-member functions, this suggestion unifies literal suffixes and constructors.

Briefly, I suggest supporting this:

a: = 1'000ul * 5.0float;
b: = "text"s8.size();
c: = (2.5litre)water.weight();
d: = ((1my::box)apple + (2my::box)orange).weight();

name: = ask_player_name();
age: = ask_player_age();
p: = (name, age)my::player;
p.buy(()m4gun, 30bullet);

Some explanation:

- ul and s8 are type aliases in 1'000ul and "text"s8.size() respectively.
- (2.5litre)water creates an object with litre's constructor and passes it to water's constructor.
- (name, age)my::player creates an object of type player with the (name, age) constructor.
- ()m4gun creates an object of type m4gun with the default constructor.

2. Suggestion Detail

Currently, Cpp2 doesn't have a dedicated syntax for calling constructors directly. I suggest calling the constructors of a type directly in the form of literal suffixes. Let's name it Direct Object Construction Syntax or Constructor Call Syntax.

So ...TYPE will be syntactic sugar for (: TYPE = ...) in Cpp2. For example:

//: = (: something = 2);
x0: = 2something;

//: = (: taip = ("text", 0));
x1: = ("text", 0)taip;

It requires removing all built-in literal prefixes and suffixes:

- Remove l, ll, ul, ull suffixes from integer literals.
- Remove f, l, f8, f16, f32, f64 suffixes from floating-point literals.
- Remove u, U, u8, u16, u32 prefixes from both character and string literals.
  - Also removing the R and $ prefixes from string literals, as described in this issue, would make all literals consistent.

Constructors and UDLs (user-defined literals) are the two ways in Cpp1 to create objects from literals:

// -- It calls the constructor of `something`,
// -- therefore it needs parentheses in Cpp1, otherwise it would be `something1`, which is an identifier!
something(1)
// -- It doesn't call the constructor of `something`, because it's a UDL.
1something

The following expression in Cpp2 serves the purpose of both lines above:

// -- It calls the constructor of `something`,
// -- but it doesn't need parentheses in Cpp2, because it looks like a UDL with somewhat stronger behaviour.
1something
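For reference, here is a minimal Cpp1 sketch of that difference. The `something` type and the `_something` suffix are hypothetical (user code must start suffixes with an underscore), and the literal operator is deliberately written to behave differently from the constructor, to show that the two are separate functions today:

```cpp
#include <cassert>

// Hypothetical type standing in for `something` from the text.
struct something {
    int value;
    explicit something(int v) : value(v) {}
};

// A UDL is its own function. It only involves the constructor if its body
// chooses to call it; here it scales the value to make the difference visible.
something operator""_something(unsigned long long v) {
    return something(static_cast<int>(v) * 10);
}

// Constructor call: parentheses are required in Cpp1.
int via_constructor() { return something(1).value; }

// UDL call: the parentheses keep `.value` out of the literal token.
int via_literal() { return (1_something).value; }
```

Under this suggestion both spellings would collapse into the single form `1something`, which always calls the constructor.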

These are some notes to consider:

1. Type aliases make this suggestion easier, simpler, more readable and a replacement for UDLs. For example:
```
ul: type == ulong;
s8: type == std::u8string;

a: = 1'000ul; // -- It's equal to 1'000ul in Cpp1.
b: = "text"s8; // -- It's equal to u8"text" in Cpp1.
```

2. They can be within namespaces, because they are types:
```haskell
// -- `box` is a type within namespace `my`.
x: = 2my::box;
```

3. Multiple constructors (aka UDLs) can be applied to literals. For example:
```
// -- `litre` and `water` are types.
c: = (2.5litre)water;
```

4. They can be applied to multiple literals, as they are arguments to call the constructor. For example:
```
// -- `player` is the type within namespace `my`.
p: = (name, age)my::player;
```

5. They can call the default constructor with (). For example:
```
// -- `m4gun` is the type.
x: = ()m4gun;
```

6. They can be used with other operators. For example:
```
a: = 1'000ul * 5.0float;
b: = "text"s8.size();
c: = (2.5litre)water.weight();
d: = ((1my::box)apple + (2my::box)orange).weight();

name: = ask_player_name();
age: = ask_player_age();
p: = (name, age)my::player;
p.buy(()m4gun, 30bullet);
```

7. If Cpp2 had array literals as described in [this issue][2], a similar syntax would be available to call the constructor for them. Consistently with other literals, parentheses around `[...]` aren't necessary. For example:

[2]: https://github.com/hsutter/cppfront/issues/424

```haskell
//: = (: std::vector<int> = [1, 2, 3]);
x0: = [1, 2, 3]std::vector<int>;

dict: <T> type = std::vector<std::pair<std::string, T>>;
//: = (: dict<int> = [("a", 1), ("b", 2)]);
y0: = [("a", 1), ("b", 2)]dict<int>;
```

Consider how much more expressive and readable ...TYPE is than (: TYPE = ...); that's the reason why Cpp1 has UDLs. For example:

//: = (: point<int> = (1, 2)) * (: std::vector<int> = [1, 2, 3]);
ab: = (1, 2)point<int> * [1, 2, 3]std::vector<int>;

//: = (: box = ((: apple = 10) + (: orange = 20)));
mn: = (10apple + 20orange)box;

//: = player.buy((: gun = 2), (: health = (: box = 2)));
uv: = player.buy(2gun, (2box)health);

It's possible to consume Cpp1 UDLs. For example:

// -- `ms` is Cpp1 UDL.
//: = (: my::clock = (operator""ms(: ulonglong = 10)));
ab: = ((10ulonglong)ms)my::clock;

Constructors can replace Cpp1 UDLs completely, but Cpp2 could still support authoring UDLs (user-defined literal suffixes, e.g. operator""suffix). The plan is probably to only consume UDLs, as described in this comment from @hsutter.
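As a point of comparison, this is roughly how consuming an existing Cpp1 UDL and feeding it to a constructor looks today. It is only a sketch: `my::clock` is a hypothetical stand-in for the type in the example above, and the suffixes come from `std::chrono_literals`:

```cpp
#include <cassert>
#include <chrono>

namespace my {
// Hypothetical stand-in for `my::clock` from the text.
struct clock {
    std::chrono::milliseconds elapsed;
    explicit clock(std::chrono::milliseconds d) : elapsed(d) {}
};
}  // namespace my

long long clock_ms() {
    using namespace std::chrono_literals;  // makes the `ms` suffix visible
    my::clock c(10ms);                     // Cpp1 spelling of ((10ulonglong)ms)my::clock
    return c.elapsed.count();
}

long long mixed_seconds() {
    using namespace std::chrono_literals;
    // 1min + 10s has type std::chrono::seconds and converts to milliseconds
    // implicitly, because the conversion loses no precision.
    my::clock c(1min + 10s);
    return std::chrono::duration_cast<std::chrono::seconds>(c.elapsed).count();
}
```

The UDL produces the duration and the constructor consumes it, which is the same two-step chain that `((10ulonglong)ms)my::clock` expresses in the proposed syntax.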

3. Your Questions

Will your feature suggestion eliminate X% of security vulnerabilities of a given kind in current C++ code?

No.

Will your feature suggestion automate or eliminate X% of current C++ guidance literature?

Yes, because this change reduces Cpp2's concept count with a general language feature. The language becomes simpler to learn and understand, which leads to less guidance literature.

- It unifies constructors with UDLs. They are semantically the same: both of them create a new object.
  1. It's useful in generic programming.
  2. It reduces concept count.
     - Novice programmers don't need to learn a distinct concept about UDLs.
     - All types benefit from UDL-like syntax. There is no need to declare a UDL for them.
     - It eliminates the need to learn and understand built-in prefixes and suffixes for literals.
  3. The syntax of calling constructors will be expressive and readable.
- It distinguishes constructors from regular function calls. They are semantically different.
  - Constructors:
    - ...TYPE, parentheses are not necessary when ... is only one literal.
    - ()TYPE, it calls the default constructor
    - (args...)TYPE
  - Regular Function Calls:
    - FUNCTION(), it calls a function without arguments
    - FUNCTION(args...)
    - obj.FUNCTION()
    - obj.FUNCTION(args...)
- They can be chained together, whereas that's not possible with UDLs in Cpp1.
  - Only one UDL can be applied to a literal in Cpp1.
- Constructors can already be templated freely, whereas UDLs cannot.
  - Cpp1 only supports restricted literal operator templates, not general UDL templates.
- It removes built-in literal prefixes and suffixes. They are inconsistent and redundant.
  1. They are visually inconsistent.
     - Some of them are prefixes.
     - Some of them are suffixes.
  2. Their behaviours are inconsistent when the constant of a literal exceeds the type, as described in this comment.
- The name used to construct a literal and to declare a variable will be consistently the same.
  - There is no need to declare a new name for literal suffixes.
  - The name of a type acts like a suffix that constructs an object.
- They can be applied to literals with a qualified name (if they are within namespaces), unlike UDLs, which need a using statement before they can be applied to literals.
  - That's why UDLs in Cpp1 have to be prefixed with _, so that they are distinguished from UDLs declared in the Cpp1 standard library.
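The points about qualified names and chaining can be illustrated in today's Cpp1. This is a sketch under assumptions: `my::box`, `litre`, `water` and the `_box` suffix are hypothetical names taken from or modelled on the examples in this issue:

```cpp
#include <cassert>

namespace my {
struct box {
    int count;
    explicit box(int n) : count(n) {}
};

// A literal suffix cannot be written with `::` at the point of use, so UDLs
// are usually placed in a namespace that callers import with a using-directive.
namespace literals {
inline box operator""_box(unsigned long long n) {
    return box(static_cast<int>(n));
}
}  // namespace literals
}  // namespace my

struct litre {
    double amount;
    explicit litre(double a) : amount(a) {}
};

struct water {
    double litres;
    explicit water(litre l) : litres(l.amount) {}
};

// A constructor can be named with its qualified type directly.
int qualified_constructor() { return my::box(2).count; }

// The UDL needs the using-directive before `2_box` is found.
int imported_literal() {
    using namespace my::literals;
    return (2_box).count;
}

// Constructors chain; this is the Cpp1 spelling of (2.5litre)water.
double chained() { return water(litre(2.5)).litres; }
```

With the proposed syntax the first and third functions would read `2my::box` and `(2.5litre)water`, with no separate literal operator to declare.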

4. More Examples

By declaring type aliases to have familiar names:

ul: type == ulong;
ull: type == ulonglong;
s8: type == std::u8string;

x: ull = 2 + 2ul + 2ull;
y: = (0, 0)point + (0, 0)point;
call((0, 0)point, "text"s8.size());

m: = my::http::download("http://somewhere/somefile.ext"url.encode());
// -- `min` and `s` are Cpp1 UDLs from `std::chrono`.
n: = (1min + 10s)my::clock;

The process of object construction will be simpler and more readable:

p1: player = ("Sam"id, 1year);
p2: player = (112id, 2year);
p3: player = ((114)id, 1year + 4month);
p4: player = (("Alex", 110)id, 3year + 3month);

((p2, p3)team, (p1, p4)team)battle.start();

5. Considered Alternatives

This suggestion is a simpler and more general alternative to both this issue and this issue, taking a different approach. It completely unifies literal suffixes with constructors instead of integrating them with templates.

Edits

Haskell is a better language for syntax highlighting my Cpp2 examples! :sweat_smile:

msadeqhe commented 1 year ago

6. Similarity and comparison

`...TYPE` vs Cpp1-style `TYPE(...)`

Cpp1 cannot have ...TYPE syntax, because of literal prefixes and suffixes (compatibility with C).

A(...) in Cpp1 can be a type, a function, an object or a macro, whereas in this suggestion ...A or (...)A in Cpp2 is always a type (context-free) and always calls the constructor.
Parentheses in Cpp1-style TYPE(...) are mandatory, whereas they are optional in (...)TYPE for a literal. So ...TYPE is both UDL and constructor.

`...TYPE` and `(: TYPE = ...)`

They are the same. ...TYPE is syntactic sugar for (: TYPE = ...), in the same way that OBJ.FUNC(...) is syntactic sugar for FUNC(OBJ, ...) in UFCS. ...TYPE makes code more readable and more comfortable to write.

- For object construction, ...TYPE is syntactic sugar for (: TYPE = ...).
  - For UDLs and constructors
- For function calls, OBJ.FUNC(...) is syntactic sugar for FUNC(OBJ, ...).
  - For member functions and non-member functions

`...TYPE` and control structures

Initializing is supported for all control structures in Cpp2:

(copy i: = 0) while i < 10 next i++ {
    /*{- statements... -}*/
}

The parentheses before while are like declaring parameters for it, but the parentheses before TYPE are like passing arguments to it. So the parentheses before a KEYWORD (e.g. while, if, for, ...) are for parameterized block statements, and the parentheses before a TYPE are for passing arguments to the constructor:

// -- Pass arguments to `TYPE`s constructor.
(1, 2)TYPE

// -- Declare parameters for `while`, `if`, `for`, ...
(copy i: = 0) KEYWORD...

`...TYPE` and postfix operators

Only a literal (without postfix operators) or parentheses may be immediately before TYPE:

a0: = 10ull++;   // -- OK.
b0: = 10++ull;   // -- ERROR!
a1: = (10ull)++; // -- OK.
b1: = (10++)ull; // -- OK.

`...TYPE` and prefix operators

Constructor calls have higher precedence than prefix operators:

x: = -10ull; // -- It's equal to -(10ull)
y: = !"text"something; // -- It's equal to !("text"something)
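Cpp1's built-in suffixes already behave this way, which a small sketch can confirm: the suffix is part of the literal token, so the prefix operator is applied afterwards to the already-typed value. The names below are illustrative only:

```cpp
#include <cassert>
#include <limits>

// `-10ull` is unary minus applied to the unsigned value 10ull, so the result
// wraps around instead of being negative.
constexpr unsigned long long negated = -10ull;

static_assert(negated == std::numeric_limits<unsigned long long>::max() - 9,
              "-(10ull) wraps around for an unsigned type");

unsigned long long negated_value() { return negated; }
```

The proposed rule keeps that reading for every type: `-10ull` stays `-(10ull)` and never becomes `(-10)ull`.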

`...TYPE` and immediate calls to `operator()` and `operator[]`

operator() and operator[] are postfix operators, so they are called after object construction:

x: = (1, 2)TYPE(); // -- It's equal to ((1, 2)TYPE)();
y: = (1, 2)TYPE[0]; // -- It's equal to ((1, 2)TYPE)[0];

`...TYPE` after `operator()` and `operator[]` and variable templates

These are corner cases. They can be banned, although they are syntactically correct (left to right):

x: = object()TYPE; // -- It's equal to (object())TYPE
y: = object[0]TYPE; // -- It's equal to (object[0])TYPE
z: = pi<ulong>TYPE; // -- It's equal to (pi<ulong>)TYPE

I think the decision is related to Cpp2's goals. By the way, it's safe not to support these corner cases.

AbhinavK00 commented 1 year ago

Been trying to give some feedback for some days but idk what to say. This suggestion builds on Herb's {constructor × assignment} unification by making suffixes constructors, which, when you think about it, makes a lot of sense. I have one question: how does this play out with something like std::string's literal?

using namespace std::literals;
//Cpp example 
auto str1 = "hi y'all"s; 
auto str2 = std::string{"hi again"}; 
//both work

using namespace std::literals;
//cpp2 example
str1 := "hi a third time"std::string;
str2 := "last hi"s; //would this work?

Other than that, I think this suggestion is great (I would like anything that prevents me from writing the type between : and =)

But I would also like to see how issue #451 is solved, maybe Herb could come up with something combined with this that also keeps the operator= as a binary operator.

msadeqhe commented 1 year ago

Thanks for your feedback. Yes, that Cpp2 example would work. Herb stated in this comment that he wants to support consuming UDLs, but he hasn't decided yet whether to support authoring UDLs.

UDLs and Types

It's possible to have UDLs with the same name as types. In this case, types will be preferred over UDLs. For example:

// abc: type;

// UDL in Cpp1 (the body must use Cpp1 syntax; `abc` is constructed from the std::string)
abc operator ""abc(const char *str, std::size_t len) {
    return abc(std::string(str, len));
}

// Type declaration in Cpp2
abc: type = {
    operator=: (out this, value: std::string) = {}
}

main: () = {
    // It won't call UDL function.
    // It would call the type's constructor.
    object: = "text"abc;
}

On the other hand, can UDLs be used in place of types? Two options may be considered:

1. UDLs can be used in place of types too (but of course, types will be preferred over UDLs if they have the same name). The return type of UDL functions will be used to treat them as types. For example:
```
// UDL in Cpp1
unsigned long long operator ""ull(unsigned long long value) {
    return value;
}

main: () = {
    // It would call UDL function: operator ""ull(1'000)
    object: ull = 1'000;
}
```

2. UDLs cannot be used in place of types.

Option 1 generalizes object construction, similar to how UFCS works on functions. Option 1 would make UDLs behave as if they were non-member constructors; IMO it's better than option 2.

UDLs are Non-member Constructors (declaration syntax in Cpp2)

But if the plan is to support authoring them, and if we look at how they are semantically related to types' constructors, the following syntax seems reasonable for them, especially if the plan is to allow UDLs to be used in place of types:

```cpp
// in Cpp1:
// RETURNTYPE operator ""SUFFIX(ARGTYPE ARG) {...}
SUFFIX: (ARG: ARGTYPE) -> type == RETURNTYPE = {
    // -- statements...
}
```

That means they are functions whose return type is a type alias. For example:

// in Cpp1:
// unsigned long long operator ""ull(unsigned long long value) {...}
ull: (value: ulonglong) -> type == ulonglong = {
    return value;
}

main: () = {
    x: = 1'000ull;
}

In this case, UDLs in Cpp1 are changed to mean Non-member Constructors in Cpp2.

SebastianTroy commented 1 year ago

Shouldn't your last bit of cpp2 code

main: () = { x: = 1'000ull; }

Have a function call, rather than a UDL?

main: () = { x: = 1'000.ull(); }


msadeqhe commented 1 year ago

Shouldn't your last bit of cpp2 code
main: () = { x: = 1'000ull; }
Have a function call, rather than a UDL?
main: () = { x: = 1'000.ull(); }

It doesn't need parentheses, because the idea is for ...TYPE or ...SUFFIX to call TYPE's constructor or the SUFFIX UDL function, respectively.

I have to explain that if the constructor requires multiple arguments, it would be called like (arg1, arg2, ...)TYPE.

On the other hand, UDL functions cannot have multiple parameters, therefore it won't be called like (arg1, arg2, ...)SUFFIX, because SUFFIX would only work on a single argument, so (10)suffix and 10suffix are correct.

msadeqhe commented 1 year ago

On the other hand, UDL functions cannot have multiple parameters, therefore it won't be (arg1, arg2, ...)SUFFIX, because SUFFIX would only work on a single literal without parenthesis, so (10)suffix is wrong and 10suffix is correct.

I've corrected that to this:

On the other hand, UDL functions cannot have multiple parameters, therefore it won't be (arg1, arg2, ...)SUFFIX, because SUFFIX would only work on a single argument, so (10)suffix and 10suffix are correct.

Because making the parentheses optional and allowing UDLs to work on expressions helps without introducing any conflict or ambiguity.

SebastianTroy commented 1 year ago

Ah, apologies, I believe thanks to UFCS the .ull() already works in cpp2, so why add another way of doing the same thing?


msadeqhe commented 1 year ago

I have to explain that if the constructor requires multiple arguments, it would be called like (arg1, arg2, ...)TYPE.

On the other hand, UDL functions cannot have multiple parameters, therefore it won't be called like (arg1, arg2, ...)SUFFIX, because SUFFIX would only work on a single argument, so (10)suffix and 10suffix are correct.

If Herb agrees to support authoring UDLs as non-member constructors, then SUFFIX may also have multiple parameters, and it can be called like (arg1, arg2, ...)SUFFIX.

msadeqhe commented 1 year ago

Ah, apologies, I believe thanks to UFCS the .ull() Already works in cpp2, so why add another way of doing the same thing?

Because:

It's surprising to me, because:

UFCS is about Unifying Function Call Syntax, and suddenly it works with types.

It doesn't feel expressive enough for a context-free language. : Type = (args) creates a variable, but A(args) (also a.A(args)) may create a variable or may call a function (or function object).

It's like accessing a base class within multiple inheritance, e.g. a.Base::call() in Cpp1.

Also, it would conflict with multiple inheritance in Cpp2, depending on how we would access base types:

Base1: type = {
    operator=: (out this) = {}
    operator=: (out this, v: x) = {}
    operator(): (this) -> int = 0;
}

Base2: type = {
    operator=: (out this) = {}
    operator=: (out this, v: x) = {}
    operator(): (this) -> int = 0;
}

x: type = {
    this: Base1 = ();
    this: Base2 = ();
    variable: Base1 = ();

    operator(): (this) -> int = {
        // It calls operator().
        m: = this.variable();

        // Does it call operator() from Base1?
        // or calls the constructor with `Base1(this)`?
        // It's ambiguous because of UFCS on types.
        n: = this.Base1();

        return 0;
    }
}

In the example above, this::Base1() could be another syntax option, but that resembles the scope resolution operator (e.g. namespace::... or type::...), which doesn't look uniform with how we access members of this.

It would complicate the language, similar to how Type(...) has complicated Cpp1 for object construction and function declaration in the Most Vexing Parse. It's better to distinguish types from functions and variables syntactically in addition to semantically.

At first I couldn't find the main reasons why it's surprising to me. Now I've found them:

UFCS is syntactically and semantically incorrect for types.

- UFCS is about unifying function(a, args) (non-member functions) with a.function(args) (member functions).
  - It's important to note that both of them are valid syntax for functions without UFCS.
- On the other hand, Type(a, args) is unified with a.Type(args).
  - But the problem with a.Type(args) is that it is not itself valid syntax without UFCS!
  - It must be A::Type(args) to be a valid nested type, because nested types always need the scope resolution operator.

So UFCS on types would unify Type(a) (object construction) with an invalid syntax a.Type() (nested type which has to be A::Type()). That's the reason why I think UFCS on types is incorrect.

It's inconsistent with nested types, thus what's the point of UFCS on types? For example:

A: type = {
    X: type = {}
}

B: type = {
    operator=: (out this, a: A) = {}
}

main: () = {
    a: A = ();

    // It works.
    // It's equal to `B(a)`.
    m: = a.B();
    // a.B() == B(a)

    // ERROR! It doesn't work.
    // It must be `A::X()`.
    n: = a.X();
    // a.X() != A::X(a)
}

So a.Type(arg) would lead to surprises on types, because it doesn't work on nested types.

UFCS on types is in contrast to the purpose of operator., which is to access members!

UFCS on types would make member functions conflict with a.SOMETHING(args).

Member functions and types are completely different, but they would unintentionally impact each other. This is in contrast with UFCS for functions, where it only impacts which function to call.

abc: type = {
    klass: (this) = {}
}

klass: type = {
    operator=: (out this, v: abc) = {}
}

main: () = {
    a: abc = ();

    // It conflicts...
    // Does it call the constructor of `klass`?
    // or it calls the member function `klass`?
    a.klass();
}

In this example, the meaning of a.klass() would be ambiguous.
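For contrast, here is the same pair of declarations in plain Cpp1, where there is no UFCS on types and each spelling has exactly one meaning. The return values are hypothetical markers added only to make the two paths distinguishable:

```cpp
#include <cassert>

struct abc {
    // Member function that happens to share its name with the type below.
    int klass() const { return 1; }
};

struct klass {
    int tag;
    explicit klass(const abc&) : tag(2) {}
};

// Dot syntax can only mean member access here.
int member_call(const abc& a) { return a.klass(); }

// Type-first syntax can only mean construction here.
int construction(const abc& a) { return klass(a).tag; }
```

Allowing `a.klass()` to also mean `klass(a)` is what would make these two functions compete for the same spelling.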

For any type named klass, semantically a.klass() is inconsistent with member access.

In contrast, o.func() and func(o) for functions are semantically consistent with member access: the first argument is the object.

But a.klass() and klass(a) for types are semantically inconsistent with member access: the first argument is not the object, it's just an argument which shouldn't be used like an object. Types don't have enough relation to UFCS.

a.Type() --> operator=(out this, a)
a.func() --> func(a) // `this = a` for member functions

Those reasons are from this issue (https://github.com/hsutter/cppfront/issues/284).

SebastianTroy commented 1 year ago

Yes, user defined literals are a type of function call, just one with a weird syntax (what does operator"" have to do with numerical literals anyway?!) and unique rules that need to be taught, UFCS seems ideal for this IMO.

literal.function() is the same as function(literal), as per UFCS, the same as everywhere, I don't see any room for ambiguity, unless there is a function and a type with the same name and signature, but then I'm not sure if that is valid anyway?

Your inheritance example, why do your base classes both have constructors requiring an instance of the child type? Is this valid code? Also this example doesn't contain literals so I'm not sure how it is relevant?

Where you say

a.A(args)) may create a variable or may call a function (or function object).

A is not a literal in this example, and A(args) doesn't seem to be a UDL either, but for completeness

1.MyIntegerType()
1.funcReturningMyIntegerType()

Being replaced by

MyIntegerType(1)
funcReturningMyIntegerType(1)

Seems fine, and in both cases this is really very equivalent. In both cases an instance is created, and in both cases a function call occurs, it just so happens that one of those calls is a constructor function.

Are you perhaps trying to report a bug with UFCS with relation to classes and multiple inheritance?

Or perhaps I need to log into GitHub and stop doing all this via email... Apologies if I'm missing some greater context here


msadeqhe commented 1 year ago

Yes, user defined literals are a type of function call, just one with a weird syntax (what does operator"" have to do with numerical literals anyway?!) and unique rules that need to be taught, UFCS seems ideal for this IMO.

I agree with you, except for the part about UFCS. Instead of using UFCS to replace UDLs, let's fix those problems. Cpp2 can have a different syntax for declaring UDLs. The following is just an example (its syntax can be anything else):

suffix: (value: ulong) -> type == SomeType = {
    // statements...
}

I should mention that I'm not suggesting to support authoring UDLs in Cpp2 (it's just a possibility to consider).

I suggest to change the syntax of object construction from TYPE(...) to (...)TYPE, therefore UDLs would be completely replaced with constructors.

literal.function() is the same as function(literal), as per UFCS, the same as everywhere, I don't see any room for ambiguity, ...

Yes, that's the problem. It works, but it isn't worth it. In a nutshell, the problems with a.Type(args) are:

- Syntactically it's inconsistent with the member access operator (aka operator dot), because a doesn't have a member Type.
- Syntactically it's inconsistent with the scope resolution operator (aka ::) for referring to types.
- Syntactically it's wrong, because a is not the first argument of Type's constructor in operator=: (out this, args).
  - UFCS on functions: a is the first argument (this argument).
  - UFCS on types: a is not the first argument (this argument) of the constructor function.
- Syntactically it's not context-free.
  - The compiler (not transpiler) and the programmer must look up the declaration of Type to see whether it's a type or a callable.
- Semantically it's inconsistent with UFCS, because they do completely different things:
  - UFCS on functions: a is the object to work with.
  - UFCS on types: a is not the object to work with; a and args are arguments to construct a new object.
- Semantically it's meaningless, because a always has exactly the same behaviour as args.
  - That's a useless visual separation.

msadeqhe commented 1 year ago

Now, let's consider these examples of how a.Type(args) may go wrong:

Connection: type = {
    operator=: (out this, timeout: uint) = {}
    operator=: (out this, timeout: uint, proxy: my::proxy) = {}
    operator=: (out this, encrypted: bool, timeout: uint, proxy: my::proxy) = {}
}

main: () = {
    x: = 2000.Connection();

    // Are they related to UFCS and UDLs? No.
    y: = 2000.Connection(my::proxy());
    z: = true.Connection(2000, my::proxy());
}

IMO that code is unreadable.

The problem with a.Type(args) (UFCS on types) is that a is not the object (the this argument). It is just another argument like the rest of args, the interface isn't designed around it, and that leads to unreadable code. In a.func(args) (UFCS on functions), a is the object (the this argument), which leads to readable code.
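For comparison, a rough Cpp1 sketch of the same Connection type, assuming a hypothetical Proxy struct in place of my::proxy and made-up member names. With ordinary construction syntax all three arguments sit in one list, which is the property (...)TYPE is meant to keep:

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for my::proxy.
struct Proxy { std::string host = "none"; };

struct Connection {
    bool encrypted = false;
    unsigned timeout = 0;
    Proxy proxy{};

    explicit Connection(unsigned t) : timeout(t) {}
    Connection(unsigned t, Proxy p) : timeout(t), proxy(std::move(p)) {}
    Connection(bool e, unsigned t, Proxy p)
        : encrypted(e), timeout(t), proxy(std::move(p)) {}
};

// Every argument is inside one list; none is singled out as "the object".
inline Connection make_example() {
    return Connection{true, 2000u, Proxy{"example.invalid"}};
}
```

Nothing here visually promotes true to a special role, unlike true.Connection(2000, my::proxy()).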

SebastianTroy commented 1 year ago

Thanks, this is really clear now; I understand.

UDLs have historically required a function definition, which still works in Cpp2 via UFCS. However, you want to succinctly specify the type of the literal without having to create a function and then call it.

In Cpp1 you can use size_t{1}, with the added benefit that the literal is checked for narrowing.

In Cpp2 the following is difficult:

foo : uint64 = ~0;
bar : uint64 = 0xffffffffffffffff;

Are these equivalent values? Or is ~0 an int32?
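For reference, this is what the Cpp1 rules say about the same expression. It is a sketch of C++ behaviour only and does not claim anything about what cppfront does with ~0. The helper names from_signed and from_unsigned are made up for the example.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// In Cpp1 the literal 0 has type int, so ~0 is also an int (every bit set).
static_assert(std::is_same<decltype(~0), int>::value, "~0 is an int");

// Converting that int to an unsigned 64-bit type is done modulo 2^64,
// so all 64 bits end up set.
inline std::uint64_t from_signed() { return static_cast<std::uint64_t>(~0); }

// Starting from an unsigned literal, only the bits of unsigned int are set,
// and the conversion zero-extends.
inline std::uint64_t from_unsigned() { return static_cast<std::uint64_t>(~0u); }

// Brace initialization refuses the narrowing case at compile time:
// std::uint64_t bad{~0};   // ill-formed: narrowing conversion of a negative constant
```

So in Cpp1 the two initializers in the question give the same value, but only because of the signed-to-unsigned conversion rule, not because ~0 is itself 64 bits wide.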

UFCS way (requires an extra function definition):

ull := (in x : uint64) { return x; }
foo : uint64 = ~0.ull();

Your way (presumably you're expecting the constructor to take precedence over the ~ operator):

foo := ~0uint64

Built-in types don't have a constructor, so does this work in cppfront?

foo := ~uint64(0)

I see where you're coming from, however it does add to the concept count of the language.

How does your proposition handle the difference between built in and user defined types?


SebastianTroy commented 1 year ago

I agree, but again, there is no implicit this in cppfront, so I don't think literal calls via UFCS can call type members anyway


msadeqhe commented 1 year ago

Built in types don't have a constructor so does this work in cppfront? foo := ~uint64(0)

Yes, that works with foo: = 0uint64~ (bitwise ~ is postfix in Cpp2). Built-in types don't need constructors, because ...TYPE is syntactic sugar for (: TYPE = value).

I see where you're coming from, however it does add to the concept count of the language.

It changes the syntax of one concept from TYPE(args) to (args)TYPE. It would also completely eliminate the concepts of built-in literal prefixes and suffixes and of UDLs. That said, I hope Cpp2 will support authoring UDLs, since they behave like non-member constructors.

How does your proposition handle the difference between built in and user defined types?

They wouldn't be separate concepts. In fact, there wouldn't be any built-in literal prefixes or suffixes. All of them would be UDLs for built-in types (if needed). They can be used with types and type aliases in a similar way:

ull: type == ulonglong;

something: type = {
    operator=: (out this, value: ull) = {}
}

// -- UDL (aka Non-member Constructor) declaration example in Cpp2
suffix: (value: ull) -> type == something = (value)something;

main: () = {
    // -- All of them are valid.
    a: = (10)int;
    b: = 10int;
    c: = 10ull;
    d: = 10something;
    e: = (10ull)something;
    f: = (10ull)suffix;
    g: = (10)suffix;
    h: = 10suffix;
}

I agree, but again, there is no implicit this in cppfront, so I don't think literal calls via UFCS can call type members anyway

If I understand your response correctly, they would be called as reported in this comment by @JohelEGP: when Cpp2 transforms a.Type(args) to Type(a, args), if the literal (or variable) a is an int and the parameter type is int, no implicit conversion happens, because that's a direct call.

msadeqhe commented 1 year ago

Semantically it's meaningless, because a has always exactly the same behaviour as args.

That's a useless visual separation.

I have to clarify that with (...)TYPE, that useless visual separation doesn't exist for multiple arguments. For example:

// This separation between `true` and the other arguments is meaningless.
// Using TYPE(...) and UFCS on types for object construction:
x0: = true.Connection(2000, my::proxy());

// All arguments have the same behaviour on object construction.
// Using (...)TYPE for object construction:
x1: = (true, 2000, ()my::proxy)Connection;

In x0, that useless visual separation is misleading for object construction.

But in x1, all arguments are visually grouped together.

JohelEGP commented 1 year ago

That's not a problem exclusive to types. Not all functions have a nice flow using UFCS.

x0 := true.connect(2000, my::proxy());

msadeqhe commented 1 year ago

That's not a problem exclusive to types. Not all functions have a nice flow using UFCS.
x0 := true.connect(2000, my::proxy());

It is a problem exclusive to types, because the first argument of a member function is already the this object of some type. Your example doesn't work with an explicit this, because the this parameter of a member function cannot be of type bool.

JohelEGP commented 1 year ago

You're mixing up two things. My comment was about the general flow of UFCS.

As for the out this parameter of an operator=, the result object is implied by the call-site syntax to create an object. So there's no argument for the explicit this parameter.

msadeqhe commented 1 year ago

Maybe I didn't understand your comment correctly, but what I'm trying to explain is that a.Type(args) is a bad mix of UFCS and object construction.

I don't have a problem with how a.Type(args) works; my problem is with why it works!

Suppose I want to explain UFCS on types to novice programmers, as in the following paragraphs.

It is not inherently wrong to use the first parameter of a non-member function as the this object:

function: (value: int, arg: int) = {}

x: = 10.function(10);

The first parameter of a member function is exactly the this object:

Something: type = {
    function: (this, arg: int) = {}
}

a: Something = ();
x: = a.function(10);

But the first parameter of a constructor is out this, and we can't call it with a this object. That's fine, but why can the second parameter of a constructor be used as the this object in UFCS, when the first parameter is explicitly written to be the this object?

Something: type = {
    operator=: (out this, arg1: int, arg2: int) = {}
}

x: = 10.Something(10);

OK. This example is one of the reasons I think UFCS on types is not natural. You get visually similar syntax for UFCS on functions, a.func(args), and on types, a.Type(args), but it leads to inconsistent syntax and semantics for object construction, nested types, etc., as I explained before.

JohelEGP commented 1 year ago

That could be convincing. Let's try to look at the type's name as the implicit first argument to stimulate the mind.

x: Type = (a, 0); // Arguments: (`Type`, `a`, `0`).
x := :Type = (a, 0); // Arguments: (`Type`, `a`, `0`).
x := Type(a, 0); // Arguments: (`Type`, `a`, `0`).
x := Type(a, 0); // Callable: `Type`, arguments: (`a`, `0`).
x := a.Type(0); // Arguments: (`Type`, `a`, `0`), out of order.
x := a.Type(0); // Callable: `Type`, arguments: (`a`, `0`).
x := a.func(0); // Callable: `func`, arguments: (`a`, `0`).

I make no conclusions so far.

AbhinavK00 commented 1 year ago

I think authoring UDLs shouldn't be a thing if this suggestion is implemented; it could make things confusing IMO. Other than that, I don't see a problem with this suggestion. It is kind of surprising at first, but it makes sense once you see that it is generalised from suffixes or, the other way around, that suffixes could be generalised from this.

msadeqhe commented 1 year ago

@JohelEGP Good point. I've changed them to Cpp2 function signatures:

// `obj` is not `this`. It can't be the object.
x: Type = (obj, 0); //--> (out this, obj, 0)
x: =: Type = (obj, 0); //--> (out this, obj, 0)
x: = Type(obj, 0); //--> (out this, obj, 0)

// `obj` can be the object.
x: = func(obj, 0); //--> func(inout this = obj, 0)

// `obj` is not `this`. It can't be the object.
// Why is `obj` treated like the object?
// Whereas `out this` is the object.
x: = obj.Type(0); //--> (out this, obj, 0)

// `obj` can be the object.
x: = obj.func(0); //--> func(inout this = obj, 0)

So both inout this and in this are consistent with how UFCS works, but out this behaves differently (inconsistently) in UFCS. In short, I think UFCS shouldn't work on operator= (constructors).

Type(obj, ...) is function notation that calls Type's constructor, so under UFCS it must be equivalent to the obj.Type(...) notation. But obj.Type(...) itself is not valid (it is inconsistent with operator dot, nested types and the this parameter).

msadeqhe commented 1 year ago

@AbhinavK00 You're right. The syntax is like literal suffixes. They are expressive for object construction.

I agree that (...)TYPE is not similar to function calls. That's intentional, so UFCS won't work on it.

Authoring UDLs is only a possibility to consider. As you said, it would increase the concept count. By the way, UDLs can be non-member constructors in my suggestion.

msadeqhe commented 1 year ago

I have to mention that (arg)Type for object construction reads left to right, just as a: Type for declaration reads left to right. This way, the type always comes after the identifier of the declaration, the literal, or the arguments of the constructor. Type(arg) (Cpp1-style) doesn't follow this rule.

msadeqhe commented 1 year ago

...TYPE after operator() and operator[] and variable templates

These are corner cases. They can be banned, although they are syntactically correct (left to right):

x: = object()TYPE; // -- It's equal to (object())TYPE
y: = object[0]TYPE; // -- It's equal to (object[0])TYPE
z: = pi<ulong>TYPE; // -- It's equal to (pi<ulong>)TYPE

I think the decision is related to Cpp2's goals. By the way, it's safe not to support these corner cases.

I'm thinking about this use case... what if Cpp2 supported it too?

Consider that we already have function chaining in C++:

x: = fetch("something").filter(10).sort(true);

With (...)Type chaining, we would have:

x: = fetch("something")list.filter(10)list.sort(true);

It would allow us to specify the types within Function Chaining. For example, it would be possible to have mytype and vector instead of list in that example:

x: = fetch("something")mytype.add(10)vector.size();

IMO that's useful. Optionally it's a possibility to consider for Cpp2.

msadeqhe commented 1 year ago

Also, type composition (derived units) is possible with <>, but parentheses are mandatory except for literals:

A: type = { /*declarations*/ }
B: type = { /*declarations*/ }

two: = 2;

i: = 2A;
j: = 2<A*B>;
k: = (two)<A*B>;

m: = (1, 2)A;
n: = (1, 2)<A*B>;

x: A = (1, 2);
y: <A*B> = (1, 2);

// <T> is template parameter.
r: <T> A = (1, 2);
s: <T> <A*B> = (1, 2);

msadeqhe commented 1 year ago

I have to correct my suggestion about derived units (e.g. <A*B>) in the previous comment:

The type of <A * B> is always A.
- For every arithmetic operator, the type of A op B is always A.
The type of <A && B> is always bool.
- For every logical operator, the type of A op B is always bool.
The type of <A += B> is always A&.
- For every assignment operator, the type of A op B is always A&.
But the type of each of the following operators depends on the signature of its function:
- <A*> aka Indirection
- <A&> aka Address-of
- <A()> aka Function Call
- <A[]> aka Subscript

For arithmetic, logical and assignment operators, the type of expression is always known from themselves. So <> is not needed at all to get the type of them. Also <A*>, <A&>, <A()> and <A[]> operations are not useful within derived units.

msadeqhe commented 1 year ago

Comparison with other suggestions and Cpp1-style

TL;DR: (args)Type is considered better than the other alternatives.

vs `(args):Type`

The advantage of (args)Type over (args):Type is that it won't change the meaning of declaration syntax within expressions:

// `a: Type` is a declaration.
a: Type = 2;

// But `a:Type` is a typed expression here.
b: = a:Type;

// `(a)Type` always is an expression and creates an object.
c: = (a)Type;

That's important because Cpp2 may one day support named arguments or designated initialization with the following syntax:

// named arguments
x: = call(name: string = "someone", age: int = 20);

// designated initialization
m: Type = (name: string = "someone", age: int = 20);

vs `Type(args)`

The advantages of (args)Type over Type(args) (Cpp1-style) are:

It's context-free.
It's left-to-right. The type comes after the expression, arguments, identifier, etc.

vs `(arg).Type(other_args)`

The advantage of (args)Type over (arg).Type(other_args) (Cpp1-style UFCS) is that:

It's context-free.

All arguments are at the same side, because they have the same characteristic in object creation:

// This is misleading: 1 and 2 are separated although they have the same characteristic.
u: = 1.point(2);

// OK. `v` is a `point` with value `(1, 2)`.
v: = (1, 2)point;

vs `(args).Type`

The advantage of (args)Type over (args).Type is that it's context-free.

vs `(: Type = (args))`

They complement each other. The advantages of (args)Type over (: Type = (args)) are:

It's left-to-right. The type comes after the expression, arguments, identifier, etc.

It doesn't need extra parentheses around itself:

// Extra parentheses are needed here.
u: = (: point = (1, 2)) * (: point = (1, 2));

// no extra parenthesis
v: = (1, 2)point * (1, 2)point;
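
For comparison, here is the Cpp1 spelling of that expression. This is only a sketch: it assumes a hypothetical point struct with a component-wise operator*, since the thread never defines what multiplying two points means.

```cpp
// Hypothetical point type; component-wise multiplication is an assumption
// made only so that the example has something to compute.
struct point {
    int x;
    int y;
};

constexpr point operator*(point a, point b) { return point{a.x * b.x, a.y * b.y}; }

// Cpp1 spelling: no extra parentheses around each temporary,
// but the type name comes before its arguments rather than after.
constexpr point product = point{1, 2} * point{1, 2};
```

So Cpp1 already avoids the extra parentheses; what (args)Type changes is the order, putting the arguments first and the type last.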

vs control structures

They are completely different, but parentheses before a type are syntactically similar to parentheses before a control structure:

// Parentheses before a control structure initialize variables within them.
(copy i: int = 0) while i <= 10 next i++ {
    std::cout << i;
}

// Parentheses before a type initialize an object.
x: = (1, 2)point;

msadeqhe commented 1 year ago

Use Cases

Assume the syntax of object creation is (args)class, where class is a type. These are the use cases.

1. Literals

As described in the suggestion, every object construction with (args)class is like a literal suffix:

a: = (2)int;
b: = ("text")string;

Optionally, the parentheses may be omitted for literals (e.g. 2 int and "text" string). Alternatively, the language could always require the parentheses, so that construction resembles a function call.

So it would eliminate the need for built-in literal suffixes and prefixes.

2. Object Construction within Operator Chaining

It would be possible to construct objects within operator chaining (member access operators, unary postfix operators, operator(), operator[], etc.):

level: type = { /*...*/ }
something: type = { /*...*/ }

x: = (2++)level++.member(true)something()++;

(arg).class(other_args) (UFCS on types) has this advantage too, but unfortunately it has the problems described earlier.

3. User-defined Operators

If (args_1)class(args_2) immediately called operator() (with argument args_2) after creating the object with (args_1)class, it would resemble an operator named class between the operands (args_1) and (args_2).

add: type = {
    data: int;
    operator=: (out this, arg: int) = {
        data = arg;
    }
    operator(): (this, value: int) -> int = data + value;
}

a: = 2;
b: = 2;

// x == 4
x: = (a) add (b);

It looks like we have defined operator add.
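
The closest Cpp1 equivalent today is constructing a function object and calling it immediately. A minimal sketch, where sum is a hypothetical helper added only for the example:

```cpp
// Cpp1 version of the `add` type above: construct, then immediately call operator().
struct add {
    int data;
    constexpr explicit add(int arg) : data(arg) {}
    constexpr int operator()(int value) const { return data + value; }
};

// Closest Cpp1 spelling of `(a) add (b)`: the type name precedes its argument.
constexpr int sum(int a, int b) { return add{a}(b); }
```

In Cpp1 this reads as add{a}(b), so the infix look of (a) add (b) comes entirely from placing the type after its constructor arguments.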

4. User-defined Language Constructs

If Cpp2 supported object construction chaining with operator(), and if we could use meta functions to define user-defined control structures, it would look like this:

list: vector<int> = /*...*/;
(copy i: int = 0) @forr (i < 10) nextt (i++) within (list) call (item) { /*...*/ }

So nextt, within and call are literally types, but they look like keywords within the control structure @forr, which is a meta function applied to a parameterized block statement. Using a meta function would allow us to evaluate arguments (expressions such as i < 10) multiple times, because meta functions would behave like Cpp1 macros in that they generate code.

EDIT: I will create a new issue (suggestion) for it ~~when reflections are ready for C++~~.

EDIT: The issue is created here.

Conclusion

(args)class seems to be a general language feature to replace other minor features.

Also (args).class and (args):class can be considered as alternative notations.

EDIT: I have to clarify that (args).class is not context-free, and that (args):class would change the meaning of variable declarations within expressions, so it might conflict with named function arguments that have explicit types (depending on the syntax, if Cpp2 supports them in the future):

call(name: = "Someone", age: int /*explicit type*/ = 30);
