hsutter / cppfront

A personal experimental C++ Syntax 2 -> Syntax 1 compiler
Other
5.51k stars 245 forks source link

[BUG] move from last use break code where variable is passed to function with `inout` argument passing #231

Open filipsajdak opened 1 year ago

filipsajdak commented 1 year ago

In the current implementation of cppfront, the following code:

f2: (inout x) -> _ = {
    x *= 2;
    return x;
}

main: () -> int = {
    x := 21;
    std::cout << f2(x) << std::endl;
}

Passes cppfront:

tests/bug_inout_argument.cpp2... ok (all Cpp2, passes safety checks)

But failed to compile on the cpp1 side with an error:

tests/bug_inout_argument.cpp2:7:18: error: no matching function for call to 'f2'
    std::cout << f2(std::move(x)) << std::endl;
                 ^~
tests/bug_inout_argument.cpp2:1:20: note: candidate function [with x:auto = int] not viable: expects an lvalue for 1st argument
[[nodiscard]] auto f2(auto& x) -> auto{
                   ^
1 error generated.

When cppfront moves x on its last use it breaks the requirements of the f2 function that requires an lvalue reference but gets an rvalue reference.

Expectations

cppfront should take into consideration the context in which the variable is used to avoid breaking the code on the cpp1 side.

gregmarr commented 1 year ago

Not really "initialize". It can be already initialized. You can pass initialized arguments to out parameters.

But you shouldn't, and the compiler should warn you if you initialize something and then pass it as an out parameter without reading from it first. The function has no way of knowing whether or not it is initialized, so it has to assume that it isn't and initialize it.

JohelEGP commented 1 year ago

The function has no way of knowing whether or not it is initialized, so it has to assume that it isn't and initialize it.

Maybe I've peeked too much into the generated code and cpp::out, which is used to implement out parameter passing. But it makes sense, considering that Herb has explained it (not here, perhaps in the CppCon talk) as "initializing it if it's not already initialized, and assigning to the already initialized object otherwise". So the implementation knows.

if you initialize something and then pass it as an out parameter without reading from it first.

After you initialize it, there's no difference if you pass it to an out parameter before or after reading from it.

gregmarr commented 1 year ago

So the implementation knows.

My understanding is that the implementation knows so that it can warn you about it, either at compile time or runtime, but I don't think that there's any concept of the user code itself being able to access that information. Written another way, the cpp1 generated code knows that information, but the cpp2 code doesn't, because that would change the type of the parameter, and you'd have to deal with the wrapped type, which could block things that you want to do with the original type.

After you initialize it, there's no difference if you pass it to an out parameter before or after reading from it.

There is a difference in that you initialized it unnecessarily because the function you passed it to is going to initialize it. This warning would be about the unnecessary initialization prior to calling the function. This might not actually happen, and might not be on by default, but it's definitely something that COULD be done. I'd have to watch 2022 again to see.

JohelEGP commented 1 year ago

My understanding is that the implementation knows so that it can warn you about it,

It knows to determine whether to construct or assign.

but I don't think that there's any concept of the user code itself being able to access that information.

That's right.

Written another way, the cpp1 generated code knows that information, but the cpp2 code doesn't, because that would change the type of the parameter, and you'd have to deal with the wrapped type

The generated code is such that the meaning doesn't change. You still deal with the type you declared. The generated code takes care of the unwrapping.

AbhinavK00 commented 1 year ago

It knows to determine whether to construct or assign.

In cpp2, construction and assignment are unified, so there's no distinction. I don't think the compiler knows how to determine whether to contruct or assign.

out: object should not be initialized before the call

Objects CAN be unitialised before the call.

I'm doubting the usefulness of a writeonly parameter. There are some examples yes but not enough I'd say

JohelEGP commented 1 year ago

In cpp2, construction and assignment are unified, so there's no distinction. I don't think the compiler knows how to determine whether to contruct or assign.

out T parameters outside operator= use cpp2::out<T> in C++1, which looks like an std::optional<T>, so it knows. What is generated from operator= doesn't use that, as constructors always construct, and assignments assign.

filipsajdak commented 1 year ago

I need to explain what I have presented above in more detail.

In cpp1, we have const methods - whenever we pass a const object, we can only call const methods that cannot change the object. We also have a mutable keyword that can somehow overcome that.

Based on that, I thought, what if we can mark what in things we can do in out context (this is precisely the reverse thing to the mutable where you mark what out actions you can make in in context - of course in cpp1 not in cpp2).

Let's consider the following type:

safe_store: type = {
  data: int;
  m: std::mutex;

  operator=:(out this) = {
    data = ();
    m = ();
  }

  write: (inout this, x : int) = {
    lock := std::lock_guard(m);
    data = x;
  }

  read: (inout this) -> int = {
    lock := std::lock_guard(m);
    return data;
  }
}

[For the sake of my following example, I have introduced the init passing style to distinguish write-only out from initialize-first out; the out will mean write-only].

I want to express something like the following:

safe_store: type = {
  data: int;
  SOMETHING m: std::mutex; // the SOMETHING marks m to be safe to read in `out` context and safe to modify in `in` context

  operator=:(init this) = { // note: currently there is no init passing style
    data = ();
    m = ();
  }

  write: (out this, x : int) = { // note: currently out this means something different
    lock := std::lock_guard(m); // currently fail as it needs to read and modify m
    data = x;
  }

  read: (in this) -> int = {
    lock := std::lock_guard(m); // currently fail as it needs to read and modify m
    return data;
  }
}

Using out like the above conflicts with the possibility that an object can be uninitialized - that is why I introduce the init passing style to distinguish uninitialized passing from the initialized write-only passing.

I wonder if there are more examples like that. I like the consistency, symmetry, and as few exceptions as possible. From my perspective, out is overloaded and needs special care to explain it and use it. It is similar to the cpp1 non-const lvalue reference - currently might mean inout or out.

I might need to be corrected. I was reading through 708, and I did not find why out is more about initialization than write-only. I have found that the current out is a symmetry to the current move - see paragraph "1.3.4 move parameters". And maybe this is my error as I assume that out (write) is the opposite of in (read), but it is more the opposite of move (uninitialize)... maybe it is all about names used.

If the above is true, I was confused by quote:

“Finally, the [internal January 1981] memo introduces writeonly: ‘There is the type operator writeonly, which is used like readonly, but prevents reading rather than writing...’ “ —B. Stroustrup (D&E, p. 90)

Where I assume that out is all about writeonly.

SebastianTroy commented 1 year ago

Your example type cannot have a const read method because you are modifying the member variable m.

It is my understanding that the in, out, inout etc tokens indicate the movement of side effects,

When a user calls read or write, the state of the instance affects the operation of the function call, the user can change what they are doing with the instance with the side effect affecting the operation inside the function call. The function call also has side effects that propogate out, i.e getting the lock prevents other threads from operating on the instance. Ergo the side effects for this are in and out.

On 31 March 2023 14:44:20 Filip Sajdak @.***> wrote:

I need to explain what I have presented above in more detail.

In cpp1, we have const methods - whenever we pass a const object, we can only call const methods that cannot change the object. We also have a mutable keyword that can somehow overcome that.

Based on that, I thought, what if we can mark what in things we can do in out context (this is precisely the reverse thing to the mutable where you mark what out actions you can make in in context - of course in cpp1 not in cpp2).

Let's consider the following type:

safe_store: type = { data: int; m: std::mutex;

operator=:(out this) = { data = (); m = (); }

write: (inout this, x : int) = { lock := std::lock_guard(m); data = x; }

read: (inout this) -> int = { lock := std::lock_guard(m); return x; } }

[For the sake of my following example, I have introduced the init passing style to distinguish write-only out from initialize-first out; the out will mean write-only].

I want to express something like the following:

safe_store: type = { data: int; SOMETHING m: std::mutex; // the SOMETHING marks m to be safe to read in out context and safe to modify in in context

operator=:(init this) = { // note: currently there is no init passing style data = (); m = (); }

write: (out this, x : int) = { // note: currently out this means something different lock := std::lock_guard(m); // currently fail as it needs to read and modify m data = x; }

read: (in this) -> int = { lock := std::lock_guard(m); // currently fail as it needs to read and modify m return x; } }

Using out like the above conflicts with the possibility that an object can be uninitialized - that is why I introduce the init passing style to distinguish uninitialized passing from the initialized write-only passing.

I wonder if there are more examples like that. I like the consistency, symmetry, and as few exceptions as possible. From my perspective, out is overloaded and needs special care to explain it and use it. It is similar to the cpp1 non-const lvalue reference - currently might mean inout or out.

I might need to be corrected. I was reading through 708https://github.com/hsutter/708/blob/main/708.pdf, and I did not find why out is more about initialization than write-only. I have found that the current out is a symmetry to the current move - see paragraph "1.3.4 move parameters". And maybe this is my error as I assume that out (write) is the opposite of in (read), but it is more the opposite of move (uninitialize)... maybe it is all about names used.

If the above is true, I was confused by quote:

“Finally, the [internal January 1981] memo introduces writeonly: ‘There is the type operator writeonly, which is used like readonly, but prevents reading rather than writing...’ “ —B. Stroustrup (D&E, p. 90)

Where I assume that out is all about writeonly.

— Reply to this email directly, view it on GitHubhttps://github.com/hsutter/cppfront/issues/231#issuecomment-1491948508, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AALUZQIFQMQJ4IDTZGOLDCDW63NS7ANCNFSM6AAAAAAT5JDQ2Y. You are receiving this because you were mentioned.Message ID: @.***>

JohelEGP commented 1 year ago

std::mutex has no const accessors, so writeonly doesn't help.

safe_store::read modifies the mutex and reads the data, so this should be inout.

safe_store::write modifies the mutex and modifies the data. But we don't know if the modification of the mutex would be writeonly. It doesn't exist in C++1, so it'd be a C++2 invention. That entails a new issue following the [SUGGESTION] template. I do remember having read about writeonly-like suggestions. But no further memories or emotions surge, which suggests me it wasn't that good. The fact that it doesn't scale when mixed with C++1, that we haven't agreed on its usefulness, and that it'd require cppfront to be more than just a transpiler, doesn't bode well for the suggestion.

filipsajdak commented 1 year ago

@SebastianTroy I have rework the example from cpp1:

class ThreadsafeCounter
{
    mutable std::mutex m; // The "M&M rule": mutable and mutex go together
    int data = 0;
public:
    int get() const
    {
        std::lock_guard<std::mutex> lk(m);
        return data;
    }

    void inc()
    {
        std::lock_guard<std::mutex> lk(m);
        ++data;
    }
};

See here: https://en.cppreference.com/w/cpp/language/cv

JohelEGP commented 1 year ago

It'd better to separate the suggestions of wanting mutable in C++2 and wanting to change the meaning of the parameter passing keywords.

AbhinavK00 commented 1 year ago

So if I understand correctly, Filip's suggestion is about a keyword that'd denote that the data member is immune to the parameter qualification, i.e. you can mutate it in in context and read it in out context. Not sure what will be the use cases of this (I fail to see how your second example is any better than the first one).

As for the differentiating between init and out, that'd just be increasing the concept count and will probably lead to confusion. The best would be probably to make out as writeonly because in most cases, the function would itself have the access to assigned value (except in cases where it initializes the parameter by moving some object into it).

But as @JohelEGP pointed out, this all should be in a different issue.

JohelEGP commented 1 year ago

Alternatively, consider using the most appropriate spelling for a given context. This might be useful if consistency isn't convincing enough, and to help find a middle ground. -- https://github.com/hsutter/cppfront/issues/231#issuecomment-1485952076

Apparently, foo as const is valid JavaScript, so here's an alternative.

    inout_func( x as in );
    (returning_func() as discarded);
    (returning_func() as unused);
JohelEGP commented 1 year ago

Herb's answer is at https://github.com/hsutter/cppfront/commit/45fb75e9ef357e0a09d5372bcbc738bf25447dfe#r107212029.


https://github.com/hsutter/cppfront/commit/45fb75e9ef357e0a09d5372bcbc738bf25447dfe#diff-de1d10bc50283c36fe4f14b907e2783d1f0250c55badcf3db168e84fbc91802fR37 added anonymous objects. I'm wondering, what exactly are the semantics of anonymous objects? It seems like you could accidentally do _ := f() instead of discard f(). So perhaps anonymous objects deserve a keyword too.

hsutter commented 1 year ago

I've skimmed the new comments on the thread, so please let me know if I inadvertently am not answering a point added above since last time:

Re "writeonly": There's nothing wrong with reading from an out parameter, as long as it's after you've set its value. So out isn't about "writeonly". I should make that clearer in the d0708 summary table that it's not only about 'give me an x I can write to'... the point is that I'll write its value first before doing anything else, and I am guaranteed to actually write to it. But after that it's okay to read the value (that this function just set!). For similar reasons, it's okay for a function to read the value of a forward parameter before it's forwarded.

Re "mutable": Yes, I probably should add a way to get the same effect as a Cpp1 "mutable" class member. Those are occasionally useful for instrumentation data, or the canonical example shown above of having a synchronization object (e.g., mutex, concurrent message queue) as a data member, which is fine because those are internally synchronized. (That's the short answer; I gave a talk on the longer answer, but that video is no longer hosted on Channel 9 and I can't find it on YouTube.)

Perhaps something like using "inout" here too, signifying the data member is always treated as though "inout":

ThreadsafeCounter: type = {
    inout m: std::mutex = ();  // hypothetical: "inout"(?) for Cpp1 "mutable"
    // ... other data

    get: (this) -> data = {
        _ := std::lock_guard(m);
        return data;
    }

    inc: (inout this) = {
        _ := std::lock_guard(m);
        modify(data);
    }
};

Re the original topics: Still thinking ...

filipsajdak commented 1 year ago

@hsutter Regarding writeonly, I think I need to collect all my thought together and share my doubts - as I think I did not communicate well on that.

filipsajdak commented 1 year ago

I will not continue writeonly topic here to leave a space for the original topic.

AbhinavK00 commented 1 year ago

Small chime in regarding mutable, maybe we can skip it for now and see how far cpp2 can go without it. If we get writeonly some day, then the current solution proposed by Herb would need to be tweaked to also allow reading from writeonly members.

JohelEGP commented 1 year ago

The need for this is resolved, see https://github.com/hsutter/cppfront/issues/305#issuecomment-1586580100.

JohelEGP commented 1 year ago

I might have an use case for marking an ignorable output on a function declaration.

Consider https://compiler-explorer.com/z/r6sq4nGM1, generated from https://cpp2.godbolt.org/z/K3v7xWeav.

The iterator::operator++ returns this. MSVC and GCC warn on the unused result of the compiler rewrite of the range for statement.

iterator: @struct @ordered type = {
  operator*: (this) -> i32 = 0;
  operator++: (this) -> forward iterator = this;
}
range: @struct type = {
  begin: (this) -> iterator = ();
  end: (this) -> iterator = ();
}
main: () = {
  for range() do (x) _ = x;
}
main.cpp2: In function 'int main()':
main.cpp2:10:31: error: ignoring return value of 'const iterator& iterator::operator++() const', declared with attribute 'nodiscard' [-Werror=unused-result]
   10 |   for range() do (x) _ = x;
      |                               ^
main.cpp2:3:22: note: declared here
    3 |   operator++: (this) -> forward iterator = this;
      |                      ^~~~~~~~
cc1plus: some warnings being treated as errors