hsutter / cppfront

A personal experimental C++ Syntax 2 -> Syntax 1 compiler
Other
5.24k stars 224 forks source link

[BUG] Initialization vs assignment in a loop #1049

Closed ntrel closed 1 week ago

ntrel commented 3 months ago

To Reproduce

main: () =
{
    i := 0;
    p: std::unique_ptr<int>;
    while i < 3 next i++ {
        std::cout << i << "\n";
        p = new<int>(i);
        std::cout << p* << "\n";
    }
}

The p = new<int> line generates a call to p.construct, which works for the first iteration to initialize p. Then on the second iteration, an assignment was intended, but p.construct is called again, which causes a contract violation.

0
0
1
Contract violation
terminate called without an active exception
Aborted (core dumped)

git cppfront, g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

JohelEGP commented 3 months ago

This should be rejected, just like https://cpp2.godbolt.org/z/8ssxKs4YT is:

main: () = {
  p: std::unique_ptr<int>;
  if true {
    p = new<int>(i);
  }
}
main.cpp2...
main.cpp2(2,3): error: local variable p must be initialized on both branches or neither branch
main.cpp2(3,3): error: "if" initializes p on:
  branch starting at line 3
but not on:
  implicit else branch
  ==> program violates initialization safety guarantee - see previous errors
ntrel commented 3 months ago

This should be rejected

For consistency, yes. But in both cases it would be nice to only error if the variable is actually used after the branch. If it is only used in the branch where it is initialized, that could be allowed & would be useful e.g. when a pointer (declared in the same scope as the variable) is set to point at the variable after the variable is initialized. BTW the main readme mentions P1179, perhaps that will allow this and #440 as well.

hsutter commented 1 week ago

Yes, I agree this is a bug, lazy initialization shouldn't be allowed in a loop (loops can be entered zero times).

hsutter commented 1 week ago

it would be nice to only error if the variable is actually used after the branch.

It would be possible to make this work, but wouldn't that be equivalent to suppressing an unused variable warning?

hsutter commented 1 week ago

OK, fixed.

The following errors are now diagnosed:


error1: () = {
    i: int;
    while true {
        i = 42;     // ERROR: can't initialize i in a loop
    }
    i = 42;
}

error2: () = {
    i: int;
    if true {
        while true {
            i = 42;     // ERROR: can't initialize i in a loop
        }
        i = 42;
    }
    else {
        i = 42;
    }
    i = 42;
}

And the following is allowed:


ok: () = {
    i: int;
    if true {
        i = 42;
        while true {    // OK: in-branch loop is after initialization
            i = 42;
        }
    }
    else {
        i = 42;
    }
    i = 42;
}

Also, uses like this one already in reflect.h2 continue to work:

    protected parse_statement: ( /*...*/ )
        -> (ret: std::unique_ptr<statement_node>)  // LAZILY INITIALIZED
    = {
        /*... code that doesn't mention ret ... */

        if  /*...*/ {
            while /*... code that doesn't mention ret ... */ {  // OK: this loop doesn't matter
                /*... code that doesn't mention ret ... */
            }
        }

        /*... code that doesn't mention ret ... */

        ret = parser.parse_one_declaration(  // OK: definite first use of ret is a construction
                tokens*.get_map().begin()*.second,
                generated_tokens*
              );
        /*...*/
    }
jcanizales commented 1 week ago

So this is invalid C++2?

error1: () = {
    i: int;
    while true {
        s = input_from_user();  // string
        if s.is_int() {
            i = s.to_int();     // ERROR: can't initialize i in a loop??
            break;
        }
        // ask again
    }
    // Is this point flagged as an error too?
}
jcanizales commented 1 week ago

Oh, is the reasoning here that the user can initialize i where it's declared to some default value, and that's easier than trying to distinguish between construction and assignment in a loop?

hsutter commented 1 week ago

So this is invalid C++2? [... example where line 6 is:] i = s.to_int(); // ERROR: can't initialize i in a loop??

Correct, and is now diagnosed with the above recent commit:

test.cpp2(6,13): error: local variable i cannot be initialized inside a loop
  ==> program violates initialization safety guarantee - see previous errors

// Is this point flagged as an error too?

No because you get the earlier error. But if you had code that didn't have the error but then didn't use a the variable you would get an error from the Cpp1 compiler if "unused variable" warnings are on.

that's easier than trying to distinguish between construction and assignment in a loop?

The latter isn't just harder, I think it's not possible for for or while (but possibly could work for do).

The primary reason is that a for or while loop could execute zero times, so it might never have an opportunity to initialize at all.

If somehow we guaranteed at-least-once loop semantics (such as do does, or by inventing a for_at_least_once and while_at_least_once or similar), we could allow initialization inside a loop at the cost of generating an additional first_iteration local variable and then generating { auto&& __rhs = s.to_it(); if (first_iteration) { i.construct(CPP2_FORWARD(__rhs)); } else { i = CPP2_FORWARD(__rhs); }. That's possible but I haven't found a reason to implement that flexibility. Plus it costs 0-2 extra variables (modulo optimizations) and a branch, though that's less of a concern because you would pay for the overhead only if you use it.

ntrel commented 1 week ago

It would be possible to make this work, but wouldn't that be equivalent to suppressing an unused variable warning?

Not if it is initialized and used in the branch. So why not declare the variable in the branch? The example shows why - though in real code the type of the variable might be more complex - some complicated template instantiation.

main: () = {
  i: int;
  if e1 {
      // other code
      if e2 {
          // initialize and use i
      }
      // no use of i
  }
  else if e2 {
    // initialize and use i
  }
  // no use of i
}
hsutter commented 1 week ago

Not if it is initialized and used in the branch. So why not declare the variable in the branch?

I agree with both parts, including "why not declare the variable in the branch?" That is, in a case like this why wouldn't it be better to write it as follows?

main: () = {
    if e1 {
        // other code
        if e2 {
            // initialize and use i
            i := 42; std::cout << i;
        }
        // no use of i
    }
    else if e2 {
        // initialize and use i
        i := 42; std::cout << i;
    }
    // no use of i
}

Now it's correct by construction; you can't make the mistake of using i later because its name isn't even in scope.

The purpose of declaring a local in a larger scope is so that it can be used later in the scope (in both Cpp2 and Cpp1), and allowing it to have no initializer allows initializing it in a nested branch scope first including to use different constructors etc. (in Cpp2). If it's only going to be used in the branch scope, shouldn't it be declared there?

jcanizales commented 6 days ago

in real code the type of the variable might be more complex - some complicated template instantiation

Well you can alias that type and still declare the variables in the innermost scope.

hsutter commented 6 days ago

in real code the type of the variable might be more complex - some complicated template instantiation

Well you can alias that type and still declare the variables in the innermost scope.

I agree with @jcanizales but I now see I had missed @ntrel 's point originally... thanks Jorge for fixing my reply with a better shorter one, and my apologies Nick that I missed your clearly stated point the first time!