hsutter / cppfront

A personal experimental C++ Syntax 2 -> Syntax 1 compiler
Other
5.24k stars 224 forks source link

[BUG] Variable declarations can refer to themselves #1028

Closed DyXel closed 3 months ago

DyXel commented 3 months ago

Describe the bug A problem as old as C itself: You are able declare a value and "initialize it" with itself, which is unsafe (and confusing).

To Reproduce

main: () -> int = { x: int = x; return x; }

Godbolt reproducer: https://cpp2.godbolt.org/z/hxros3zM9

Additional context We are fortunate that most compilers diagnose this, for example, Clang says:

warning: variable 'x' is uninitialized when used within its own initialization [-Wuninitialized]

However, IMO a warning is not enough, this should be an error because:

gregmarr commented 3 months ago

A presentation I saw today actually used a variant of this that was technically legal. I'm not saying that this is good code, but it is something that we might need to handle. I'm writing this from memory, but I think this is accurate.

T func(T *x) { <initialize *x>; return *x; }
T x = func(&x);

Maybe it could be written like this:

func : (out x : int) -> int = {
    x = 0;
    return x;
}

main :() = {
    x : int = func(x&);
}
hsutter commented 3 months ago

Thanks! Done. But I hesitated, let me tell you why... 🗨️

In general, because I have limited time and I want to focus on adding new value beyond Cpp1, I try to avoid doing extra work to add checks for things that the Cpp1 compilers already generally diagnose consistently and well (as a warning or error), and this is one that it looks like all the major compilers do flag with a nice readable warning.

So that's why I almost didn't do this... but because this is about initialization safety and I thought 'well, maybe it's not too hard,' I took a look and it worked out pretty well. Note that I had to add a little logic to not flag code like f: int = t().f(); in existing test cases.

But this gave me a chance to explain why I'll say No to some requests like this, and a case to point back to in the future where I did it anyway as proof that I don't always just say No. 😁

hsutter commented 3 months ago

Oh, and the original code at top now gets this:

demo.cpp2(1,30): error: local variable x cannot be used in its own initializer
DyXel commented 3 months ago

Hey, thanks for handling this, and I totally get you (I promise I won't get mad if you say No in the future 😛). The real reason I opened the bug is because I was reminded of Bjarne's talk on safety where he said: "diagnostics are no longer enough, we need enforcement of some kind", and unfortunately, there's a lot of places where warnings are not enforced, so it seems to me like a good place for cppfront to step in and prevent you from doing that mistake ever again.

hsutter commented 3 months ago

No worries. Note that this is not necessarily without minor downsides... one use case this prevents (and we can carve out to narrowly allow if there's demand) is taking the address of the variable to register it with something that's also initializing it. I doubt that there are important existing APIs that do that, but we'll see if people encounter any...

gregmarr commented 3 months ago

taking the address of the variable to register it with something that's also initializing it.

I think you can cover that with "you can pass it as a function's out parameter". Hm, maybe not. The "address of" may not work there, as you'd need to unpack the cpp2::out struct and get the address from there.

hsutter commented 3 months ago

I thought about that, because of your example of this style... changing int to X, and adding forward to avoid a copy/move:

func : (out xx : X) -> forward X = {
    xx = 0;           // construct x
    return xx;
}

main :() = {
    x : X = func(x&); // construct x... again, and from itself
}

The principle is that a function with an out parameter is a (delegating) constructor. So at first that seemed to make sense. We're constructing, right?

But then I realized that this would be double construction -- x is being constructed in func, and then again in main (in fact, it's constructed from itself). So I couldn't think of a case where it made sense, and just flagged all uses of x in its own initializer. But we can always relax that if we find a use case.

Using a function with an out parameter as a constructor already works by taking advantage of Cpp2's lazy initialization feature, which naturally is outside the declaration:

func : (out xx : X) = {
    xx = 0;           // construct x
}

main :() = {
    x : X;
    func(x&);         // constructs x
}
gregmarr commented 3 months ago

I've contacted you offline about the source of this sample.