Open apolukhin opened 4 years ago
Sorry about the delay. I had a quick patch but it seems like Erik addressed this in https://reviews.llvm.org/D74094. I see that this has been reverted due to a ppc failure. I locally re-applied the patch and I see the function using only 16424 bytes.
Seems like Francis has a patch?
The issue affects users of the stackful coroutines by adding very significant memory overhead (each of the thousands coroutines becomes multiple times bigger).
We'd appreciate a fix for 10.0.
That should be a fairly localized change that doesn't need a solution to the more general issue with temporaries. Did you have a patch for it, or did you just try adding lifetime markers to the IR?
Adding lifetime markers for the temp from EmitAnyExprToTemp(E) in EmitCallArg brings the stack size to 16424.
Yeah, I think it's a known issue that we don't emit them for temporaries.
Yes, lifetime markers are missing.
I've revived https://reviews.llvm.org/D74094; let's see what issues with that approach may persist, if any.
relanded as https://reviews.llvm.org/rGe698695fbbf62e6676f8907665187f2d2c4d814b in clang-18.
The latest C23 draft § 6.2.4 Storage durations of objects
8 A non-lvalue expression with structure or union type, where the structure or union contains a member with array type (including, recursively, members of all contained structures and unions) refers to an object with automatic storage duration and temporary lifetime. 39) Its lifetime begins when the expression is evaluated and its initial value is the value of the expression. Its lifetime ends when the evaluation of the containing full expression ends. Any attempt to modify an object with temporary lifetime results in undefined behavior. An object with temporary lifetime behaves as if it were declared with the type of its value for the purposes of effective type. Such an object need not have a unique address.
Speaking more with @AaronBallman , Aaron found https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1285.htm introduced the current wording into c11. https://wiki.sei.cmu.edu/confluence/display/c/EXP35-C.+Do+not+modify+objects+with+temporary+lifetime talks more about this.
I'm kind of hung up on the specific wording: where the structure or union contains a member with array type
. Why would that matter if the aggregate contains an array or not?
EDIT: sorry, I should file a new bug about C. https://web.archive.org/web/20231010182056/https://yarchive.net/comp/struct_return.html talks more about this in regards to arrays within the returned struct.
The issue affects users of the stackful coroutines by adding very significant memory overhead (each of the thousands coroutines becomes multiple times bigger).
I noticed this today when using coroutines. This function:
Coro<Status> SimplestCoroWithAwait(CoroContext& cx) {
co_return co_await SimplestCoro(cx);
}
allocates 88 bytes for storage of its stack, but this one:
Coro<Status> SimplestCoroWithTwoAwaits(CoroContext& cx) {
co_await SimplestCoro(cx);
co_await SimplestCoro(cx);
co_await SimplestCoro(cx);
co_await SimplestCoro(cx);
co_await SimplestCoro(cx);
co_await SimplestCoro(cx);
co_await SimplestCoro(cx);
co_await SimplestCoro(cx);
co_await SimplestCoro(cx);
co_await SimplestCoro(cx);
co_return co_await SimplestCoro(cx);
}
allocates 408 bytes for its coroutine stack.
In 2019 a similar issue was fixed in the Rust compiler resulting in significant space savings: https://github.com/rust-lang/rust/pull/60187
I'm interested in helping contribute to a fix here. Is https://reviews.llvm.org/D74094 the latest discussion on this?
I'm interested in helping contribute to a fix here. Is https://reviews.llvm.org/D74094 the latest discussion on this?
That's the latest discussion I'm aware of.
Extended Description
Consider the following example:
With
-std=c++11 -O2
flags Clang uses 32768 bytes of stack, GCC uses 16392 bytes.Note that adding additional
t.Extend(static_cast<test&&>(t));
results in stack usage growth for Clang, while GCC keeps using only 16392 bytes.Godbolt playground: https://godbolt.org/z/MdaNfa