CP.53 - Parameters to coroutines should not be passed by reference - seems wrong

isocpp / CppCoreGuidelines

The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++

http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines

Other

42.9k stars 5.44k forks source link

CP.53 - Parameters to coroutines should not be passed by reference - seems wrong #2050

Open nejati-deriv opened 1 year ago

nejati-deriv commented 1 year ago

CP.53: Parameters to coroutines should not be passed by reference

This is the issue that suggested rule #1806

I think this rule might be wrong and makes the code bases more bug-prone. It is quite common to manage the lifetime of coroutines within the scope that they are spawned and pass const or mutable reference to the child coroutines. That is what we can expect from structured concurrency.

For example, here we launch two tasks and wait for them to complete:

auto socket = asio::ip::tcp::socket{ executor };
co_await socket.async_connect(endpoint, asio::deferred);

auto receive_task = [](asio::ip::tcp::socket &socket) -> asio::awaitable< void >
{
    // ..
};

auto send_task = [](asio::ip::tcp::socket &socket) -> asio::awaitable< void >
{
    // ..
};

co_await wait_for_all(receive_task(socket), send_task(socket));

Here we spawn one coroutine per connection and wait for them to complete at the end of the scope:

asio::awaitable< void > handle_session(asio::ip::tcp::socket socket, Database & database);

auto database   = Database{};
auto task_group = TaskGroup{};
auto acceptor   = asio::ip::tcp::acceptor{ executor, endpoint };

for (;;)
{
    auto [ec, socket] = co_await acceptor.async_accept(asio::as_tuple(asio::deferred));
    if (ec)
        break;
    task_group.spawn(handle_session(std::move(socket), database));
}

co_await task_group.join();

Aside from these examples, every coroutine member function has an implicit reference to the instance.

So, if this guideline prohibits references in coroutine parameters and suggests allocating objects on the heap and passing them as shared_ptr, it would be against structured concurrency and can create a shared pointer spaghetti.

ccotter commented 1 year ago

Another example to consider (one that I think is simpler since there is no task_group-like object that manages the lifetimes of the spawned work) is

vector<LargeObject> generate_data();
task<int> process_data(const vector<LargeObject>& Input, int multiplier);

void logic() {
    vector<LargeObject> input = ...;
    // below are "safe"
    auto result1 = co_await process_data(input, 1);
    auto result2 = co_await process_data(input, 2);
    // below is still safe - temporary's lifetime exists for the entire full
    auto result2 = co_await process_data(generate_data(), 2); expression
    ...
}

Here, I believe we can actually statically guarantee there are no dangling references since the coroutine object returned by process_data is awaited in the same full expression (expressions which call the coroutine but do not co_await it in the same full expression cannot make that same guarantee). In theory, at the expense of more complicated enforcement section (and implementation in static analysis tooling), we could only flag calls to coroutines which accept a reference where the returned coroutine is not immediately awaited in the same full expression. This wouldn't apply in the task_group use case, since the returned coroutine object is then passed elsewhere.

The use case of creating an object on the coroutine frame (whether an IO-like handle, some expensive to copy data structure) then passing a reference in more than one immediately awaited coroutine calls is a common one in my codebase, and one I would expect is common more broadly.

MikeGitb commented 1 year ago

In theory, at the expense of more complicated enforcement section (and implementation in static analysis tooling), we could only flag calls to coroutines which accept a reference where the returned coroutine is not immediately awaited in the same full expression.

I don't think that gains you anything, as co_await does not mark the end of a coroutine. It just means the coroutine will run to the first/next suspension point. No one is guaranteeing you that it is the last suspension point, or that corourtine won't access the reference even after the last point.

Are you mixing up the semantics of auto result = co_await process_data(generate_data(), 2); with auto result = std::async(&my_non_coroutine, generate_data()).get();

ccotter commented 1 year ago

It just means the coroutine will run to the first/next suspension point. No one is guaranteeing you that it is the last suspension point, or that corourtine won't access the reference even after the last point.

True, I suppose in general, co_await some_coro() does not always mean the coroutine handle associated with some_coro will have completed and been destroyed when the associated awaiter resumes the calling coroutine (although I think it'd be surprising behavior, since some_coro() itself is a temporary in this expression, and having the coroutine handle outlive this expression seems like it's violating some other best practice). I was thinking specifically of something like the proposed std::lazy. A general way of stating this would be for coroutines whose coroutine handle lifetime is nested within the lifetime of the return object from the coroutine, co_await some_coro(foo) where the first parameter is a reference is safe. Is what I've said here true?

We'd need some way of annotating such types in order for analysis to detect these situations and for the guideline to permit such code. In any case, what I've proposed here doesn't have to be the solution, but I do think the guideline (while well intended) might become less useful if the recommendation is to pass everything by value (whether to the underlying object itself, or shared_ptr to said object).

MikeGitb commented 1 year ago

Ah, I was probably mistaken then. I completely ignored that the coroutine handle is a temporary here. Sorry for the noise.

nejati-deriv commented 1 year ago

@ccotter Do you think there might be a way to check if the child-coroutine lifetime is tied to the parent?

ccotter commented 1 year ago

@ccotter Do you think there might be a way to check if the child-coroutine lifetime is tied to the parent?

I don't think so, at least not in an easy or 100% general way. In my example of awaiting a temporary coroutine call (co_await foo(Temporary{})), it'd be interesting if we could formulate a separate rule requiring the underlying coroutine handle to have a lifetime contained within the co_await expression, although this couldn't be statically verifiable always, and only sometimes but not easily. Perhaps if/when we get a set of standardized coroutine types, we could have a guideline that safely says co_await foo(Temporary{}) is OK if foo is of type std::lazy etc (I'm glossing over the fact that co_await really operates on an awaitable).

In your task_group use case, which I think is a very valid and potentially common use case, this would be harder to come up with a rule (compared to my co_await foo(Temporary{}) case) since the lifetime of the coroutine object is not contained within a single expression. In fact, even in your case, task_group.spawn(foo(Temporary{})) wouldn't be safe. Your case is only safe if the capture is of an lvalue, and the lvalue's lifetime is greater than that of the task_group. This is harder to formulate in a guideline I think. All that said, writing code such as your example is still code I think would be useful to write without having to resort to shared_ptr.

ccotter commented 1 year ago

After thinking about this a bit more, I realized that it would not be possible to statically check what I suggest above (only flag calls to coroutines that are not immediately coawaited) since it is not possible to check whether the callee in a call expression is a function or coroutine since the function body may not be available.