Improve documentation of `tokio::io::stdout` regarding interleaved writes

liss-h commented 3 months ago

Version tokio v1.38.0

Platform Linux strix 6.9.6-200.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Jun 21 15:48:21 UTC 2024 x86_64 GNU/Linux

Description Hi there, I ran into an issue with tokio::io::stdout that would be really good to put into the documentation to save people from encountering that issue.

Setup: two threads, one of them produces byte chunks the other one consumes them. They are sent through a channel.

The consuming thread does the following:

while let Some(chunk) = rx.recv().await {
    tokio::io::stdout()
        .write_all(&chunk)
        .await
        .context("Unable to write chunk to stdout")?;
}

Now this doesn't look particularly interesting, however, this code produces mangled output on stdout. I was convinced that this must be caused by the threads producing the bytes (because of unsafe usage), but it is not.

It is caused by the repeated calls to tokio::io::stdout() in the loop. If you extract it out of the loop, like so:

let mut stdout = tokio::io::stdout();

while let Some(chunk) = rx.recv().await {
    stdout
        .write_all(&chunk)
        .await
        .context("Unable to write chunk to stdout")?;
}

it works properly.

Now, the documentation of tokio::io::stdout() states that output may interleave when multiple threads are writing to it in parallel and I was aware of that. But by just looking at it, there is only a single thread writing to stdout in this example. I have not deeply looked into the implementation of tokio::io::stdout but I guess something of this sort must be going on there. An explaination what happens here would be appreciated, just out of curiousity :)

It would be really good if you could add a paragraph to the documentation that states that this is invalid (maybe with an explaination as to why).

I get that repeated calls to tokio::io::stdout() are kinda weird and non-optimal, I just didn't think very deeply about it when I wrote that, and since there is no .lock() function that would encourage only calling it once, that just happened I guess.

mox692 commented 3 months ago

this code produces mangled output on stdout

I'm assuming you mean that messages may not appear in order in stdout, and I think such things could happen.

Tokio's stdout operation is performed on a dedicated blocking thread, and if multiple stdio instances are created in each loop and perform write operations, each write operation is performed in its own thread, which may be in any order.

On the other hand, when a write is performed to a single stdio instance within a loop, the state is properly updated and the order would be preserved.

And I agree, it would be good to have a description that mentions this behavior.

liss-h commented 3 months ago

Yes, I meant wrong order. Follow up question if you don't mind: why does the await on write_all not prevent this. Shouldnt it wait until all bytes are written? And therefore prevent the next write (next iteration) from even being queued

mox692 commented 3 months ago

write_all itself doesn't guarantee that the write is complete. (Looking at the implementation of write_all, it returns Poll::Ready immediately after spawning the thread.)

Calling flush explicitly in a loop should work well even if instances are created each time.

while let Some(chunk) = rx.recv().await {
    let mut stdout = tokio::io::stdout();
    stdout
        .write_all(&chunk)
        .await
        .context("Unable to write chunk to stdout")?;

    stdout.flush().await.unwrap();
}

liss-h commented 3 months ago

Thanks for explaining, makes sense. I didn't think about the fact that it might be buffered 😅

tokio-rs / tokio

Improve documentation of `tokio::io::stdout` regarding interleaved writes #6672