Open ParadoxSpiral opened 5 years ago
This is similar to https://github.com/rust-lang/rust/issues/23818, where having more control over the buffering/flushing behavior of stdio/stdin would be helpful.
So I've been thinking about the same thing while working on https://github.com/rust-lang/rust/pull/58454.
I do wonder how useful they would be, though.
For example these types would be useful for CLI applications that write a lot of data at once without it getting unnecessarily flushed.
I don't think Stdout flushes unnecessarily? It should flush just as often as, or less often than, StdoutRaw, because Stdout lets large writes (greater than 8k) pass through directly. And Stderr isn't buffered at all.
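The pass-through behavior can be observed on BufWriter, which LineWriter is built on. This is a minimal sketch rather than the standard library's own test: SharedSink and bytes_seen_by_sink are hypothetical names, and the 8-byte capacity is an assumption chosen only to keep the numbers small.

```rust
use std::cell::RefCell;
use std::io::{self, BufWriter, Write};
use std::rc::Rc;

/// Hypothetical sink that records every byte it receives, so we can observe
/// what has actually reached the "device" behind the buffer.
#[derive(Clone, Default)]
struct SharedSink(Rc<RefCell<Vec<u8>>>);

impl Write for SharedSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.0.borrow_mut().extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Returns how many bytes reached the sink after a small write and after a
/// subsequent large write, with no explicit flush in between.
fn bytes_seen_by_sink() -> io::Result<(usize, usize)> {
    let sink = SharedSink::default();
    // Assumed tiny capacity so that a 64-byte write counts as "large".
    let mut writer = BufWriter::with_capacity(8, sink.clone());

    writer.write_all(b"abc")?; // fits in the buffer, so it stays there
    let after_small = sink.0.borrow().len();

    writer.write_all(&[b'x'; 64])?; // exceeds the buffer, so it is forwarded
    let after_large = sink.0.borrow().len();

    Ok((after_small, after_large))
}

fn main() -> io::Result<()> {
    let (after_small, after_large) = bytes_seen_by_sink()?;
    println!("sink saw {after_small} bytes after the small write");
    println!("sink saw {after_large} bytes after the large write");
    Ok(())
}
```

The large write first pushes out the three buffered bytes and then goes to the sink without being copied into the buffer.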
Stdout does have an interesting comment:
pub struct Stdout {
    // FIXME: this should be LineWriter or BufWriter depending on the state of
    //        stdout (tty or not). Note that if this is not line buffered it
    //        should also flush-on-panic or some form of flush-on-abort.
    inner: Arc<ReentrantMutex<RefCell<LineWriter<Maybe<StdoutRaw>>>>>,
}
Working on this would change flushing behavior for piped buffered stdout, but is not at all relevant for StdoutRaw.
Stdout and Stderr do synchronize writes across threads, however. I suppose that is why the panic code in the standard library writes to StderrRaw directly.
Another point: the Windows stdio implementation turns out to be quite tricky, because it converts from UTF-8 to UTF-16 when writing to the console but accepts arbitrary bytes when writing to a pipe. Ideally StdinRaw holds some state to deal with sliced UTF-16 buffers. And the buffered LineWriter in Stdout helps to give valid UTF-8 slices, because it slices at line breaks, which prevents incomplete UTF-8 code points.
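To illustrate why cutting at a line break is safer than cutting at an arbitrary byte offset, here is a small sketch. split_at_last_newline is a hypothetical helper, not something from the standard library, and it only gives a valid prefix when the input as a whole is valid UTF-8.

```rust
use std::str;

/// Hypothetical helper: splits `data` just after the last newline found in
/// the first `limit` bytes, falling back to `limit` itself when that prefix
/// contains no newline.
fn split_at_last_newline(data: &[u8], limit: usize) -> (&[u8], &[u8]) {
    let limit = limit.min(data.len());
    let cut = data[..limit]
        .iter()
        .rposition(|&b| b == b'\n')
        .map_or(limit, |i| i + 1);
    data.split_at(cut)
}

fn main() {
    let text = "héllo\nwörld\n".as_bytes();

    // A blind cut after 9 bytes lands inside the two-byte encoding of 'ö'.
    let blind = &text[..9];
    println!("blind cut is valid UTF-8: {}", str::from_utf8(blind).is_ok());

    // Cutting at the last newline within those 9 bytes keeps whole characters.
    let (head, _tail) = split_at_last_newline(text, 9);
    println!("newline cut is valid UTF-8: {}", str::from_utf8(head).is_ok());
}
```

A newline byte never appears inside a multi-byte UTF-8 sequence, which is why a cut right after it cannot split a character.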
Finally, the non-raw types have a Maybe in between, to gracefully handle the case where stdin/stdout/stderr is not available; see RFC 1014.
Some advantages the *Raw types could have:
- StderrRaw does not do synchronization. I suppose there are cases where that is necessary to prevent deadlocks.
- StdoutRaw will write everything out directly, with no need for flushing. This is bad for performance, and code that gradually builds up a line or message can give very messy output when multi-threaded. Still, it can be used well and is desired by some (https://github.com/rust-lang/rust/issues/23818).
- A single synchronized Stdin with its own buffer seems pretty solid to me. I see no real advantage in StdinRaw.
> I don't think Stdout flushes unnecessarily? It should flush just as often as, or less often than, StdoutRaw, because Stdout lets large writes (greater than 8k) pass through directly. And Stderr isn't buffered at all.
Oh, I wasn't aware that LineWriter hands big writes through. However, it would still be useful if you could wrap StdoutRaw in a BufWriter that you have more control over, without also still having the LineWriter beneath.
> Some advantages the *Raw types could have:
A slight advantage is that you can more easily check if any of them are not present, since the raw functions return Results.
> A single synchronized Stdin with its own buffer seems pretty solid to me. I see no real advantage in StdinRaw.
I agree a StdinRaw is probably not desirable.
> A slight advantage is that you can more easily check if any of them are not present, since the raw functions return Results.
Sorry, I just made a PR that makes them no longer return a Result: https://github.com/rust-lang/rust/pull/58768. But it didn't work; the raw types on all systems already only return Ok.
I am still trying to make up my mind whether writing up a pre-RFC or PR to expose the *Raw types brings something useful.
StdoutRaw can in some sense break the synchronization promise of Stdout: when Stdout is locked by one thread, it can write multiple lines without having lines from another thread 'interrupt'. So any custom implementation that wraps StdoutRaw and wants to play nicely with the standard library should lock Stdout before using StdoutRaw. That seems to make the idea of using different synchronization primitives not really interesting anymore.
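The locking promise mentioned here can be sketched with the public API. write_report is a hypothetical helper; it is generic over Write so the same code can target a StdoutLock or an in-memory buffer.

```rust
use std::io::{self, Write};

/// Hypothetical helper: writes all lines of one report to `out`, then flushes.
fn write_report<W: Write>(out: &mut W, name: &str, lines: &[&str]) -> io::Result<()> {
    for line in lines {
        writeln!(out, "[{name}] {line}")?;
    }
    out.flush()
}

fn main() -> io::Result<()> {
    // While this guard is alive, other threads that call `println!` or
    // `io::stdout().lock()` have to wait, so the lines below stay together.
    let mut guard = io::stdout().lock();
    write_report(&mut guard, "worker-1", &["starting", "done"])?;
    Ok(())
}
```

Code that writes to the stdout descriptor without taking this lock is not held back by the guard, which is the concern raised above.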
> Oh, I wasn't aware that LineWriter hands big writes through. However, it would still be useful if you could wrap StdoutRaw in a BufWriter that you have more control over, without also still having the LineWriter beneath.
I am preparing a PR that switches the buffering mode between LineWriter and BufWriter depending on whether a terminal is connected, but I expect it to not land easily... Would that fit your needs, or would some method on Stdout that gives more control over the buffering?
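Until something like that lands, the same choice can be approximated in user code. This is only a sketch and not the PR's implementation: buffered, SharedSink, and seen_without_flush are hypothetical names, and the LineWriter inside Stdout still sits beneath whatever wrapper is chosen. It assumes a toolchain where std::io::IsTerminal is available.

```rust
use std::cell::RefCell;
use std::io::{self, BufWriter, IsTerminal, LineWriter, Write};
use std::rc::Rc;

/// Hypothetical helper: line-buffer when interactive, block-buffer otherwise.
fn buffered<W: Write + 'static>(inner: W, interactive: bool) -> Box<dyn Write> {
    if interactive {
        Box::new(LineWriter::new(inner))
    } else {
        Box::new(BufWriter::new(inner))
    }
}

/// Hypothetical sink that records what reaches it, used to observe buffering.
#[derive(Clone, Default)]
struct SharedSink(Rc<RefCell<Vec<u8>>>);

impl Write for SharedSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.0.borrow_mut().extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Returns what reached the sink after writing "a\nb" without flushing.
fn seen_without_flush(interactive: bool) -> io::Result<Vec<u8>> {
    let sink = SharedSink::default();
    let mut out = buffered(sink.clone(), interactive);
    out.write_all(b"a\nb")?;
    let seen = sink.0.borrow().to_vec();
    Ok(seen)
}

fn main() -> io::Result<()> {
    let interactive = io::stdout().is_terminal();
    let mut out = buffered(io::stdout(), interactive);
    writeln!(out, "stdout is a terminal: {interactive}")?;
    out.flush()?;
    Ok(())
}
```

In line mode the completed line reaches the sink right away and the partial line waits; in block mode nothing is written until the buffer fills or is flushed.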
One big question is: should we have StdoutRaw and StdinRaw for Windows consoles that allow the user to read and write [u16] directly? Should we also allow the user to write arbitrary [u8] bytes through the narrow codepage?
> I am preparing a PR that switches the buffering mode between LineWriter and BufWriter depending on whether a terminal is connected, but I expect it to not land easily... Would that fit your needs, or would some method on Stdout that gives more control over the buffering?
I would like to be able to set the capacity of the inner BufWriter.
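What the public API allows today is choosing the capacity of an outer BufWriter wrapped around the stdout lock. This is a sketch of that workaround, not of the requested feature: write_numbers and the 64 KiB CAPACITY are assumptions for illustration, and the LineWriter inside Stdout is still beneath the wrapper.

```rust
use std::io::{self, BufWriter, Write};

/// Assumed capacity for illustration; tune it for the workload.
const CAPACITY: usize = 64 * 1024;

/// Hypothetical helper: writes `count` numbered lines through a BufWriter
/// of the chosen capacity wrapped around `out`.
fn write_numbers<W: Write>(out: W, count: u32) -> io::Result<()> {
    let mut out = BufWriter::with_capacity(CAPACITY, out);
    for i in 0..count {
        writeln!(out, "{i}")?;
    }
    // Flush explicitly: errors from the implicit flush on drop are ignored.
    out.flush()
}

fn main() -> io::Result<()> {
    write_numbers(io::stdout().lock(), 1000)
}
```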
After further investigation, it looks like writing directly to the underlying File with custom buffering is much faster than anything that I've been able to figure out for using stdout(). I'm playing with this right now; see http://github.com/BartMassey/rust-nonstdio for a very early preview of a thing. The Background section of the README has some information that is relevant here.
This led to an otherwise-unnecessary use of std::io::copy for me. I have data that I want to pipe to a command. This data may come from stdin, or it may come from some other Read. I can't just call .stdin() on the command with a handle that comes from io::stdin() if I've read anything from that handle, because the first read, even a read_exact of 1 byte, will have read 8K, and the remainder of that 8K block will be discarded.
This is an unfortunate data loss bug that is entirely non-obvious in the library and not blocked by the type system.
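The read-ahead effect can be reproduced without touching the real stdin. In this sketch a Cursor stands in for the underlying source and a BufReader stands in for the buffer inside Stdin; source_position_after_one_byte and the 16-byte capacity are assumptions chosen for illustration.

```rust
use std::io::{self, BufReader, Cursor, Read};

/// Reads a single byte through a BufReader and returns how far the
/// underlying source was advanced by that read.
fn source_position_after_one_byte() -> io::Result<u64> {
    // 64 bytes of stand-in input; the real stdin would be a pipe or file.
    let source = Cursor::new(vec![0u8; 64]);
    // Assumed small capacity; the buffer inside Stdin is much larger.
    let mut reader = BufReader::with_capacity(16, source);

    let mut byte = [0u8; 1];
    reader.read_exact(&mut byte)?;

    // The source has moved past more than the one byte we asked for; the
    // extra bytes now live only in the reader's buffer.
    Ok(reader.get_ref().position())
}

fn main() -> io::Result<()> {
    let position = source_position_after_one_byte()?;
    println!("requested 1 byte, source advanced by {position} bytes");
    Ok(())
}
```

Anything that later reads from the source directly, such as a child process inheriting the descriptor, starts after the read-ahead bytes rather than after the first byte.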
In my application I need to forward I/O from a serial port to stdio, and the serial port is an interactive console where the remote device echoes the characters back if they should be printed on the terminal, and controls various other aspects of the terminal. (Come to think of it, it doesn't have to be a serial port; another similar example with this behavior is an SSH client.)
Line-buffered stdin simply does not work for this case; action needs to be taken for every input byte, not for every line.
Having line-buffered stdout is also an unnecessary performance overhead when outputting binary data. First the code scans for newlines to only partially write the bytes to stdout and copy the remaining bytes into a buffer, only for me to flush the remaining bytes out.
One syscall too many for every write that I do and unnecessary buffer copying.
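The extra write described here can be made visible by counting calls on a stand-in sink. CountingSink, calls_direct, and calls_line_buffered are hypothetical names, and each call to the sink's write stands in for one syscall on a real descriptor.

```rust
use std::io::{self, LineWriter, Write};

/// Hypothetical sink that counts `write` calls, standing in for syscalls.
#[derive(Default)]
struct CountingSink {
    calls: usize,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Number of sink writes when the data is written without any buffering.
fn calls_direct(data: &[u8]) -> io::Result<usize> {
    let mut sink = CountingSink::default();
    sink.write_all(data)?;
    Ok(sink.calls)
}

/// Number of sink writes when the same data goes through a LineWriter and
/// is then flushed so that the tail after the newline is written out too.
fn calls_line_buffered(data: &[u8]) -> io::Result<usize> {
    let mut writer = LineWriter::new(CountingSink::default());
    writer.write_all(data)?;
    writer.flush()?;
    Ok(writer.get_ref().calls)
}

fn main() -> io::Result<()> {
    // Binary payload that happens to contain a newline byte in the middle.
    let data = b"\x00\x01\n\x02\x03";
    println!("direct writes: {}", calls_direct(data)?);
    println!("line-buffered writes: {}", calls_line_buffered(data)?);
    Ok(())
}
```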
Created a proposal for this: https://github.com/rust-lang/libs-team/issues/148
If anyone else came across this issue while looking for a workaround, this is what I'm using right now.
use std::{fs::File, io};

#[cfg(unix)]
pub fn stdout_raw() -> File {
    use std::os::fd::{AsRawFd, FromRawFd};
    let stdout = io::stdout();
    let raw_fd = stdout.as_raw_fd(); // or just use `1`
    unsafe { File::from_raw_fd(raw_fd) }
}

#[cfg(windows)]
pub fn stdout_raw() -> File {
    use std::os::windows::io::{AsRawHandle, FromRawHandle};
    let stdout = io::stdout();
    let raw_handle = stdout.as_raw_handle();
    unsafe { File::from_raw_handle(raw_handle) }
}

#[cfg(test)]
mod test {
    use super::*;
    use std::io::{self, Write};

    #[test]
    fn rawwwwww() -> io::Result<()> {
        let mut stdout = stdout_raw();
        stdout.write_all(b"This stdout... is RAWWWWWW!!!")?;
        Ok(())
    }
}
In Filespooler, for Unix, I am using:
use std::fs::File;
use std::io::stdin;
use std::os::unix::io::{AsRawFd, FromRawFd};

/// stdin is buffered by default on Rust, and you can't change it. Since
/// we need to precisely read the header before letting a subprocess
/// handle the payload in stdin-process, we have to use trickery. Bleh.
///
/// Take care not to let this value drop before spawning, because that would
/// cause stdin to be closed.
///
/// See: https://github.com/rust-lang/rust/issues/97855
pub fn get_unbuffered_stdin() -> File {
    let s = stdin();
    let locked = s.lock();
    let file = unsafe { File::from_raw_fd(locked.as_raw_fd()) };
    file
}
FWIW
AFAICT, both of these will close the actual stdin/stdout FD once the returned file gets dropped. This is probably not intended.
Additionally, the solution posted by @jgoerzen looks like it's trying to lock stdin, but the lock is dropped as soon as the function returns.
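One way to avoid closing the real descriptor, sketched here for Unix only, is to duplicate it first so that the File owns a separate descriptor. stdout_unbuffered is a hypothetical name, and this is an alternative to the snippets above, not what either author used. Writes through this handle still bypass the lock and buffer of Stdout, so they can interleave with println! output.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::os::fd::AsFd;

/// Unix-only sketch: returns an unbuffered File backed by a duplicate of the
/// stdout descriptor, so dropping the File does not close stdout itself.
fn stdout_unbuffered() -> io::Result<File> {
    let owned = io::stdout().as_fd().try_clone_to_owned()?;
    Ok(File::from(owned))
}

fn main() -> io::Result<()> {
    let mut raw = stdout_unbuffered()?;
    raw.write_all(b"written without the std stdout buffer\n")?;
    drop(raw); // closes only the duplicate descriptor

    // The original stdout is still usable afterwards.
    println!("stdout is still open");
    Ok(())
}
```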
I definitely wouldn't recommend using my snippet for any serious or larger code; it's just suggested as a quick and dirty workaround for a trivial but I/O-intensive scenario. In my case I was solving an algorithm problem when I encountered this.
Currently there is no easy/obvious way to get an unbuffered Stdout/err/in. The types do exist in stdio; however, they are not public, for reasons not noted.
For example these types would be useful for CLI applications that write a lot of data at once without it getting unnecessarily flushed.
One can use platform-specific extensions such as from_raw_fd on Unix and from_raw_handle on Windows as a workaround.