rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Expose raw Stdout/err/in #58326

Open ParadoxSpiral opened 5 years ago

ParadoxSpiral commented 5 years ago

Currently there is no easy or obvious way to get an unbuffered Stdout/err/in. The types do exist in stdio, but they are not public, and no reason for that is noted.

For example these types would be useful for CLI applications that write a lot of data at once without it getting unnecessarily flushed.

As a workaround, one can use platform-specific extensions such as from_raw_fd on Unix and from_raw_handle on Windows.

josephlr commented 5 years ago

This is similar to https://github.com/rust-lang/rust/issues/23818, where having more control over the buffering/flushing behavior of stdio/stdin would be helpful.

pitdicker commented 5 years ago

So I've been thinking about the same thing while working on https://github.com/rust-lang/rust/pull/58454.

I do wonder how useful they would be though.

For example these types would be useful for CLI applications that write a lot of data at once without it getting unnecessarily flushed.

I don't think Stdout flushes unnecessarily? It should flush just as often as or less often than StdoutRaw, because Stdout lets large writes (greater than 8k) pass through directly. And Stderr isn't buffered at all.

Stdout does have an interesting comment:

pub struct Stdout {
    // FIXME: this should be LineWriter or BufWriter depending on the state of
    //        stdout (tty or not). Note that if this is not line buffered it
    //        should also flush-on-panic or some form of flush-on-abort.
    inner: Arc<ReentrantMutex<RefCell<LineWriter<Maybe<StdoutRaw>>>>>,
}

Working on this would change flushing behavior for piped buffered stdout, but is not at all relevant for StdoutRaw.

Stdout and Stderr do synchronize writes across threads however. I suppose that is why the panic code in the standard library writes to StderrRaw directly.

Another point: the Windows stdio implementation turns out to be quite tricky because it converts from UTF-8 to UTF-16 when writing to the console, but accepts arbitrary bytes when writing to a pipe. Ideally StdinRaw holds some state to deal with sliced UTF-16 buffers. And the buffered LineWriter in Stdout helps to give valid UTF-8 slices, because it slices at line boundaries, which prevents incomplete UTF-8 code points.
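To illustrate why slicing matters here, the following is a minimal sketch (not the standard library's actual console code) of how a write that ends in the middle of a multi-byte character can be split into a valid UTF-8 prefix and leftover bytes. The function name `split_valid_utf8` is hypothetical; only `std::str::from_utf8` and `Utf8Error::valid_up_to` come from the standard library.

```rust
/// Splits `bytes` into the longest prefix that is valid UTF-8 and the
/// remaining bytes. The remainder is either an incomplete code point
/// (more bytes are still to come) or genuinely invalid data; this sketch
/// does not distinguish between the two.
pub fn split_valid_utf8(bytes: &[u8]) -> (&str, &[u8]) {
    match std::str::from_utf8(bytes) {
        // Everything is valid, so nothing is left over.
        Ok(s) => (s, &bytes[bytes.len()..]),
        Err(e) => {
            let (valid, rest) = bytes.split_at(e.valid_up_to());
            // `valid_up_to` guarantees that this prefix is valid UTF-8.
            let valid = std::str::from_utf8(valid).expect("prefix was validated");
            (valid, rest)
        }
    }
}
```

A raw writer without a line buffer in front of it would have to keep the leftover bytes around until the next write, which is the kind of state mentioned above.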

Finally the non-raw types have a Maybe in-between, to gracefully handle the case where stdin/stdout/stderr is not available, see RFC 1014.

Some advantages the *Raw types could have:

A single synchronized Stdin with its own buffer seems pretty solid to me. I see no real advantage in StdinRaw.

ParadoxSpiral commented 5 years ago

I don't think Stdout flushes unnecessarily? It should flush just as often as or less often than StdoutRaw, because Stdout lets large writes (greater than 8k) pass through directly. And Stderr isn't buffered at all.

Oh, I wasn't aware that LineWriter hands big writes through. However it would still be useful if you could wrap StdoutRaw in a BufWriter that you have more control over without also still having the LineWriter beneath.

Some advantages the *Raw types could have:

A slight advantage is that you can more easily check if any of them are not present, since the raw functions return Results.

A single synchronized Stdin with its own buffer seems pretty solid to me. I see no real advantage in StdinRaw.

I agree a StdinRaw is probably not desirable.

pitdicker commented 5 years ago

A slight advantage is that you can more easily check if any of them are not present, since the raw functions return Results.

Sorry, I just made a PR that makes them no longer return a Result: https://github.com/rust-lang/rust/pull/58768. But the check didn't work anyway, since the raw types on all systems already only return Ok.


I am still trying to make up my mind whether writing up a pre-RFC or PR to expose the *Raw types brings something useful.

StdoutRaw can in some sense break the synchronization promise of Stdout: when Stdout is locked by one thread, that thread can write multiple lines without lines from another thread interrupting. So any custom implementation that wraps StdoutRaw and wants to play nice with the standard library should lock Stdout before using StdoutRaw. That seems to make the idea of using different synchronization primitives not really interesting anymore.

Oh, I wasn't aware that LineWriter hands big writes through. However it would still be useful if you could wrap StdoutRaw in a BufWriter that you have more control over without also still having the LineWriter beneath.

I am preparing a PR that switches the buffering mode between LineWriter and BufWriter depending on whether a terminal is connected, but expect it to not land easily... Would that fit your needs, or some method on Stdout that gives more control over the buffering?

retep998 commented 5 years ago

One big question is should we have StdoutRaw and StdinRaw for Windows consoles that allow the user to read and write [u16] directly? Should we also allow the user to write arbitrary [u8] bytes through the narrow codepage?

ParadoxSpiral commented 5 years ago

I am preparing a PR that switches the buffering mode between LineWriter and BufWriter depending on whether a terminal is connected, but expect it to not land easily... Would that fit your needs, or some method on Stdout that gives more control over the buffering?

I would like to be able to set the capacity of the inner BufWriter.
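For reference, something close to this is possible today by putting a BufWriter with a chosen capacity in front of a locked Stdout. The sketch below assumes that is acceptable even though the standard library's LineWriter still sits underneath, so it does not remove the inner buffer being discussed. The helper name `write_lines` and the 64 KiB capacity in the usage note are illustrative choices, not anything from the standard library.

```rust
use std::io::{self, BufWriter, Write};

/// Writes each entry of `lines`, followed by a newline, through a
/// `BufWriter` with the given capacity, then flushes once at the end.
/// The sink is generic so the same code works for a locked stdout,
/// a file, or an in-memory buffer.
pub fn write_lines<W: Write>(sink: W, capacity: usize, lines: &[&str]) -> io::Result<()> {
    let mut out = BufWriter::with_capacity(capacity, sink);
    for line in lines {
        out.write_all(line.as_bytes())?;
        out.write_all(b"\n")?;
    }
    // Flush explicitly so that write errors are reported instead of being
    // silently dropped when the BufWriter goes out of scope.
    out.flush()
}
```

It could be called as `write_lines(io::stdout().lock(), 64 * 1024, &["first", "second"])`. Holding the lock for the whole call also keeps output from other threads from interleaving with it.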

BartMassey commented 4 years ago

I wrote up some relevant stuff on Reddit here. My repo is here if you want to play with it. It all amounts to a pretty good argument that exposing StdoutRaw isn't going to buy you much performance in typical cases, I think.

BartMassey commented 4 years ago

After further investigation it looks like writing directly to the underlying File with custom buffering is much faster than anything that I've been able to figure out for using stdout(). I'm playing with this right now; see http://github.com/BartMassey/rust-nonstdio for a very early preview of a thing. The Background section of the README has some information that is relevant here.

jgoerzen commented 2 years ago

This led to an otherwise-unnecessary use of std::io::copy for me. I have data that I want to pipe to a command. This data may come from stdin, or it may come from some other Read. I can't just call .stdin() on this with a handle that comes from io::stdin() if I've read anything from that handle, because the first read, even if read_exact on 1 byte, will have read 8K and the remainder of the 8K block will be discarded.

This is an unfortunate data loss bug that is entirely non-obvious in the library and not blocked by the type system.
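The read-ahead can be reproduced without touching the real stdin. The sketch below uses a BufReader over an in-memory Cursor as a stand-in for the buffered Stdin (an assumption made for the sake of a self-contained example; the helper name is hypothetical) and reports how far the underlying source has been consumed after a single one-byte read.

```rust
use std::io::{BufReader, Cursor, Read};

/// Reads exactly one byte through a `BufReader` of the given capacity and
/// returns how many bytes were pulled from the underlying source.
/// Panics if `source_len` is zero, because there is nothing to read.
pub fn consumed_after_one_byte(source_len: usize, capacity: usize) -> u64 {
    let source = Cursor::new(vec![0u8; source_len]);
    let mut reader = BufReader::with_capacity(capacity, source);

    let mut one = [0u8; 1];
    reader.read_exact(&mut one).expect("source must not be empty");

    // `into_inner` drops whatever is still sitting in the buffer, which is
    // the data loss described above: the source has already moved past it.
    reader.into_inner().position()
}
```

With a 10,000-byte source and an 8,192-byte buffer, the source position after the one-byte read is 8,192, not 1. Anything that later reads from the underlying handle directly starts from there.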

agausmann commented 2 years ago

In my application I need to forward I/O from a serial port to stdio, and the serial port is an interactive console where the remote device echoes the characters back if they should be printed on the terminal, and controls various other aspects of the terminal. (Come to think of it, it doesn't have to be a serial port; another similar example with this behavior is an SSH client.)

Line-buffered stdin simply does not work for this case; action needs to be taken for every input byte, not for every line.

tbu- commented 1 year ago

Having line-buffered stdout is also an unnecessary performance overhead when outputting binary data. First the code scans for newlines to only partially write the bytes to stdout and copy the remaining bytes into a buffer, only for me to flush the remaining bytes out.

One syscall too many for every write that I do and unnecessary buffer copying.

SUPERCILEX commented 1 year ago

Created a proposal for this: https://github.com/rust-lang/libs-team/issues/148

WieeRd commented 10 months ago

If anyone else came across this issue while looking for a workaround, this is what I'm using right now.

use std::{fs::File, io};

#[cfg(unix)]
pub fn stdout_raw() -> File {
    use std::os::fd::{AsRawFd, FromRawFd};

    let stdout = io::stdout();
    let raw_fd = stdout.as_raw_fd(); // or just use `1`
    unsafe { File::from_raw_fd(raw_fd) }
}

#[cfg(windows)]
pub fn stdout_raw() -> File {
    use std::os::windows::io::{AsRawHandle, FromRawHandle};

    let stdout = io::stdout();
    let raw_handle = stdout.as_raw_handle();
    unsafe { File::from_raw_handle(raw_handle) }
}

#[cfg(test)]
mod test {
    use super::*;
    use std::io::{self, Write};

    #[test]
    fn rawwwwww() -> io::Result<()> {
        let mut stdout = stdout_raw();
        stdout.write_all(b"This stdout... is RAWWWWWW!!!")?;

        Ok(())
    }
}

jgoerzen commented 10 months ago

In Filespooler, for Unix, I am using:

use std::fs::File;
use std::io::stdin;
use std::os::unix::io::{AsRawFd, FromRawFd};

/// stdin is buffered by default on Rust, and you can't change it.  Since
/// we need to precisely read the header before letting a subprocess
/// handle the payload in stdin-process, we have to use trickery.  Bleh.
///
/// Take care not to let this value drop before spawning, because that would
/// cause stdin to be closed.
///
/// See: https://github.com/rust-lang/rust/issues/97855
pub fn get_unbuffered_stdin() -> File {
    let s = stdin();
    let locked = s.lock();
    let file = unsafe { File::from_raw_fd(locked.as_raw_fd()) };
    file
}

FWIW

tbu- commented 10 months ago

AFAICT, both of these will close the actual stdin/stdout FD once the returned file gets dropped. This is probably not intended.

Additionally, the solution posted by @jgoerzen looks like it's trying to lock stdin, but the lock is dropped as soon as the function returns.
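One way to avoid closing the real descriptor, sketched here for Unix only, is to duplicate it first so that the returned File owns its own copy. This uses `AsFd::as_fd` and `BorrowedFd::try_clone_to_owned` from the standard library and needs no unsafe code; the function name `stdout_unbuffered` is hypothetical. Writes through the returned File still bypass Stdout's lock, so they are not synchronized with other threads that use `io::stdout()`.

```rust
use std::fs::File;
use std::io::{self, Write};
#[cfg(unix)]
use std::os::fd::AsFd;

/// Returns an unbuffered `File` that refers to the same open file as stdout.
///
/// The descriptor is duplicated, so dropping the returned `File` closes
/// only the duplicate and leaves the process's real stdout open.
#[cfg(unix)]
pub fn stdout_unbuffered() -> io::Result<File> {
    let stdout = io::stdout();
    // Flush anything still held in the standard library's buffer so that
    // earlier output is not reordered after the raw writes.
    stdout.lock().flush()?;
    let owned = stdout.as_fd().try_clone_to_owned()?;
    Ok(File::from(owned))
}
```

The same approach should carry over to stdin, with the caveat raised earlier in the thread: bytes that the buffered Stdin has already read ahead are not visible through the duplicate.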

WieeRd commented 10 months ago

I definitely wouldn't recommend using my snippet for any serious or larger code; it's just suggested as a quick and dirty workaround for a trivial but I/O-intensive scenario. In my case I was solving an algorithm problem when I encountered this.