rust-cli / rexpect

https://docs.rs/rexpect
MIT License

Unicode output does not roundtrip #105

Open lenianiva opened 1 year ago

lenianiva commented 1 year ago

When I send the Unicode character "∀" to cat through rexpect, the output doesn't round-trip:

Input:
∀
226 136 128 
Output:
€ˆ
195 162 194 136 194 128 

Code:

use rexpect::error::*;
use rexpect::spawn;

// Print a string followed by the decimal value of each of its UTF-8 bytes.
fn display(s: &str) {
    println!("{}", s);
    for b in s.as_bytes() {
        print!("{} ", b);
    }
    println!();
}

// Send "∀" to cat and read the echoed line back.
fn repl() -> Result<(), Error> {
    let mut p = spawn("cat", Some(1000))?;

    let ex: String = "∀".to_string();
    p.send_line(&ex)?;
    let line = p.read_line()?;

    println!("Input:");
    display(&ex);
    println!("Output:");
    display(&line);
    Ok(())
}

fn main() {
    repl().unwrap_or_else(|e| panic!("repl failed with {}", e));
}
lenianiva commented 1 year ago

Seems to be a problem with the reader since this works with no problems:

    let output = std::process::Command::new("echo").arg("∀").output().expect("1");
    let l = std::str::from_utf8(&output.stdout).expect("2");
    println!("echo: {}", l);
lenianiva commented 1 year ago

I dug into this a bit more and I think the problem is with NBReader. The following test fails when put into reader.rs:

    #[test]
    fn test_expect_unicode() {
        let f = io::Cursor::new("∀ melon\r\n");
        let mut r = NBReader::new(f, None);
        assert_eq!(
            ("∀ melon".to_string(), "\r\n".to_string()),
            r.read_until(&ReadUntil::String("\r\n".to_string()))
                .expect("cannot read line")
        );
        // check for EOF
        match r.read_until(&ReadUntil::NBytes(10)) {
            Ok(_) => panic!(),
            Err(Error::EOF { .. }) => {}
            Err(_) => panic!(),
        }
    }

This happens because read_into_buffer casts each u8 to a char (c as char), so every byte of a multi-byte UTF-8 sequence becomes its own code point:

    fn read_into_buffer(&mut self) -> Result<(), Error> {
        if self.eof {
            return Ok(());
        }
        while let Ok(from_channel) = self.reader.try_recv() {
            match from_channel {
                Ok(PipedChar::Char(c)) => self.buffer.push(c as char),
                Ok(PipedChar::EOF) => self.eof = true,
                // this is just from experience, e.g. "sleep 5" returns the other error which
                // most probably means that there is no stdout stream at all -> send EOF
                // this only happens on Linux, not on OSX
                Err(PipeError::IO(ref err)) => {
                    // For an explanation of why we use `raw_os_error` see:
                    // https://github.com/zhiburt/ptyprocess/commit/df003c8e3ff326f7d17bc723bc7c27c50495bb62
                    self.eof = err.raw_os_error() == Some(5)
                }
            }
        }
        Ok(())
    }

The cast is there because PipedChar::Char carries a u8, while buffer is a String and only accepts whole chars.
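
To make the effect of the cast concrete, here is a minimal sketch that reproduces the byte pattern from the original report without rexpect. The function name push_bytes_as_chars is made up for illustration; it only mimics the push in read_into_buffer and is not part of the library.

```rust
/// Hypothetical stand-in for the `c as char` push in `read_into_buffer`:
/// every incoming byte is treated as its own code point (U+0000..=U+00FF).
fn push_bytes_as_chars(bytes: &[u8]) -> String {
    let mut buffer = String::new();
    for &b in bytes {
        buffer.push(b as char);
    }
    buffer
}

fn main() {
    let input = "∀";
    let mangled = push_bytes_as_chars(input.as_bytes());
    println!("{:?}", input.as_bytes()); // [226, 136, 128]
    println!("{:?}", mangled.as_bytes()); // [195, 162, 194, 136, 194, 128]
}
```

Each of the three input bytes is above 127, so each one is re-encoded as a two-byte sequence, which matches the six output bytes shown in the first comment.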

This behaviour diverges from pexpect. I see three possible solutions:

  1. Change the type of PipedChar(u8) to PipedChar(char). If the program sends half of a Unicode character and then stops, the reader would hang.
  2. Change the type of buffer to something like Vec<u8>, which does not decode Unicode at all, but this feels like kicking the problem down the road.
  3. Add a decoder on the receiving end of PipedChar objects to choose between UTF-8 and ASCII behaviour (pexpect behaves like this).
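
As a rough sketch of how options 2 and 3 could fit together, the receiving side could collect raw bytes and move only complete UTF-8 sequences into the String buffer, leaving a partial sequence pending until more bytes arrive. The helper drain_utf8 and the pending buffer are assumptions for illustration, not existing rexpect code; replacing invalid bytes with U+FFFD is one possible policy among several.

```rust
use std::str;

/// Hypothetical helper: appends every complete UTF-8 sequence in `pending`
/// to `out` and keeps a trailing incomplete sequence for the next read.
fn drain_utf8(pending: &mut Vec<u8>, out: &mut String) {
    loop {
        // Find how many leading bytes are valid and whether an invalid
        // sequence (Some(len)) or an incomplete one (None) follows.
        let (valid_len, bad_len) = match str::from_utf8(pending) {
            Ok(_) => (pending.len(), None),
            Err(err) => (err.valid_up_to(), err.error_len()),
        };
        out.push_str(str::from_utf8(&pending[..valid_len]).expect("prefix was validated"));
        match bad_len {
            // Invalid bytes: substitute U+FFFD, skip them, and keep decoding.
            Some(n) => {
                out.push('\u{FFFD}');
                pending.drain(..valid_len + n);
            }
            // Nothing left, or an incomplete tail: wait for more input.
            None => {
                pending.drain(..valid_len);
                return;
            }
        }
    }
}

fn main() {
    let mut pending: Vec<u8> = Vec::new();
    let mut out = String::new();

    // "∀" is 226 136 128; deliver it split across two reads.
    pending.extend_from_slice(&[226, 136]);
    drain_utf8(&mut pending, &mut out);
    assert!(out.is_empty()); // nothing decodable yet, two bytes stay pending

    pending.push(128);
    drain_utf8(&mut pending, &mut out);
    println!("{}", out); // ∀
}
```

Because an incomplete sequence simply stays in pending, a program that stops mid-character does not block the reader; the caller would still need to decide what to do with leftover bytes at EOF.
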
lypanov commented 11 months ago

Running into this issue now also. Would be lovely to see the MR merged :)

lenianiva commented 4 months ago

> Running into this issue now also. Would be lovely to see the MR merged :)

Sadly, the authors seem to be inactive.