rust-lang / libs-team

The home of the library team

`parse_line` method for `Stdin` #207

Open Victor-N-Suadicani opened 1 year ago

Victor-N-Suadicani commented 1 year ago

Proposal

Add a fn parse_line<T: FromStr>(&self) -> io::Result<T> method to std::io::Stdin to allow easier user input to be acquired without requiring deep knowledge of heap allocation, String and such. This would significantly lower the barrier to entry for entirely new Rust programmers.

It's also just a nice shortcut for Stdin::read_line + str::parse.

Problem statement

Consider the task of teaching someone Rust as their first programming language. The leap from programs that take no user input to programs that do is very high at the moment. This is a problem because programs that do take user input are vastly more interesting for educational purposes.

Currently, in order to write a simple program that takes user input, you have to use Stdin::read_line. Unfortunately, using this method requires the user to understand String, which in turn requires the user to understand heap allocation. The user also needs to understand mutable references, which presuppose some level of familiarity with ownership. This is a big ask for someone being introduced to programming through Rust for the first time.

It becomes an even bigger problem once the user input has been received. In order to transform the String into another type (say, a u32), one would likely want to call str::parse. In order to fully understand this method, one must know about FromStr, which in turn means the user also needs to understand traits. That is a lot of knowledge a user must have up front before they can write any program that takes user input! This is a big problem, educationally speaking.

Motivation, use-cases

By introducing parse_line, we can significantly lower the barrier to entry for programs that take user input by bypassing the need for an intermediate String. Receiving user input can be as simple as:

use std::io::stdin;

fn main() {
    // parse_line is the proposed method; it does not exist in std today.
    let n: i32 = stdin().parse_line().unwrap();
    println!("Got integer: {n}");
}

Explaining the above program is significantly simpler than explaining a similar program using read_line and str::parse.

For example, one could use this to teach a new Rust programmer to write a simple terminal-based TicTacToe game. Doing so currently is much harder as you have to use read_line.

One could question if this motivation is sufficient for inclusion in the Standard Library. I personally think the Standard Library should also function as an "onboarding ramp" to a certain extent. It mostly does this through its documentation, but I think this kind of functionality could also help. Adding this functionality certainly doesn't hurt much - I find it very unlikely that it will be a maintenance burden or a future compatibility risk.

It's also worth mentioning that getting functionality similar to parse_line into the hands of an inexperienced Rust user is very, very difficult without including it in the Standard Library. Doing so would probably involve an external crate - however, directing a new user towards such a crate requires them to learn about Cargo and general package management, and then we get into a whole different problem.

Aside from the above educational motivation, the API also serves as an ergonomic improvement over the current workflow:

fn main() {
    let mut s = String::new();
    std::io::stdin().read_line(&mut s).unwrap();
    let n: i32 = s.parse().unwrap();
    println!("Got integer: {n}");
}

The current approach requires two errors to be handled rather than one. Collapsing them into a single io::Result is an improvement in cases where the handling for the two possible errors (input error and parse error) is the same, which is often the case in small programs. parse_line also reduces the number of lines and turns input plus parsing into a one-liner.
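To make the two-error point concrete, here is a minimal sketch of the current workflow with both failure modes kept apart. The `ReadIntError` enum and the `read_i32` function are hypothetical names chosen for illustration, and the function takes any `BufRead` (rather than `Stdin` directly) so that it can be tried without a terminal.

```rust
use std::io::{self, BufRead};
use std::num::ParseIntError;

/// Hypothetical error type: one variant per failure mode of read_line + parse.
#[derive(Debug)]
pub enum ReadIntError {
    Io(io::Error),
    Parse(ParseIntError),
}

/// Reads one line from `reader` and parses it as an i32.
pub fn read_i32<R: BufRead>(reader: &mut R) -> Result<i32, ReadIntError> {
    let mut s = String::new();
    // First possible error: reading the line fails.
    reader.read_line(&mut s).map_err(ReadIntError::Io)?;
    // Second possible error: the text is not a valid integer.
    // `trim` drops the trailing newline and any other surrounding whitespace.
    s.trim().parse().map_err(ReadIntError::Parse)
}
```

A caller that treats both failures the same way gains nothing from the distinction, which is the case a single io::Result would simplify.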

I think including this API would fit well with the Supportive Rust goal as presented here.

Solution sketches

A straightforward implementation of parse_line in terms of read_line and str::parse could look like this:

impl Stdin {
    fn parse_line<T: FromStr>(&self) -> io::Result<T>
    where
        <T as FromStr>::Err: Into<Box<dyn Error + Send + Sync>>,
    {
        let mut s = String::new();
        self.read_line(&mut s)?;
        // Remove the trailing newline. (perhaps needs some platform-specific handling)
        let s = s.trim_end_matches(['\n', '\r']);
        match s.parse() {
            Ok(t) => Ok(t),
            Err(e) => Err(io::Error::new(io::ErrorKind::InvalidInput, e)),
        }
    }
}

This solution does not properly handle panics in the parse method but barring panics, the behavior should be identical.
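Since an `impl Stdin` block can only live inside the standard library, the sketch above cannot be compiled in user code as written. A rough stand-alone approximation, assuming a hypothetical free function `parse_line_from` that accepts any `BufRead`, might look like this:

```rust
use std::error::Error;
use std::io::{self, BufRead};
use std::str::FromStr;

/// Hypothetical stand-in for the proposed `Stdin::parse_line`,
/// generic over the reader so it also works on in-memory input.
pub fn parse_line_from<R, T>(reader: &mut R) -> io::Result<T>
where
    R: BufRead,
    T: FromStr,
    T::Err: Into<Box<dyn Error + Send + Sync>>,
{
    let mut s = String::new();
    reader.read_line(&mut s)?;
    // Strip a trailing "\n" or "\r\n" before parsing.
    let s = s.trim_end_matches(['\n', '\r']);
    // A parse failure is reported as an io::Error of kind InvalidInput.
    s.parse()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidInput, e))
}
```

With real standard input this could be called as `parse_line_from(&mut std::io::stdin().lock())`, since `StdinLock` implements `BufRead`.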

A more efficient solution that handles panics could look like this:

fn parse_line<T: FromStr>(&self) -> io::Result<T>
where
    <T as FromStr>::Err: Into<Box<dyn Error + Send + Sync>>,
{
    // We need to call consume even in case of panics in the user-provided parse method.
    struct ConsumeGuard<'a>(StdinLock<'a>, Option<usize>);
    impl<'a> Drop for ConsumeGuard<'a> {
        fn drop(&mut self) {
            if let Some(len) = self.1 {
                self.0.consume(len)
            }
        }
    }

    let mut consume_guard = ConsumeGuard(self.lock(), None);
    let buf = consume_guard.0.fill_buf()?;

    let mut slow_path_string;

    // Note that we must search for the newline before parsing as UTF-8,
    // as the buffer may have cut a character in half.
    // `position` yields None when the buffered data contains no newline,
    // which selects the slow path below.
    let line = buf.iter().position(|b| *b == b'\n').map(|end| &buf[..end]);

    // Fast non-allocating path.
    let str_to_parse = if let Some(line) = line {
        // Setting the len ensures `consume` is called as the guard is dropped.
        // +1 for the newline, which is not part of `line`.
        consume_guard.1 = Some(line.len() + 1);

        std::str::from_utf8(line).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?
    } else {
        // There was not enough data in the buffer already, switching to slower allocating path.
        slow_path_string = String::new();
        consume_guard.0.read_line(&mut slow_path_string)?;
        // Strip the trailing newline if there is one (it is absent at end of input).
        slow_path_string.strip_suffix('\n').unwrap_or(slow_path_string.as_str())
    };

    str_to_parse
        // On Windows, the line may also end in a carriage return which we'll need to remove.
        .strip_suffix('\r')
        .unwrap_or(str_to_parse)
        .parse()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidInput, e))
}

This would actually make parse_line more efficient than read_line + parse as it takes advantage of the buffering of stdin.
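The buffering that the fast path relies on can be tried out in isolation. The following is a minimal sketch, assuming a hypothetical helper `consume_buffered_line` that works on any `BufRead`; it only looks at bytes that are already buffered and never allocates.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: if the buffered data contains a complete line,
/// consumes it (including the newline) and returns its length without the newline.
/// Returns None when no newline is buffered, leaving the data untouched.
pub fn consume_buffered_line<R: BufRead>(reader: &mut R) -> io::Result<Option<usize>> {
    // `fill_buf` exposes the reader's internal buffer without copying it.
    let buf = reader.fill_buf()?;
    match buf.iter().position(|b| *b == b'\n') {
        Some(end) => {
            // Advance past the line and its newline.
            reader.consume(end + 1);
            Ok(Some(end))
        }
        None => Ok(None),
    }
}
```

When the helper returns None, a caller would fall back to an allocating read such as `read_line`, which is the role of the slow path in the sketch above.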

Links and related work

See also discussion on Zulip

C++'s cin works in a similar generic fashion:

#include <iostream>
int main() {
    int i;
    std::cin >> i;
    std::cout << "Got integer: " << i << std::endl;
    return 0;
}

What happens now?

This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals in its weekly meeting. You should receive feedback within a week or two.

scottmcm commented 1 year ago

Reading only a char makes this feel much too restricted to be worth it, to me. For example, the "guessing game" example in the book couldn't use this.

I think something more like https://github.com/rust-lang/rfcs/pull/3196#issuecomment-980479946 would be the way forward here. With that,

let c: char = std::io::inputln();

would still work, but it doesn't immediately throw people off the deep end if they want to read a number, since it could also accept

let guess: i32 = std::io::inputln();

Victor-N-Suadicani commented 1 year ago

For example, the "guessing game" example in the book couldn't use this.

Tbh I think the guessing game in the book is quite a difficult example to start with. The book gets away with it because it assumes prior programming experience, but for someone without any prior programming experience, the guessing game is a little too much I think. At the very least, it requires the reader to take a lot of stuff for granted without fully understanding them.

Even if it is "just" a char, that does enable quite some things. You could do TicTacToe, Hangman or even chess. Just because you can't do the guessing game doesn't make it not useful :)

As for https://github.com/rust-lang/rfcs/pull/3196, that looks interesting! However, there are multiple problems:

  1. I don't think it should panic on errors, so you should probably at least get a Result<char>.
  2. It seems like the RFC isn't headed towards a generic solution as you present, but more likely just receiving a Result<String>. That would make the motivation for this valid still.
  3. The generic interface would make the usage of the function less obvious and more complicated, and this is at odds with the purpose of the read_line_as_char function.

Some of that is already said in this comment https://github.com/rust-lang/rfcs/pull/3196#issuecomment-980510819

With all that said, there's no reason you couldn't have both read_line_as_char and the inputln. I don't think it would hurt to have both.

the8472 commented 1 year ago

Currently, in order to write a simple program that takes user input, you have to use Stdin::read_line. Unfortunately, using this method requires the user to understand String, which in turn requires the user to understand heap allocation. The user also needs to understand mutable references, which presuppose some level of familiarity with ownership. This is a big ask for someone being introduced to programming through Rust for the first time.

This assumes that one tries to teach bottom-up understanding from the start. To get going it can be fine to begin with top-down explanations ("this line reads text from the console input", "this line transforms the text into a number") and then refine those concepts in later lessons as needed.

Victor-N-Suadicani commented 1 year ago

This assumes that one tries to teach bottom-up understanding from the start.

Definitely - and some people learn best that way. Top-down explanations work for some, but it often involves a "leap of faith" from the learner to just accept what is being presented without understanding. In my opinion, top-down suffers from a feeling of being overwhelmed with concepts that you don't understand and losing overview of what you do understand.

In general, people have very different ways of learning. I think it'd be best if we cater to more than just the "top-down" approach. It works for some, not for others.

Victor-N-Suadicani commented 1 year ago

I think something more like rust-lang/rfcs#3196 (comment) would be the way forward here.

After some discussion on Zulip, I agreed that the generic version that doesn't only work with chars is more useful and I have edited the ACP.

This does make the proposal very similar to rust-lang/rfcs#3196, except it includes a parsing step.