rust-lang / rust-by-example

Learn Rust with examples (Live code editor included)
https://doc.rust-lang.org/stable/rust-by-example/
Apache License 2.0
6.98k stars 1.34k forks source link

Add a simpler version of read_lines #1480

Closed samoylovfp closed 7 months ago

samoylovfp commented 2 years ago

Hello! First of all thanks for the amazing resource!

The read_lines example features an optimized way of reading lines from a file with a BufReader.

The code is efficient but is pretty involved and makes a newcomer stumble, forcing them to read about Results, BufReaders, AsRef and generics with a where clause.

My new-to-rust friend ran into this when trying to solve challenges from Advent of Code.

Most challenges from AoC require reading a sequence of lines from an input file and I feel it would be good to also have a less efficient example that exposes less new concepts to a newcomer, because this is the first thing you run into when trying to solve an AoC challenge.

Example

fn read_lines(filename: &str) -> Vec<String> {
    read_to_string(filename) 
        .unwrap()  // panic on possible file-reading errors
        .lines()  // split the string into an iterator of string slices
        .map(String::from)  // make each slice into a string
        .collect()  // gather them together into a vector
}

or

fn read_lines(filename: &str) -> Vec<String> {
    let mut result = Vec::new();

    for line in read_to_string(filename).unwrap().lines() {
        result.push(line.to_string())
    }

    result
}

I am still not happy about having to mention &str to String conversion but I feel like even though these examples are longer they apply less cognitive pressure to someone not familiar with rust

cafce25 commented 1 year ago

While a good step in the direction of a simpler version FritteX14's solution doesn't solve all the concerns in this issue, it is still using BufReader. I'd also argue it's barely less efficient because it still avoids reading the contents of the file into memory. Further because it explicitly states it's more efficient it confuses some new users on Stack Overflow possible more. I'd say labeling the original version as "efficient" is kind of a mislabel here.

vincentdephily commented 1 year ago

The current examples are still confusing for newcomers, especially because both version use io::BufReader::new(file).lines() and have essentially the same efficiency. The comment This process is more efficient than creating a String in memory especially working with larger files seems to no longer apply to the current examples. I think some changes would help:

dvdsk commented 1 year ago

The comment This process is more efficient than creating a String in memory especially working with larger files seems to no longer apply to the current examples.

I think you are basing you feedback on the rust-by-example book, if you look at the source you see that comment still applies.

Use the same idiomatic error handling in all examples, as converting from .unwrap()-style code is not the subject of these example and is just adding to confusion.

Good point, a look around this repo show most file examples use match to handle errors for example:

let mut file = match File::create(&path) {
    Err(why) => panic!("couldn't create {}: {}", display, why),
    Ok(file) => file,
};

It might be better to match that style. However rust by example is not only for beginners replacing unwrap with match makes it harder to copy the code and build on it. Using ? would be even better for copying and make it even less understandable for newcomers. Maybe keep unwrap in the more efficient case and add a note with a link to the explanation in the book?

Keep the second mostly example as-is, with comment-level explanations about using AsRef<Path>, BuffReader, and returning an iterator.

Most of my non library code does not use AsRef, as its usually not needed. It makes the example more complicated and might push readers to overuse generics. The other file examples do not make use of it.

Add a third example using read_until() and reusing the same line buffer, maybe even a Vec<u8> buffer that only gets promoted to &str based on some criteria.

IMO if we add a third example centered around performance it should clearly note that this degrades readability and is usually the wrong approach. We might also want to proof it actually improves performance using a benchmark. At which point it get to complicated for rust-by-example.

It would fit the rust performance book. A link to that might be nice here.

vincentdephily commented 1 year ago

I think you are basing you feedback on the rust-by-example book, if you look at the source you see that comment still applies.

Ah indeed I was looking at doc.rust-lang.org, the git version makes a lot more sense. Is it a matter of waiting for the next deployment on d.r-l.o ?

It might be better to match that style. However rust by example is not only for beginners replacing unwrap with match makes it harder to copy the code and build on it. Using ? would be even better for copying and make it even less understandable for newcomers. Maybe keep unwrap in the more efficient case and add a note with a link to the explanation in the book?

Not sure what is the best style to adopt, there are good arguments for unwrap, match and ?. I'd lean toward ? to encourage best practices, but the main thing it to remain consistent, at least within this example and hopefully throughout the book.

Most of my non library code does not use AsRef, as its usually not needed. ... IMO if we add a third example centered around performance it should clearly note that this degrades readability and is usually the wrong approach.

I think both AsRef and read_until make sense in a third example with a foreword that they might be overkill and that the second example is often good enough. But they're both pretty classic tweaks for flexible API (AsRef) and speedier parsing (read_until).

Delaying the from_utf8() call is indeed a lot of complexity for YMMV gains, so we can leave it out.

marioidival commented 7 months ago

Hi, I'm closing this issue just to clean up the items that already exist. If you think this issue makes sense to stay open, please create a new issue with updated information and mention this one.

thanks!