rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Feature Request: BufferedReader function for reading without returning delimiter #11404

Closed WebeWizard closed 9 years ago

WebeWizard commented 10 years ago

I keep running into situations where I need to read (with BufferedReader) from a list of comma-separated or newline-separated values and then store the result. Should I have to trim off the delimiter every time? I feel like this is a common enough use case to be included in libstd.

//reading until newline and trimming off the newline chars.
//probably a better way to do this.
let tempvalue = lineIter.next().unwrap().to_str();
let length = tempvalue.len();
let value = tempvalue.as_slice().slice_to( length - 2 );

bachm commented 10 years ago

+1.

adrientetar commented 10 years ago

I agree. I think that this change should be applied at least to read_line(), which just reads a single line. Iterators can be a different story eventually: maybe we want to keep the delimiter for the line iterator so that the total of its content is exactly equal to the originating input? I don't know.

The current behavior forces you to trim EOL chars when, for example, parsing stdin input into a numeric type.

We probably need a variant of the read_until() function (that's what read_line() is built on, and .lines() is in turn built on read_line()) that pushes everything except the delimiter byte you want to stop at. I hope that this proposal makes sense.
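
For readers on current Rust, the proposed variant can be sketched roughly as below. This is only an illustration, assuming today's std::io::BufRead trait rather than the BufferedReader discussed in this thread; read_until_excluding is a hypothetical name, not a std API.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper (not part of std): reads up to `delim` and returns
/// the bytes before it, without the delimiter itself.
/// Returns Ok(None) once the reader has no more data.
fn read_until_excluding<R: BufRead>(
    reader: &mut R,
    delim: u8,
) -> io::Result<Option<Vec<u8>>> {
    let mut buf = Vec::new();
    // read_until appends bytes up to and including `delim`, or up to EOF.
    let n = reader.read_until(delim, &mut buf)?;
    if n == 0 {
        return Ok(None); // nothing left to read
    }
    // Drop the delimiter only if it is actually there; the last record
    // before EOF may be unterminated.
    if buf.last() == Some(&delim) {
        buf.pop();
    }
    Ok(Some(buf))
}
```

Note that the caller can no longer tell whether the final record was terminated by the delimiter.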

cc @alexcrichton

alexcrichton commented 10 years ago

My initial thought in designing the read_until function this way was that I did not want to lose data here and there. Without returning the delimiter, you have no way of knowing whether there actually was a delimiter or not (which may be useful sometimes).

That being said, this is a convenience method, so correctness/completeness may not be paramount. I think I based this off Go's interface, but I would also be curious about what other languages do as well.

adrientetar commented 10 years ago

@alexcrichton We could just return an Option with None if the delimiter wasn't found.

steveklabnik/rust_for_rubyists#48 is related; the current read_line must also deal with Windows line endings:

let num = from_str::<int>(input.trim_right_chars(& &['\n', '\r']));
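
The Option idea from the start of this comment might look something like the sketch below, written against current Rust. without_delimiter is a hypothetical name used only for illustration; it works on a raw record as returned by a delimiter-preserving read.

```rust
/// Hypothetical helper: returns Some(record) with the trailing `delim`
/// removed, or None if `raw` does not end with `delim`.
fn without_delimiter(raw: &[u8], delim: u8) -> Option<&[u8]> {
    match raw.split_last() {
        // The last byte is the delimiter: hand back everything before it.
        Some((&last, rest)) if last == delim => Some(rest),
        // Empty input, or input that ends without the delimiter.
        _ => None,
    }
}
```

One trade-off of this shape: a None result hides the unterminated data itself, so the caller still needs the original buffer to recover a final record that ends at EOF.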

davbo commented 10 years ago

Since @alexcrichton seemed interested in what other languages do: in the Python world this would typically be handled by reading the entire file into a string and calling splitlines. This takes advantage of an underlying Python TextIO feature, "universal newlines", in which the file IO layer hides the different types of newlines from the user; all instances of '\n', '\r' and '\r\n' are returned as '\n'. This was introduced in PEP 3116.

This would be similar to using Rust's AnyLineIterator.

Of course, reading the whole file in as a string isn't always a great idea. Generally I'd guess (as with @WebeWizard here) you'd be dealing with CSVs. In Python's case this is handled by a separate library. I did see one Rust CSV library which looked to be struggling slightly with newlines itself.

I wonder if Rust needs higher level File IO libraries (such as csv) or if extending the BufferedReader as suggested here is a good idea for the meantime? It's also worth considering if introducing something like "universal newlines" could be easier now than later.
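
As a rough illustration of what "universal newlines" would mean on the Rust side, here is a sketch in current Rust that splits on '\n', '\r\n', and a bare '\r'. universal_lines is a hypothetical helper, not an existing API; as far as I know, today's str::lines() treats only '\n' and '\r\n' as line endings.

```rust
/// Hypothetical helper: splits `text` into lines, treating "\n", "\r\n",
/// and a bare "\r" as line terminators. Terminators are not included.
fn universal_lines(text: &str) -> Vec<&str> {
    let bytes = text.as_bytes();
    let mut lines = Vec::new();
    let mut start = 0; // byte offset where the current line begins
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'\n' => {
                lines.push(&text[start..i]);
                i += 1;
                start = i;
            }
            b'\r' => {
                lines.push(&text[start..i]);
                // Treat "\r\n" as a single terminator.
                i += if bytes.get(i + 1) == Some(&b'\n') { 2 } else { 1 };
                start = i;
            }
            _ => i += 1,
        }
    }
    // Keep a final line that has no terminator.
    if start < bytes.len() {
        lines.push(&text[start..]);
    }
    lines
}
```

Slicing by byte offset is safe here because '\r' and '\n' are ASCII and never occur inside a multi-byte UTF-8 sequence.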

arjantop commented 10 years ago

@alexcrichton The other point is efficiency. If you need an owned pointer you have to take a slice without the delimiter and then convert that to owned. In Go you can just slice it and you are done.

sfackler commented 10 years ago

Why do you have to convert to owned?

arjantop commented 10 years ago

@sfackler So I can send it to a channel for example

mneumann commented 10 years ago

Ruby also keeps the newline characters intact:

"abc\ndef".lines # => ["abc\n", "def"]
STDIN.readline # => "the text you enter\n"

But in Ruby you can easily chop them off using String#chomp:

"abc\n".chomp # => "abc"
"abc\r\n".chomp # => "abc"
"abc".chomp # => "abc"
"abc\n\n".chomp # => "abc\n" -- Only one newline is chomped off!

I think that we should introduce a convenience function like Ruby's String#chomp.

mneumann commented 10 years ago

But I would not do this directly in the reader.

sfackler commented 10 years ago

The trim, trim_right, and trim_left functions already exist, but return slices.

sfackler commented 10 years ago

We could make a variant of trim_right that did an in-place modification of an owned string, but I'd shy away from doing the same for trim and trim_left since that'll be a pretty expensive operation compared to slicing.
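
A minimal sketch of that in-place variant, assuming current Rust's String rather than the owned string type of the time; trim_line_end_in_place is a hypothetical name, and restricting it to '\n' and '\r' is a choice made for this example.

```rust
/// Hypothetical helper: removes all trailing '\n' and '\r' characters
/// from an owned string without allocating a new one.
fn trim_line_end_in_place(line: &mut String) {
    // Length of the prefix that remains after trimming from the right.
    let kept = line
        .trim_end_matches(|c: char| c == '\n' || c == '\r')
        .len();
    // Truncating only shortens the string; the buffer is reused.
    line.truncate(kept);
}
```

Trimming on the right is cheap because truncation just changes the length. Trimming on the left in place would have to shift the remaining bytes forward, which is the cost mentioned above.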

mneumann commented 10 years ago

@sfackler: Chopping off the newline character is so common that there should be a utility function for this purpose. I expect chomp to also return a slice.

sfackler commented 10 years ago

@mneumann right, that's what the trim* family does: http://static.rust-lang.org/doc/master/std/str/trait.StrSlice.html#tymethod.trim

mneumann commented 10 years ago

@sfackler: Yes, but they also trim whitespace, unless you want to write input.trim_right_chars(& &['\n', '\r']), which is very verbose and would chop off as many newline characters as there are, regardless of their order ("\r\n" or "\n\r"). Of course the latter cannot happen when using read_line, but I still prefer a specialized "strip the newline off" method.

sfackler commented 10 years ago

Ah, gotcha

mneumann commented 10 years ago

But I would add no new method to either BufferedReader or StrOwned, only one specialized function to StrSlice, whatever its name may be (something like trim_line_end(), but of course I'd prefer chomp() as it's short and I know it from Ruby :))
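
For illustration, a chomp-like function returning a slice could be sketched in current Rust as follows. The name chomp and its exact behavior are assumptions modeled on the Ruby examples earlier in this thread; this sketch does not treat a lone '\r' as a line ending.

```rust
/// Hypothetical `chomp`: removes at most one trailing line ending
/// ("\r\n" or "\n") and returns a slice of the input.
fn chomp(s: &str) -> &str {
    s.strip_suffix("\r\n")
        .or_else(|| s.strip_suffix('\n'))
        // No line ending present: return the input unchanged.
        .unwrap_or(s)
}
```

Unlike the trim family, this removes only one line ending and leaves other trailing whitespace alone, so chomp("abc\n\n") yields "abc\n".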

steveklabnik commented 9 years ago

We're now using the RFC process to deal with standard library changes, and indeed, the IO RFC is currently active. If anyone still cares about this, that's the right place to get involved.