WebeWizard closed this issue 9 years ago
+1.
I agree. I think that this change should be applied at least to `read_line()`, which just reads a single line (iterators can be a different story eventually; maybe we want to keep it for the line iterator so that the total of its content is exactly equal to the originating input? I don't know).
The current behavior forces you to trim EOL characters when, for example, parsing stdin input into a numeric type.
We probably need a variant of the `read_until()` function (that's what `read_line()` is built on, and `.lines()` is in turn built on `read_line()`) that pushes everything but the delimiter byte you want to stop at.
Hope that this proposal makes sense.
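For illustration, here is a minimal sketch of such a variant written against current stable Rust (`std::io::BufRead`), not the 2014 `BufferedReader` API discussed in this thread. The name `read_until_excluding` is hypothetical and does not exist in the standard library.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper (not a std API): reads bytes up to `delim` into `buf`.
/// The delimiter is consumed from the reader but not stored in `buf`.
/// Returns the number of bytes appended to `buf`.
fn read_until_excluding<R: BufRead>(
    reader: &mut R,
    delim: u8,
    buf: &mut Vec<u8>,
) -> io::Result<usize> {
    let start = buf.len();
    reader.read_until(delim, buf)?;
    // `read_until` keeps the delimiter when it finds one; drop it again,
    // but only if this call actually appended something.
    if buf.len() > start && buf.last() == Some(&delim) {
        buf.pop();
    }
    Ok(buf.len() - start)
}
```

One consequence of this design: a return value of 0 no longer distinguishes an empty line from end of input, because the delimiter that would have told them apart has been dropped.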
cc @alexcrichton
My initial thought in designing the `read_until` function this way was that I did not want to lose data here and there. Without returning the delimiter, you have no way of knowing whether there actually was a delimiter or not (which may be useful sometimes).
That being said, this is a convenience method, so correctness/completeness may not be paramount. I think I based this off Go's interface, but I would be curious about what other languages do as well.
@alexcrichton We could just return an Option with `None` if the delimiter wasn't found.
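One possible reading of this suggestion, sketched in current stable Rust: return the data without the delimiter, plus an `Option` that is `None` when the input ended before a delimiter was seen. The function name `read_until_split` and the tuple return shape are assumptions made for illustration, not an existing API.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: returns the bytes before `delim`, and `Some(delim)`
/// if the delimiter was found or `None` if the input ended first.
fn read_until_split<R: BufRead>(
    reader: &mut R,
    delim: u8,
) -> io::Result<(Vec<u8>, Option<u8>)> {
    let mut buf = Vec::new();
    reader.read_until(delim, &mut buf)?;
    // `read_until` leaves the delimiter at the end of `buf` when it was found.
    let found = if buf.last() == Some(&delim) {
        buf.pop()
    } else {
        None
    };
    Ok((buf, found))
}
```

With this shape no information is lost: an empty line comes back as an empty buffer with `Some(b'\n')`, while end of input comes back as an empty buffer with `None`.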
steveklabnik/rust_for_rubyists#48 is related; the current read_line must also deal with Windows line endings:
let num = from_str::<int>(input.trim_right_chars(& &['\n', '\r']));
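For readers on current stable Rust, a rough equivalent of the line above might look like the sketch below. `from_str`, `int`, and `trim_right_chars` belong to the 2014 API; the modern calls used here are `str::trim_end_matches` and `str::parse`. The function name `parse_line` is hypothetical.

```rust
use std::num::ParseIntError;

/// Parses an integer from a line as returned by `read_line`,
/// ignoring any trailing '\n' and '\r' characters.
fn parse_line(input: &str) -> Result<i32, ParseIntError> {
    // A slice of chars is a pattern that matches any one of the listed chars.
    input.trim_end_matches(&['\n', '\r'][..]).parse::<i32>()
}
```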
Since @alexcrichton seemed interested in what other languages do: in the Python world this would typically be handled by reading the entire file into a string and calling splitlines. This takes advantage of an underlying Python text I/O feature called "universal newlines", in which the file I/O layer hides the different types of newlines from the user; all instances of '\n', '\r' and '\r\n' are returned as '\n'. For the text I/O layer this is specified in PEP 3116.
This would be similar to using Rust's AnyLineIterator.
Of course, reading the whole file in as a string isn't always a great idea. Generally I'd guess (as with @WebeWizard here) you'd be dealing with CSVs. In Python's case this is handled by a separate library. I did see one Rust CSV library which looked to be struggling slightly with newlines itself.
I wonder if Rust needs higher-level file I/O libraries (such as csv), or if extending the BufferedReader as suggested here is a good idea in the meantime? It's also worth considering whether introducing something like "universal newlines" could be easier now than later.
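As a point of comparison, current stable Rust ended up with a limited form of this in `BufRead::lines()`, which strips a trailing "\n" or "\r\n" from each line but does not treat a lone "\r" as a line ending. A small sketch follows; the helper name `collect_lines` is hypothetical, and this is the modern API, not the one available when this thread was written.

```rust
use std::io::{self, BufRead};

/// Collects all lines from `reader`, with "\n" and "\r\n" endings removed.
fn collect_lines<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    // `lines()` yields `io::Result<String>`; collecting into a `Result`
    // stops at the first I/O or UTF-8 error.
    reader.lines().collect()
}
```

Given the input "a\r\nb\nc", this yields the three strings "a", "b" and "c".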
@alexcrichton The other point is efficiency. If you need an owned pointer, you have to take a slice without the delimiter and then convert that to owned. In Go you can just slice it and you are done.
Why do you have to convert to owned?
@sfackler So I can send it to a channel for example
Ruby also keeps the newline characters intact:
"abc\ndef".lines # => ["abc\n", "def"]
STDIN.readline # => "the text you enter\n"
But in Ruby you can easily chop them off using `String#chomp`:
"abc\n".chomp # => "abc"
"abc\r\n".chomp # => "abc"
"abc".chomp # => "abc"
"abc\n\n".chomp # => "abc\n" -- Only one newline is chomped off!
I think that we should introduce a convenience function like Ruby's `String#chomp`.
But I would not do this directly in the reader.
The `trim`, `trim_right`, and `trim_left` functions already exist, but return slices.
We could make a variant of `trim_right` that did an in-place modification of an owned string, but I'd shy away from doing the same for `trim` and `trim_left`, since that'll be a pretty expensive operation compared to slicing.
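To make the cost difference concrete, here is a sketch in current stable Rust (where `trim_right`/`trim_left` are called `trim_end`/`trim_start`). Both function names are hypothetical. Trimming on the right only shortens the string, while trimming on the left has to shift the remaining bytes to the front of the buffer.

```rust
/// In-place right trim: only the length changes, no bytes are moved.
fn trim_right_in_place(s: &mut String) {
    let new_len = s.trim_end().len();
    s.truncate(new_len);
}

/// In-place left trim: the kept bytes are shifted down to the start,
/// which costs time proportional to the remaining length.
fn trim_left_in_place(s: &mut String) {
    let skip = s.len() - s.trim_start().len();
    s.drain(..skip);
}
```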
@sfackler: Chopping off the newline character is so common that there should be a utility function for this purpose. I expect `chomp` to also return a slice.
@mneumann right, that's what the `trim*` family does: http://static.rust-lang.org/doc/master/std/str/trait.StrSlice.html#tymethod.trim
@sfackler: Yes, but they also trim whitespace, unless you want to write `input.trim_right_chars(& &['\n', '\r'])`, which is very verbose and would chop off as many newline characters as there are, regardless of their order ("\r\n" or "\n\r"). Of course the latter cannot happen when using `read_line`, but I still prefer a specialized "strip the newline off" method.
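A minimal sketch of what such a specialized method could look like in current stable Rust, written as a free function for brevity. The name `chomp` is borrowed from Ruby and is hypothetical here; `str::strip_suffix` is the modern std call it relies on. It removes at most one line ending and returns a slice.

```rust
/// Removes at most one trailing line ending ("\r\n", "\n" or "\r"),
/// returning a slice of the input.
fn chomp(s: &str) -> &str {
    s.strip_suffix("\r\n")
        .or_else(|| s.strip_suffix('\n'))
        .or_else(|| s.strip_suffix('\r'))
        .unwrap_or(s)
}
```

For "abc\n\n" this returns "abc\n", whereas trimming with a set of newline characters (`trim_end_matches` in current Rust) would strip both newlines and return "abc".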
Ah, gotcha
But I would add a new method neither to `BufferedReader` nor to `StrOwned`, only one specialized function to `StrSlice`, whatever its name may be (something like `trim_line_end()`, but of course I'd prefer `chomp()` as it's short and I know it from Ruby :))
We're now using the RFC process to deal with standard library changes, and indeed, the IO RFC is currently active. If anyone still cares about this, that's the right place to get involved.
I keep running into situations where I need to read (with BufferedReader) a list of comma-separated or newline-separated values and then store the result. Should I have to trim off the delimiter every time? I feel like this is a common enough use case to be included in libstd.
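For what it's worth, current stable Rust covers this use case with `BufRead::split`, which yields each chunk without the delimiter. A sketch follows; the helper name `read_fields` is hypothetical, and this is the modern API rather than the `BufferedReader` of the time.

```rust
use std::io::{self, BufRead};

/// Reads all `delim`-separated fields from `reader` as UTF-8 strings.
/// The delimiter itself is not included in any field.
fn read_fields<R: BufRead>(reader: R, delim: u8) -> io::Result<Vec<String>> {
    reader
        .split(delim)
        .map(|field| {
            let bytes = field?;
            // Report invalid UTF-8 as an I/O error instead of panicking.
            String::from_utf8(bytes)
                .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
        })
        .collect()
}
```

Calling it with `b','` on the input "1,2,3" gives the fields "1", "2" and "3"; calling it with `b'\n'` splits on newlines in the same way.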