Open marcusklaas opened 9 years ago
It sounds like you want something that implements not std::io::Read (which is a stream of bytes) but another trait for a Unicode stream. But as discussed in this RFC: https://github.com/rust-lang/rfcs/pull/57, doing it for reading is tricky. The byte version takes a &mut [u8] argument, writes to it, and returns the number of bytes written. But doing that with &mut str might require some zeroing, or something similar, because the contents of a str must be well-formed UTF-8.
I’m experimenting with things that could help here. I’ll post again when there’s something more fully formed to show.
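To illustrate why a text-oriented read is harder than the byte one, here is a minimal sketch using only the standard library. The helper name split_valid_prefix is made up for this example and is not from the RFC or any crate. It shows that a plain Read::read call can stop in the middle of a multi-byte character, so the bytes it hands back are not always a valid str on their own.

```rust
use std::io::Read;
use std::str;

/// Hypothetical helper: splits `buf` into its longest well-formed UTF-8
/// prefix and whatever bytes are left over after it.
fn split_valid_prefix(buf: &[u8]) -> (&str, &[u8]) {
    let valid_len = match str::from_utf8(buf) {
        Ok(s) => s.len(),
        // `valid_up_to` is the length of the prefix that did validate.
        Err(e) => e.valid_up_to(),
    };
    let (valid, rest) = buf.split_at(valid_len);
    (str::from_utf8(valid).expect("prefix was already validated"), rest)
}

fn main() {
    // `&[u8]` implements `Read`, so it can stand in for a file or socket.
    let mut reader: &[u8] = "héllo".as_bytes();
    let mut buf = [0u8; 2];
    let n = reader.read(&mut buf).unwrap();

    // The two bytes read are 'h' plus the first half of 'é'.
    let (text, rest) = split_valid_prefix(&buf[..n]);
    assert_eq!(text, "h");
    assert_eq!(rest, &[0xC3u8][..]);
}
```

A reader that fills a &mut str would have to deal with that dangling byte itself, which is part of what makes the design tricky.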
Sorry for my vague description. I meant some kind of adapter between a stream of bytes in, for example, Windows-1252 and a stream of bytes in UTF-8. A Unicode stream would be very nice, but there's a lot of code that already works with std::io::Read.
That sounds like it could be built on top of "raw" decoders.
… probably with an impl of encoding::types::StringWriter for &mut [u8], to be used with the argument to Read::read.
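As a rough idea of what a "raw" decoding step that writes straight into a &mut [u8] could look like, here is a sketch that does not use this crate's API at all. The function decode_latin1 is hypothetical, and it decodes ISO-8859-1 rather than Windows-1252, only because Latin-1 needs no lookup table (each byte value is the code point). It stops as soon as the next character would not fit and reports how far it got.

```rust
/// Hypothetical raw decoder step: converts ISO-8859-1 bytes from `src`
/// into UTF-8 in `dst`. Returns (bytes consumed, bytes written).
fn decode_latin1(src: &[u8], dst: &mut [u8]) -> (usize, usize) {
    let mut written = 0;
    for (consumed, &byte) in src.iter().enumerate() {
        // In Latin-1 the byte value equals the Unicode code point.
        let ch = char::from(byte);
        let need = ch.len_utf8();
        if dst.len() - written < need {
            // Not enough room for the whole character: stop before it.
            return (consumed, written);
        }
        ch.encode_utf8(&mut dst[written..]);
        written += need;
    }
    (src.len(), written)
}

fn main() {
    let src = [b'c', b'a', b'f', 0xE9]; // "café" in Latin-1
    let mut dst = [0u8; 8];
    let (consumed, written) = decode_latin1(&src, &mut dst);
    assert_eq!((consumed, written), (4, 5));
    assert_eq!(&dst[..written], "café".as_bytes());
}
```

The caller has to keep the unconsumed tail of src around for the next call, which is where the extra state in a Read adapter comes from.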
Any progress? Anything changed since last time that would make it easier?
I just came across the same myself. Would this be something that is in the scope of the crate?
I have to write these impls for a project of mine and would also like to hear whether @lifthrasiir thinks they might be in scope for this crate.
I've also started a conversation on the encoding_rs crate: https://github.com/hsivonen/encoding_rs/issues/8
To cross-pollinate a bit here from the encoding_rs crate... @SimonSapin and I worked on our own versions of Read trait implementations (except @SimonSapin did quite a bit more!). @SimonSapin's work is in this PR: https://github.com/hsivonen/encoding_rs/pull/9 My work is here: https://github.com/BurntSushi/ripgrep/blob/75f1855a91ca00b5d0e62740595b1b91bc5142a2/src/decoder.rs
The big idea here is that implementing these traits is quite tricky, and neither of our implementations is fully correct. Mine gets most of the way there, but doesn't support single-byte reads, which means the bytes adapter method doesn't work at all. It's possible to make this work, but it requires a bit more bookkeeping.
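For anyone wondering what that bookkeeping might look like, here is a small standalone sketch. It is not the ripgrep or encoding_rs code. Latin1Reader is a hypothetical name, and it decodes ISO-8859-1 as a simple stand-in for a real encoding. The point is the pending buffer: an encoded character that does not fit in the caller's buffer is held back and handed out on later calls, so even one-byte reads work.

```rust
use std::io::{self, Read};

/// Hypothetical adapter: reads ISO-8859-1 from `inner`, yields UTF-8.
struct Latin1Reader<R> {
    inner: R,
    pending: [u8; 2], // UTF-8 bytes of the current char (at most 2 for Latin-1)
    pos: usize,       // how many pending bytes were already handed out
    len: usize,       // how many pending bytes are valid
}

impl<R: Read> Latin1Reader<R> {
    fn new(inner: R) -> Self {
        Latin1Reader { inner, pending: [0; 2], pos: 0, len: 0 }
    }
}

impl<R: Read> Read for Latin1Reader<R> {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        if out.is_empty() {
            return Ok(0);
        }
        if self.pos == self.len {
            // Nothing left over: decode the next source byte.
            let mut byte = [0u8; 1];
            if self.inner.read(&mut byte)? == 0 {
                return Ok(0); // end of input
            }
            self.len = char::from(byte[0]).encode_utf8(&mut self.pending).len();
            self.pos = 0;
        }
        // Hand out as much of the pending char as fits (short reads are allowed).
        let n = (self.len - self.pos).min(out.len());
        out[..n].copy_from_slice(&self.pending[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}

fn main() {
    let source: &[u8] = &[b'n', b'a', 0xEF, b'v', b'e']; // "naïve" in Latin-1
    // `bytes()` calls `read` with a one-byte buffer every time.
    let bytes = Latin1Reader::new(source)
        .bytes()
        .collect::<io::Result<Vec<u8>>>()
        .unwrap();
    assert_eq!(bytes, "naïve".as_bytes());
}
```

This version pulls one byte at a time from the inner reader, so in practice it would want a BufReader underneath. A multi-byte source encoding would also need leftover state on the input side, not just the output side.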
I wonder if the traits are misdesigned for non utf-8 usage. It's weird that they work with both strings and bytes.
In my case, I very much wanted to avoid ever materializing a &str and the costs associated with it. So operating on &[u8] is perfect.
It would be convenient to have an object that implements Read, so one could, for example, easily and efficiently read from a file in an encoding other than UTF-8.