yoshuawuyts opened this issue 4 years ago
Talked with @yoshuawuyts, and we believe the point of integration is overriding the `read_to_string` method of `AsyncRead` on `Request` and `Response`. Instead of returning an error if the underlying bytes are not UTF-8, we'll make a best attempt to convert to UTF-8 based on the `Content-Type` header.
I think this should be a property of the `Body` now, right? Like `res.into_body().read_to_string(&mut s)`.
i'm tinkering with making an `encoding_rs_io` equivalent, but for `AsyncRead`, so you can do `res.into_body().into_utf8_reader()` to stream UTF-8 transcoded bytes, or `into_utf8_reader().read_to_string()` to collect it all into a big string.
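To make concrete what the transcoding step does, here is a minimal sketch for the common ISO-8859-1 case only. Every Latin-1 byte maps one-to-one to the Unicode code point with the same value, so conversion is a simple cast per byte; a real reader would wrap `encoding_rs` to cover arbitrary encodings, and the function name here is illustrative:

```rust
// Illustration only: transcoding ISO-8859-1 (Latin-1) bytes to a UTF-8
// String. Each Latin-1 byte value equals its Unicode code point, so a
// char cast suffices. The general case needs a real decoder library.
fn latin1_to_utf8(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

fn main() {
    // 0xE9 is 'é' in ISO-8859-1, but is invalid on its own as UTF-8.
    let body = [b'c', b'a', b'f', 0xE9];
    assert_eq!(latin1_to_utf8(&body), "café");
}
```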
on the http-types end, the `Body` needs to be made aware of the actual mime type, since right now it appears to always be octet-stream?
@goto-bus-stop What I'm wondering about is: are there any cases where we wouldn't want to decode based on the encoding type?
I suspect the answer here might be "probably not", which means maybe we should handle this out of the box. So the proposed API would change from:

```rust
let req = Request::new(Url::new("https://example.com"));
let reader = req.into_utf8_reader();
let mut s = String::new();
reader.read_to_string(&mut s).await?;
```

to:

```rust
let req = Request::new(Url::new("https://example.com"));
let mut s = String::new();
req.read_to_string(&mut s).await?;
```
The biggest question remaining then seems to be: do we need an escape hatch? I'm not sure tbh. Perhaps there could be a flag on the `Body` that disables the decoder?

```rust
let req = Request::new(Url::new("https://example.com"));
let mut body = req.into_body();
body.decoder(false);
let mut s = String::new();
body.read_to_string(&mut s).await?;
```
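A rough sketch of what that flag could mean internally, assuming a simplified synchronous `Body` that holds raw bytes plus a recorded charset (all names here are hypothetical, not the http-types API):

```rust
// Hypothetical escape-hatch sketch: a Body that transcodes by default
// but can be told to hand back raw bytes as strict UTF-8 instead.
struct Body {
    bytes: Vec<u8>,
    charset: String,
    decode: bool,
}

impl Body {
    /// Enable or disable charset decoding (the proposed escape hatch).
    fn decoder(&mut self, enabled: bool) {
        self.decode = enabled;
    }

    fn read_to_string(&self) -> Result<String, std::string::FromUtf8Error> {
        if self.decode && self.charset == "iso-8859-1" {
            // Transcode path (Latin-1 shown; a real implementation
            // would dispatch on the charset via a decoder library).
            Ok(self.bytes.iter().map(|&b| b as char).collect())
        } else {
            // Decoder disabled: the raw bytes must already be UTF-8.
            String::from_utf8(self.bytes.clone())
        }
    }
}

fn main() {
    let mut body = Body { bytes: vec![0xE9], charset: "iso-8859-1".into(), decode: true };
    assert_eq!(body.read_to_string().unwrap(), "é");
    body.decoder(false);
    assert!(body.read_to_string().is_err()); // 0xE9 alone is not UTF-8
}
```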
> on the http-types end, the `Body` needs to be made aware of the actual mime type, since right now it appears to always be octet-stream?
When encoding a stream we indeed don't assume anything, and set octet-stream. But for strings we correctly set `text/plain; charset=utf-8`, and for JSON similarly `application/json; charset=utf-8`.
When decoding a stream we don't do anything with text encodings. Most of my previous comment was about decoding only. Is there anything about encoding you think we should be doing?
for encoding we probably don't need non-utf8 support (certainly not initially). it would be kinda strange in 2020 to write a server in a cutting-edge language and framework if you need to service clients that don't support utf8 (do they exist?).
i was primarily wondering about decoding as well, since the `Body` does not know what its encoding is when it receives a response (assuming http-types use on the client side). I think the `Response` type could apply the mime type to the `Body` from inside the `set_header` function, but i'm not totally sure if that's correct.
I don't think we need a custom escape hatch since you can already get the raw bytes. If you really want to use utf8 for some reason even if the server explicitly advertises a different encoding, you can do `String::from_utf8(body.read_to_end())`, i guess.
> I think the `Response` type could apply the mime type to the `Body` from inside the `set_header` function, but i'm not totally sure if that's correct.
I think we could indeed use `set_header` for this. There's a slight nuance because `Request` is shared between both client and server, and we only want to use encodings on the client. But that's perhaps a matter of exposing a flag we can enable/disable on one side, but not the other.
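The `set_header` idea could be sketched like this, with the client/server split expressed as a constructor-set flag. Everything here is illustrative (the types, fields, and the crude charset parsing are assumptions, not the http-types implementation):

```rust
// Hedged sketch: a client-side Response records the charset on its
// Body when a Content-Type header arrives via set_header; a
// server-side Response leaves decoding off entirely.
struct Body {
    charset: Option<String>,
}

struct Response {
    body: Body,
    // Only a client-side constructor would set this to true.
    decode_on_read: bool,
}

impl Response {
    fn set_header(&mut self, name: &str, value: &str) {
        if self.decode_on_read && name.eq_ignore_ascii_case("content-type") {
            // Very rough charset extraction, for the sketch only.
            if let Some(cs) = value.split("charset=").nth(1) {
                self.body.charset = Some(cs.trim().to_ascii_lowercase());
            }
        }
    }
}

fn main() {
    let mut res = Response { body: Body { charset: None }, decode_on_read: true };
    res.set_header("Content-Type", "text/html; charset=ISO-8859-1");
    assert_eq!(res.body.charset.as_deref(), Some("iso-8859-1"));
}
```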
Similar to https://github.com/http-rs/surf/issues/101 and https://github.com/http-rs/surf/pull/108, we should make string parsing and encoding aware of the `Content-Type` header.

cc @goto-bus-stop; pinging you here since you contributed the patch to Surf. We're planning to move both Tide and Surf over to http-types, so figured it'd make sense to include the work you've done here.