seanmonstar / httparse

A push parser for the HTTP 1.x protocol in Rust.
https://docs.rs/httparse
Apache License 2.0
569 stars 110 forks source link

Allow obs-text characters in response status reason phrase #84

Closed acfoltzer closed 3 years ago

acfoltzer commented 3 years ago

In particular, this avoids parsing failures when servers return accented ISO 8859-1 characters.

The existing comment noting the non-compliance had two suggestions for future improvement: return a &[u8], or add specialized helpers such as display(). I decided to go with the former since the reason phrase appears largely unused in downstream dependencies like hyper.

This is unfortunately a breaking change due to the new type, but it's a very important change for interoperability with traffic in the wild.

seanmonstar commented 3 years ago

Do you know of specific websites that do this? (Maybe you can't share, that's fine...)

I'm bummed about the breaking change... I've only thought about this for a minute, but another option is to add some reason_bytes: Option<&[u8]> field...

acfoltzer commented 3 years ago

Do you know of specific websites that do this? (Maybe you can't share, that's fine...)

Yes, and maybe soon but not now :crossed_fingers:

I'm bummed about the breaking change...

Yeah, agreed. The only way I've thought up to avoid the breaking change is to return a static "" if non-utf8 bytes are found, after still consuming the reason phrase from the bytes buffer.

another option is to add some reason_bytes: Option<&[u8]> field...

This would also be breaking because all the fields of Response are pub :disappointed:

seanmonstar commented 3 years ago

This would also be breaking because all the fields of Response are pub

Eek, you're right! Ugh, I wrote this crate so long ago, before I realized API traps were a thing...

return a static "" if non-utf8 bytes are found, after still consuming the reason phrase from the bytes buffer.

This doesn't sound so bad, actually... Sure, ideally we'd just have them be opaque bytes, but, but! Maybe the bytes aren't really that useful anyways, and so we can just skip them...?

acfoltzer commented 3 years ago

This doesn't sound so bad, actually... Sure, ideally we'd just have them be opaque bytes, but, but! Maybe the bytes aren't really that useful anyways, and so we can just skip them...?

That'd be fine with us, after all we're only using this crate via hyper which doesn't use the reason phrase. Shall I update the PR with the empty string behavior?

seanmonstar commented 3 years ago

Yea, that sounds good to me, thanks!

acfoltzer commented 3 years ago

Pushed the new behavior in 1ca756c