Closed: jjatria closed this 3 years ago
FWIW, the Perl libraries seem to match the curl behavior:
$ perl -MHTTP::Tiny -MData::Dumper -E 'say Dumper(HTTP::Tiny->new->get("http://icanhazip.com"))'
$VAR1 = {
'protocol' => 'HTTP/1.1',
'headers' => {
'x-rtfm' => 'Learn about this site at http://bit.ly/icanhazip-faq and do not abuse the service.',
'content-length' => '13',
'date' => 'Mon, 02 Nov 2020 21:37:47 GMT',
'x-duck' => '🦆',
'connection' => 'close',
'server' => 'nginx',
'access-control-allow-origin' => '*',
'content-type' => 'text/plain; charset=UTF-8',
'access-control-allow-methods' => 'GET',
'x-donation' => 'This site is expensive to run. You can donate BTC to 3LSp89k9qnMJBpV7AUNF3M2Eo1vatpkYpm',
'x-node' => 'icanhazip-iad-1'
},
'success' => 1,
'url' => 'http://icanhazip.com',
'status' => '200',
'content' => '204.60.6.160
',
'reason' => 'OK'
};
$ perl -MLWP::UserAgent -MData::Dumper -E 'say Dumper(LWP::UserAgent->new->get("http://icanhazip.com"))'
$VAR1 = bless( {
'_request' => bless( {
'_headers' => bless( {
'user-agent' => 'libwww-perl/6.05'
}, 'HTTP::Headers' ),
'_method' => 'GET',
'_content' => '',
'_uri' => bless( do{\(my $o = 'http://icanhazip.com')}, 'URI::http' ),
'_uri_canonical' => bless( do{\(my $o = 'http://icanhazip.com/')}, 'URI::http' )
}, 'HTTP::Request' ),
'_msg' => 'OK',
'_rc' => '200',
'_headers' => bless( {
'server' => 'nginx',
'content-type' => 'text/plain; charset=UTF-8',
'date' => 'Mon, 02 Nov 2020 21:37:01 GMT',
'access-control-allow-methods' => 'GET',
'client-response-num' => 1,
'client-date' => 'Mon, 02 Nov 2020 21:37:01 GMT',
'x-rtfm' => 'Learn about this site at http://bit.ly/icanhazip-faq and do not abuse the service.',
'access-control-allow-origin' => '*',
'connection' => 'close',
'::std_case' => {
'x-node' => 'X-Node',
'client-peer' => 'Client-Peer',
'x-duck' => 'X-Duck',
'x-donation' => 'X-Donation',
'access-control-allow-methods' => 'Access-Control-Allow-Methods',
'client-date' => 'Client-Date',
'client-response-num' => 'Client-Response-Num',
'x-rtfm' => 'X-Rtfm',
'access-control-allow-origin' => 'Access-Control-Allow-Origin'
},
'content-length' => '13',
'x-node' => 'icanhazip-iad-1',
'client-peer' => '136.144.56.255:80',
'x-donation' => 'This site is expensive to run. You can donate BTC to 3LSp89k9qnMJBpV7AUNF3M2Eo1vatpkYpm',
'x-duck' => '🦆'
}, 'HTTP::Headers' ),
'_protocol' => 'HTTP/1.1',
'_content' => '204.60.6.160
'
}, 'HTTP::Response' );
I also tried parsing the x-duck: 🦆 header with https://github.com/ugexe/Perl6-Grammar--HTTP, but that made it fail to parse any headers at all. I then tried Net::HTTP and got the same results as the other Raku libraries:
$ raku -MNet::HTTP::GET -e 'say Net::HTTP::GET("http://icanhazip.com").header<X-Duck>.raku'
$["ð\x[9F]¦\x[86]"]
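That mojibake is exactly what you get when the duck's UTF-8 bytes are decoded as ISO-8859-1 (Latin-1) instead. A quick sketch in Python (used here purely for illustration, not as anything these libraries do internally):

```python
# 🦆 is U+1F986; on the wire the server sends it as four UTF-8 bytes.
wire_bytes = "🦆".encode("utf-8")
print(wire_bytes.hex())  # f09fa686

# Latin-1 maps each byte to the code point of the same value, so decoding
# those four bytes as Latin-1 yields the garbled string seen above.
garbled = wire_bytes.decode("latin-1")
print(repr(garbled))  # 'ð\x9f¦\x86'
```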
Which makes me wonder if all these Raku libraries are doing the best thing. Regardless, both of these behaviors are better than the existing behavior this replaces.
> Which makes me wonder if all these Raku libraries are doing the best thing.
Aside from decoding as ASCII or ISO-8859-1, I don't think parsing header values as UTF-8 is justified, at least based on the spec. What would you say would be "the best thing", @ugexe?
Before this change, responses from servers that send UTF-8 (or any non-ASCII) data as a header value (such as the x-duck header that http://icanhazip.com sets to 🦆) would be unparseable, and the client would die with an error.
Although servers shouldn't be sending ducks in HTTP/1.1 headers, clients also shouldn't die when trying to parse those responses.
On this, RFC 7230 § 3.2.4 states:

> Historically, HTTP has allowed field content with text in the ISO-8859-1 charset, supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset. Newly defined header fields SHOULD limit their field values to US-ASCII octets. A recipient SHOULD treat other octets in field content (obs-text) as opaque data.
It is not clear what the spec suggests clients should do with this 'opaque data', but the approach in this patch means that responses can always be parsed, and users who do want to recover whatever opaque data was sent in a header can re-encode the header value as ISO-8859-1 and decode it in whatever way is appropriate.
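The recovery round trip described above can be sketched as follows (in Python, for illustration only). It works because ISO-8859-1 maps every byte 0x00–0xFF to the code point of the same value, so decoding raw header bytes as Latin-1 never fails and never loses information:

```python
# Raw header bytes as sent by a server (here, UTF-8-encoded 🦆):
raw = "🦆".encode("utf-8")

# What a parser that decodes headers as ISO-8859-1 would store:
header_value = raw.decode("latin-1")

# A user can recover the original bytes by re-encoding as ISO-8859-1,
# then decode them however is appropriate (here, as UTF-8):
recovered = header_value.encode("latin-1").decode("utf-8")
assert recovered == "🦆"

# The Latin-1 round trip is lossless for every possible byte value:
all_bytes = bytes(range(256))
assert all_bytes.decode("latin-1").encode("latin-1") == all_bytes
```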
The change in this case is a little broader than ideal, since it interprets the entire header chunk (including the status line and all headers) as ISO-8859-1. Mitigations for this could be put in place if needed, but some more involved refactoring would likely be required.
For what it's worth, this solution makes HTTP::UserAgent produce results similar to those of Cro::HTTP::Client and HTTP::Tiny. Curl-based Raku libraries will, on the other hand, decode the duck as UTF-8, which, although funnier, is arguably incorrect.
Fixes #236