eserte opened this issue 9 months ago
It seems that t/13_req.t in https://metacpan.org/release/YTURTLE/Net-Azure-EventHubs-0.09 also started failing because of this change. See http://matrix.cpantesters.org/?dist=Net-Azure-EventHubs%200.09
Web-Request-0.11 seems to fail with URI >= 5.19 as well, possibly for the same reason. See also https://github.com/doy/web-request/issues/4
This commit looks like the likely cause: 9ee8098
The new code looks more correct to me, though.
RFC 3986 does not say much about the format of query strings. It mentions that a query can include key=value pairs, but it does not even say which separator character (";", "&", or something else?) should be used. In particular, it says nothing about value-less keys. Is there any other standard that can be followed?
I think the closest we have is https://url.spec.whatwg.org/#application/x-www-form-urlencoded, which seeks to obsolete parts of RFC3986. It speaks only of "null byte sequences", which could be interpreted as either undef or ''. We should probably treat these equivalently when serializing. When deserializing, either could be correct, so it would be wrong for a test to assert one over the other.
+1 for treating "" and undef equivalently. That would mean the following two commands should print the same thing:
$ perl5.39.5 -MURI -E '$u=URI->new("http:"); $u->query_form(foo => ""); say $u->query'
foo=
$ perl5.39.5 -MURI -E '$u=URI->new("http:"); $u->query_form(foo => undef); say $u->query'
foo
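As a rough illustration of that proposal, here is a minimal Python sketch. Python is used rather than Perl only to keep the example self-contained. In it, None plays the role of undef and serializes exactly like an empty string. `encode_pairs` is a hypothetical helper, not part of URI. Percent-encoding is done with Python's `quote_plus`, which may differ from URI.pm for a few characters.

```python
from urllib.parse import quote_plus


def encode_pairs(pairs):
    """Serialize (name, value) pairs as x-www-form-urlencoded.

    Hypothetical helper: a value of None (standing in for Perl's undef)
    is treated exactly like "", so both produce "name=".
    """
    parts = []
    for name, value in pairs:
        if value is None:
            value = ""  # undef and "" serialize identically
        parts.append(quote_plus(name) + "=" + quote_plus(value))
    return "&".join(parts)


print(encode_pairs([("foo", "")]))    # foo=
print(encode_pairs([("foo", None)]))  # foo=
```

With this rule the two commands above would both print `foo=`, which is also what URI 5.18 produced.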
However, I have now learned that "foo" should be treated the same as "foo=" when parsing x-www-form-urlencoded content, so some parsing code has an additional problem.
For example, CGI.pm may do strange things if the equals sign is missing:
$ echo -n "foo=" | env REQUEST_METHOD=POST CONTENT_TYPE=application/x-www-form-urlencoded CONTENT_LENGTH=4 perl5.39.5 -MCGI -MData::Dumper -E 'say Dumper({CGI->new->Vars})'
$VAR1 = {
'foo' => ''
};
$ echo -n "foo" | env REQUEST_METHOD=POST CONTENT_TYPE=application/x-www-form-urlencoded CONTENT_LENGTH=3 perl5.39.5 -MCGI -MData::Dumper -E 'say Dumper({CGI->new->Vars})'
$VAR1 = {
'keywords' => 'foo'
};
But if there's another parameter, then it looks sane again:
$ echo -n "bar=baz&foo=" | env REQUEST_METHOD=POST CONTENT_TYPE=application/x-www-form-urlencoded CONTENT_LENGTH=12 perl5.39.5 -MCGI -MData::Dumper -E 'say Dumper({CGI->new->Vars})'
$VAR1 = {
'bar' => 'baz',
'foo' => ''
};
$ echo -n "bar=baz&foo" | env REQUEST_METHOD=POST CONTENT_TYPE=application/x-www-form-urlencoded CONTENT_LENGTH=11 perl5.39.5 -MCGI -MData::Dumper -E 'say Dumper({CGI->new->Vars})'
$VAR1 = {
'bar' => 'baz',
'foo' => ''
};
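The four CGI.pm results above fit a simple rule. If the body contains neither "=" nor "&", it is treated as an old-style ISINDEX keyword list. Otherwise, every key without a value gets ''. The Python function below is a hypothetical model of that observed behaviour, not CGI.pm's actual code. It may not match CGI.pm on other inputs, for example bodies using ";" as a separator.

```python
from urllib.parse import unquote_plus


def cgi_like_parse(query):
    """Hypothetical model of the CGI.pm outputs shown above (not its real code).

    A body with no "=" and no "&" is treated as a keyword list split on "+"
    (returned as a list here); otherwise each bare key gets "".
    """
    if "=" not in query and "&" not in query:
        words = [unquote_plus(w) for w in query.split("+") if w]
        return {"keywords": words}
    params = {}
    for pair in query.split("&"):
        if not pair:
            continue  # skip empty segments such as a trailing "&"
        name, _, value = pair.partition("=")  # value is "" when "=" is absent
        params[unquote_plus(name)] = unquote_plus(value)
    return params


print(cgi_like_parse("foo"))          # {'keywords': ['foo']}
print(cgi_like_parse("bar=baz&foo"))  # {'bar': 'baz', 'foo': ''}
```

The model makes the oddity explicit: a lone bare key is special-cased into "keywords" before any key=value parsing happens.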
I have been following this conversation a bit, and to be honest, it worries me slightly.
Back in the old days of HTTP 1.0, HTML 3.5 and JavaScript 1.2, when I joined the band, a POST request with an x-www-form-urlencoded body could not distinguish between an empty text box and a text box that was never filled in; we only had empty strings. Therefore, it made sense to treat them equally and to send the field as foo=, i.e. as an empty string.
But nowadays we have API clients talking to API back ends that may need to send undef, JSON's null, or nil "values" instead of an empty string. What I got used to is that foo on its own was meant to be handled as undef, and foo= as an empty string.
Whatever the code does downstream is up to the application developer.
But it does worry me that we might indeed break downstream code. If we do, we might be better off rolling back, or otherwise preventing the breakage, for now. Then we can see what else is breaking (I think our smoke testers already do an amazing job) and work with the downstream developers on fixes that will work the moment we declare that foo on its own means undef.
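The convention described above can be sketched as a parser/serializer pair that round-trips the difference: a bare foo means undef, and foo= means an empty string. In this hypothetical Python sketch None stands in for undef, and both function names are made up for illustration. It roughly matches how URI 5.19+ parses, whereas the WHATWG URL spec maps both forms to an empty string.

```python
from urllib.parse import quote_plus, unquote_plus


def parse_distinguishing(query):
    """Hypothetical parser: bare "foo" -> None, "foo=" -> ""."""
    pairs = []
    for chunk in query.split("&"):
        if not chunk:
            continue  # ignore empty segments
        if "=" in chunk:
            name, value = chunk.split("=", 1)
            pairs.append((unquote_plus(name), unquote_plus(value)))
        else:
            pairs.append((unquote_plus(chunk), None))  # no "=": treat as undef
    return pairs


def encode_distinguishing(pairs):
    """Inverse of parse_distinguishing: None -> "foo", "" -> "foo="."""
    parts = []
    for name, value in pairs:
        if value is None:
            parts.append(quote_plus(name))
        else:
            parts.append(quote_plus(name) + "=" + quote_plus(value))
    return "&".join(parts)


print(parse_distinguishing("foo&bar="))  # [('foo', None), ('bar', '')]
```

The round trip only works if both ends agree on this convention. A receiver that follows the WHATWG parser would collapse the two cases again.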
Oh, and happy New Year :-)
How do we proceed in fixing this?
It should be trivial to update Catalyst-Controller-DBIC-API to fix its tests to account for this change.
As the change in URI affects other dists too, I was hoping it would be reverted here.
I've changed the test to pass an empty string instead of undef so the same URI as before is generated and tested.
A few data points:
CGI.pm:
- foo+bar parses as { keywords => [ 'foo', 'bar' ] }
- foo&bar=baz parses as { foo => '', bar => 'baz' }
- { foo => undef, bar => 'baz' } encodes as bar=baz
HTTP::Body (used by Catalyst):
- foo+bar parses as {}
- foo&bar=baz parses as { bar => 'baz' }
URL::Encode:
- foo&bar=baz parses as { foo => undef, bar => "baz" }
WWW::Form::UrlEncoded:
- foo&bar=baz parses as { foo => '', bar => 'baz' }
- { foo => undef, bar => 'baz' } encodes as foo=&bar=baz
URI 5.18:
- foo&bar=baz parses as { foo => '', bar => 'baz' }
- { foo => undef, bar => 'baz' } encodes as foo=&bar=baz
URI 5.19+:
- foo&bar=baz parses as { foo => undef, bar => 'baz' }
- { foo => undef, bar => 'baz' } encodes as foo&bar=baz
Based on the URL spec, decoding a parameter without an equals sign should give an empty string as its value. An "empty byte sequence" is an empty string, and the spec uses the same term for the value decoded from key=. So the new URI parsing behavior is definitely wrong.
As for encoding, the spec says "Assert: tuple’s name and tuple’s value are scalar value strings". So encoding anything other than a string is undefined behavior. Based on this, I would say that the new URI behavior for encoding is also wrong.
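For reference, the spec's parsing and serializing steps can be sketched in Python as below. This is a simplified reading of the spec that assumes UTF-8 as the encoding, and the function names are hypothetical. It shows both points above. First, a name without "=" gets the empty string. Second, the serializer accepts only strings and always emits "=".

```python
from urllib.parse import unquote_to_bytes

# Bytes left as-is by the urlencoded byte serializer (0x20 becomes "+").
_SAFE = set(b"*-._0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")


def whatwg_form_parse(data: bytes):
    """Sketch of the application/x-www-form-urlencoded parser."""
    output = []
    for seq in data.split(b"&"):
        if not seq:
            continue  # empty sequences are skipped
        if b"=" in seq:
            name, value = seq.split(b"=", 1)
        else:
            name, value = seq, b""  # no "=": value is the empty byte sequence
        name = name.replace(b"+", b" ")
        value = value.replace(b"+", b" ")
        output.append((
            unquote_to_bytes(name).decode("utf-8", errors="replace"),
            unquote_to_bytes(value).decode("utf-8", errors="replace"),
        ))
    return output


def _byte_serialize(s: str) -> str:
    parts = []
    for b in s.encode("utf-8"):
        if b == 0x20:
            parts.append("+")
        elif b in _SAFE:
            parts.append(chr(b))
        else:
            parts.append("%%%02X" % b)  # uppercase percent-encoding
    return "".join(parts)


def whatwg_form_serialize(pairs):
    """Sketch of the serializer: names and values must be strings."""
    out = []
    for name, value in pairs:
        if not isinstance(name, str) or not isinstance(value, str):
            raise TypeError("names and values must be strings")  # the spec's Assert
        out.append(_byte_serialize(name) + "=" + _byte_serialize(value))
    return "&".join(out)


print(whatwg_form_parse(b"foo&bar=baz"))  # [('foo', ''), ('bar', 'baz')]
```

Raising on None is one way to surface the "undefined behavior" case. A library could instead map undef to '' before serializing, as URI 5.18 effectively did.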
HTTP::Body also apparently has a bug related to this.
Mojolicious:
- foo&bar=baz parses as { foo => '', bar => 'baz' }
- foo=&bar=baz parses as { foo => '', bar => 'baz' }
- { foo => '', bar => 'baz' } encodes as foo=&bar=baz
- { foo => undef, bar => 'baz' } encodes as bar=baz
The problem is described in https://rt.cpan.org/Ticket/Display.html?id=150855. It seems that a change in URI.pm is causing the problem. Before (until URI 5.18):
After (with URI 5.19):
Note the missing equals sign.