libwww-perl / URI

The Perl URI module
https://metacpan.org/pod/URI
Other
41 stars 48 forks source link

URI 5.19 breaks Catalyst-Controller-DBIC-API #133

Open eserte opened 9 months ago

eserte commented 9 months ago

The problem is described in https://rt.cpan.org/Ticket/Display.html?id=150855 It seems that a change in URI.pm is causing the problem. Before (until URI 5.18):

$ perl -MHTTP::Request::Common=POST -e 'warn((POST "http://localhost/something", { foo => undef })->dump)' 
POST http://localhost/something
Content-Length: 4
Content-Type: application/x-www-form-urlencoded

foo=

After (with URI 5.19):

$ perl -MHTTP::Request::Common=POST -e 'warn((POST "http://localhost/something", { foo => undef })->dump)' 
POST http://localhost/something
Content-Length: 3
Content-Type: application/x-www-form-urlencoded

foo

Note the mising equals sign.

eserte commented 9 months ago

It also seems that t/13_req.t in https://metacpan.org/release/YTURTLE/Net-Azure-EventHubs-0.09 also started to fail because of this change. See http://matrix.cpantesters.org/?dist=Net-Azure-EventHubs%200.09

eserte commented 9 months ago

Web-Request-0.11 seems to fail with URI >= 5.19, so possibly the same problem. See also https://github.com/doy/web-request/issues/4

karenetheridge commented 9 months ago

This commit looks likely: 9ee8098

The new code looks more correct to me though, IMHO.

eserte commented 9 months ago

rfc3986 does not tell much about the format of query strings. It mentions that it can include key=value pairs, but even does not tell what separator characters (; or & or something else?) should be used. Especially it does not tell anything about value-less keys. Is there any other standard which can be followed?

karenetheridge commented 9 months ago

I think the closest we have is https://url.spec.whatwg.org/#application/x-www-form-urlencoded, which seeks to obsolete parts of RFC3986. It speaks only of "null byte sequences", which could be interpreted as either undef or '' -- we should probably treat these equivalently when serializing, and when deserializing either could be correct (so it would be incorrect to assert one over the other in a test).

eserte commented 9 months ago

+1 for treating "" and undef equivalently. This could mean that the following should print the same:

$ perl5.39.5 -MURI -E '$u=URI->new("http:"); $u->query_form(foo => ""); say $u->query'
foo=
$ perl5.39.5 -MURI -E '$u=URI->new("http:"); $u->query_form(foo => undef); say $u->query'
foo

However I learned now that "foo" should be treated the same as "foo=" when parsing x-www-form-urlencoded content, so there's an additional problem in some parsing code...

eserte commented 9 months ago

For example, CGI.pm may do strange things if the equals sign is missing:

$ echo -n "foo=" | env REQUEST_METHOD=POST CONTENT_TYPE=application/x-www-form-urlencoded CONTENT_LENGTH=4 perl5.39.5 -MCGI -MData::Dumper -E 'say Dumper({CGI->new->Vars})'
$VAR1 = {
          'foo' => ''
        };
$ echo -n "foo" | env REQUEST_METHOD=POST CONTENT_TYPE=application/x-www-form-urlencoded CONTENT_LENGTH=3 perl5.39.5 -MCGI -MData::Dumper -E 'say Dumper({CGI->new->Vars})'
$VAR1 = {
          'keywords' => 'foo'
        };

But if there's another parameter, then it looks sane again:

$ echo -n "bar=baz&foo=" | env REQUEST_METHOD=POST CONTENT_TYPE=application/x-www-form-urlencoded CONTENT_LENGTH=12 perl5.39.5 -MCGI -MData::Dumper -E 'say Dumper({CGI->new->Vars})'
$VAR1 = {
          'bar' => 'baz',
          'foo' => ''
        };
$ echo -n "bar=baz&foo" | env REQUEST_METHOD=POST CONTENT_TYPE=application/x-www-form-urlencoded CONTENT_LENGTH=11 perl5.39.5 -MCGI -MData::Dumper -E 'say Dumper({CGI->new->Vars})'
$VAR1 = {
          'bar' => 'baz',
          'foo' => ''
        };
vanHoesel commented 9 months ago

I have been following this conversation a bit, and too be honest, it gets me slightly worried.

Surely, in the old age off HTTP 1.0, and HTML 3.5 and JavaScript 1.2 when I joined the band, those POST request with x-www-form-urlencoded, it was not possible to make the difference between an empty textbox, or a texttbox that was not filled in, and we only had empty strings. Therefor, it made sense to treat them equally and passing it as foo= was the right thing to do... as an empty string.

But nowadays, we have api-clients that are talking to api-back-ends and potentially need to be able to send undef, or JSON's null, or nil 'values' instead of 'empty string'. As I got accustomed to, is that foo on itself was meant to be handled as undef, and foo= as an empty string.

Whatever the code would do downstream is up to the application developer.

But it does worry, that we might indeed break downstream code. If we do, we might be better off rolling back or prevent breakage... for now. And suggest to see what else is breaking – I think our amazing smoke-testers already do an amazing job – and work with the downstream developers to prepare a fix that will work the moment we say that foo on itself means undef

Oh... and happy new-year :-)

abraxxa commented 7 months ago

How we proceed in fixing this?

haarg commented 7 months ago

It should be trivial to update Catalyst-Controller-DBIC-API to fix its tests to account for this change.

abraxxa commented 7 months ago

As the change in URI affects other dists too I was hoping it is reverted here.

abraxxa commented 7 months ago

I've changed the test to pass an empty string instead of undef so the same URI as before is generated and tested.

haarg commented 7 months ago

A few data points:

CGI.pm:

HTTP::Body (used by Catalyst):

URL::Encode:

WWW::Form::UrlEncoded:

URI 5.18:

URI 5.19+:

Based on the URL spec, decoding a parameter without an equals sign should give a value of an empty string. An "empty byte sequence" is an empty string. It is also used to describe a value decoded from key=. So the new URI behavior for parsing is definitely wrong.

As for encoding, the spec says "Assert: tuple’s name and tuple’s value are scalar value strings". So encoding anything other than a string is undefined behavior. Based on this, I would say that the new URI behavior for encoding is also wrong.

HTTP::Body also apparently has a bug related to this.

karenetheridge commented 7 months ago

Mojolicious: