whatwg / urlpattern

URL Pattern Standard
https://urlpattern.spec.whatwg.org/
Other
160 stars 22 forks source link

Incorrect handling of base URL in URLPatternInit processing #202

Closed rubycon closed 6 months ago

rubycon commented 10 months ago

What is the issue with the URL Pattern Standard?

A correct implementation of the current spec fail with this kind of simple input:

new URLPattern({ pathname: '/books/:id', baseURL: 'https://example.com' });

The issue comes in part from the use of URL spec's internal parsers. These parsers manipulate URL records with some fields (host, port, query, fragment) that default to null when empty.

When those empty fields are then used to fill the default url pattern (step 11-7 to 11-10 of process a URLPatternInit), the processing a base URL string assertion fail (Assert: input is not null.)

In the special case of the baseURL port, URL record use unsigned integers and not strings, so the assertion pass but the code fail further down in escape a pattern string (1. Assert: input is an ASCII string.).

As per Web IDL, implicit string conversion would yield "null" for those empty field (as the URL record struct is not marked [LegacyNullToEmptyString]) which is not what we want.

The spec should check if a baseURL field is not null before using it to fill the url pattern. As for the port it should be converted to a string but is the processing of the stringified port really necessary as we are certain its only ASCII characters and there's nothing to escape ?).

jeremyroman commented 10 months ago

I don't think WebIDL conversion is applicable here, but nonetheless I agree that we need to replace null with an empty string in a few places here.

rubycon commented 10 months ago

@jeremyroman There's somewhat a related bug when creating a pattern parser with some of the encoding callbacks.

These callbacks also rely on URL spec's basic parser and do not check for null value with the exception of the port canonicalizing callback but as with the base URL the integer port should also be serialized or it fail the first escape a regexp string assertion.

As I understand these callbacks should always return a string, as all patterns are string.

jeremyroman commented 10 months ago

Acknowledged on the port needing to be serialized there. The others seem fine to me, as they initialize the URL record's relevant field into a non-null state and then proceed with parsing from a state that I think will never set them back to null. Can you be more specific?

rubycon commented 10 months ago

@jeremyroman Yes, it seems fine for the other components as null is always replaced by "*" during the component compilation.

But should processedInit fields be set to null in the first place when they must be USVString . If it's OK to do so for internal structs that are not exposed through the API I think it shouldn't be an issue.

This issue is also discussed here https://github.com/whatwg/urlpattern/issues/204#issuecomment-1891552179

jeremyroman commented 10 months ago

Well, it's not that they must be USVString, it's that they must be USVString if they are present. Web IDL dictionaries are ordered maps and the dictionary members are only required keys if they are specified as required.

It is incorrect to initialize these to null; instead, we should be conditionally assigning them and leaving them absent from the map otherwise.

rubycon commented 9 months ago

Hi @jeremyroman, hostnames seems to suffer from the same issue. The basic URL parser will parse a host as a hostname or an IP address. Addresses can be represented as an single integer or as an array of integer, so they also need to be explicitly serialized, here :

To canonicalize a hostname given a string value

  • 5. Return dummyURL’s host.

The parsed host and port in the match steps will also need to be properly serialized :

To perform a match given a URLPattern urlpattern, a URLPatternInput input, and an optional string baseURLString:

  • 12. Otherwise:
    • 8. Set hostname to url’s host or the empty string if the value is null.
    • 9. Set port to url’s port or the empty string if the value is null.
rubycon commented 9 months ago

@jeremyroman Also: minor typo found in canonicalize a port. Argument passed as portValue and then referred as value.

rubycon commented 7 months ago

@jeremyroman @annevk I submitted a PR to fix this issue + the minor typo but it seems it's still pending. I applied Jeremy recommendations to my commit message and submitted a participation agreement.

I'm new to this process, is there something else I'm missing ?

jeremyroman commented 6 months ago

@rubycon did your PR fully resolve this issue? (i.e., shall I close it now?)

rubycon commented 6 months ago

@jeremyroman Yes, it's fully resolved!