Tests and spec don't agree about base URL in match result `inputs`

lucacasonato commented 3 years ago

As per spec, the second item in the inputs field of the match result should be the stringification of the parsed base url URL (it actually says to add the URL object itself, but I'll just consider that a bug).

For this test case, the stringification of the base url is 'https://example.com/".

{
    "pattern": [{ "pathname": "/foo/bar" }],
    "inputs": [ "./foo/bar", "https://example.com" ],
    "expected_match": {
      "hostname": { "input": "example.com", "groups": { "0": "example.com" } },
      "pathname": { "input": "/foo/bar", "groups": {} },
      "protocol": { "input": "https", "groups": { "0": "https" } }
    }
  }

Yet the test runner wants it to be https://example.com. This is the case in Chromium right now (it appends the URL string, not the stringification of the parsed url). See https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/modules/url_pattern/url_pattern.cc;l=496-497;drc=1a1342c1f2d295948d79211cd860721732990ed8

I would say that the spec behaviour actually makes more sense, and that the test should be fixed to look like this:

{
    "pattern": [{ "pathname": "/foo/bar" }],
    "inputs": [ "./foo/bar", "https://example.com" ],
    "expected_match": {
      "inputs": [ "./foo/bar", "https://example.com/" ],
      "hostname": { "input": "example.com", "groups": { "0": "example.com" } },
      "pathname": { "input": "/foo/bar", "groups": {} },
      "protocol": { "input": "https", "groups": { "0": "https" } }
    }
  }

If we say this is right, then I suggest for the URLParserInit we should also change the spec & Chromium to have the inputs array contain the "processed" URLParserInit instead of the original.

wanderview commented 3 years ago

This was discussed in issues previously at:

https://github.com/WICG/urlpattern/issues/34

The intent was to make the result.inputs have the original unprocessed value. So I'm inclined to fix the spec to match the implementation.

lucacasonato commented 3 years ago

Ah, missed that issue. Seems fine, although I do wonder why this inputs field is needed at all then. If the input is just the unprocessed value, I'd argue there is no point in returning it at all (the user already has it, as they passed it in).

wanderview commented 3 years ago

Its a weak use case, but the idea was the user may want to pass the result on to other code to process and it may expect the same exact input for some other comparison/operation. This matches what Regexp.exec() does, etc.

lucacasonato commented 3 years ago

Ok - I could maybe see some value here, but it seems to me it wouldn't be much more difficult to just pass the inputs manually as user. I sorta had the expectation when I first saw the dictionary definition that these "inputs" are the normalized values, as this "plain passthrough" is not very common.

wanderview commented 3 years ago

Its not too late to change this. I'm just not sure which is better. @domenic do you have any thoughts here?

domenic commented 3 years ago

This is a tough one. I'm afraid I can't be of much help as a tiebreaker; I see both sides.

My only contribution is that I could see the RegExp precedent cutting either way. Is inputs meant to respresent the input string, or the input URL? Clearly for RegExps it's the input string, but maybe for URLPattern the input domain is more like URLs, and we just happen to represent them as strings?

wanderview commented 3 years ago

@jeffpossnick do you have an opinion from a web developer point of view?

wanderview commented 3 years ago

I made a twitter poll as well:

https://twitter.com/wanderview/status/1435342537288556548

lucacasonato commented 3 years ago

Looking at the poll, my opinion is obviously in the minority. I'd say let's keep behaviour like it is implemented in Chrome, and update the spec. Seems most people's expectations would be met. The normalized values are always available via the input on individual component results anyway.

lucacasonato commented 3 years ago

Someone on Twitter did bring up this point: https://twitter.com/bhathos/status/1435353793588350979. This shows that the expectation is that the matcher matches against whatever "result.input" is - that is however not the case. The matcher matches against the normalized value of result.input. Maybe there would be more insight into the system by users if the matcher returned the normalized inputs.

I'm impartial now.

wanderview commented 3 years ago

The matcher also doesn't operate on the entire inputs array. It operates on individual component values after normalization. Those normalized inputs are available at result.pathname.input, etc. Combined with result.inputs that lets you see what normalization took place before the matcher ran.

As I wrote in the previous issue, though, it is all a bit weird and I'm not sure if there is a perfect answer. I think the current spec is the least bad approach, though.

wanderview commented 3 years ago

Final poll results:

173 votes
46.8% for original raw value
17.3% for encoded value
35.8% don't know

I'm going to go ahead and update the spec to use the original raw value.

whatwg / urlpattern

Tests and spec don't agree about base URL in match result `inputs` #133