whatwg / urlpattern

URL Pattern Standard
https://urlpattern.spec.whatwg.org/
Other
160 stars 22 forks source link

No option to make URLPattern case insensitive #148

Closed posva closed 2 years ago

posva commented 3 years ago

There should be an option to make the test case insensitive. Right now it's always case sensitive:

const p = new URLPattern('/A/:id', 'http://e.e')
p.exec('/A/2', 'http://e.e') // { ... }
p.exec('/a/2', 'http://e.e') // null
wanderview commented 3 years ago

Can you expand on the use case a bit more? Is this something needed by frameworks like vue?

FWIW, we chose the default behavior to match URL semantics. Hostname is effectively case-insensitive, but other components are case-sensentive.

posva commented 3 years ago

By default, most routers are case insensitive when it comes to the pathname because it is just more convenient. I'm convinced this is a necessary feature for client-side routing as I've seen users rely on very often. As an example, we've had bug reports of the base tag not being treated as case insensitive by default.

wanderview commented 3 years ago

Would you only want pathname to be insensitive? Just wondering if a global option that affected all components would work or if we would need to have a way to specify which ones should be insensitive.

posva commented 3 years ago

in my experience, this only applied to the pathname (not query, not hash). By default, the query and the hash are case sensitive in vue router (and I believe other routers as well) and there is no way to customize it but this is because the URL patterns used only concern the pathname too πŸ˜…

wanderview commented 3 years ago

So maybe something like:

// only pathname is insensitive
const p = new URLPattern('/foo/bar', self.location, { insensitive: 'pathname' });

// all components are insensitive (but we are only looking at pathname)
const p2 = new URLPattern({ pathname: '/foo/bar' }, { insensitive: 'all' });
posva commented 3 years ago

I wonder because as you said the hostname is case insensitive by spec so that wouldn't affect it, would it?

wanderview commented 3 years ago

The hostname and protocol get normalized to lowercase, so yea it wouldn't do much there.

posva commented 3 years ago

I wonder if there is a usecase to have case insensitive match in query and hash. If there is not, maybe a boolean option would make more sense. I don't see users setting only the pathname and hash to insensitive but not the query either (I don't see a reason to make the query or the hash case insensitive but I don't think I have enough knowledge about it)

Jamesernator commented 2 years ago

For the record there is a stage 2 proposal for allowing you to add/remove flags from any part of a regexp. Because URLPattern pattern regexp parts are just JS regexps, once that lands you should be able to do:

const pattern = new URLPattern({
    pathname: "/(?i:foo)/",
});

pattern.exec({ pathname: "/foo" }); // fine
pattern.exec({ pathname: "/fOO" }); // also fine
posva commented 2 years ago

That's good to know! But it would be really unintuitive for such a common use case (any frontend router) πŸ˜„

Sayan751 commented 2 years ago

Added a comment in the discussion https://github.com/WICG/urlpattern/discussions/39, citing use-cases. Considering the low-traffic there, cross-posting it here as well, as this seems to be the active open issue on this topic.

Sayan751 commented 2 years ago

Opened a PR for the polyfill here: https://github.com/kenchris/urlpattern-polyfill/pull/100. The shape of the API can be discussed further, but I think the feature is absolutely necessary. Would appreciate any feedback.

wanderview commented 2 years ago

Should an API change here flip the entire URLPattern to be case insensitive? Or should it be a setting on a per-component basis; e.g. only pathname, hostname, etc?

A possible work around for the moment would be to normalize input to lower case and make the pattern operate on lower case.

Sayan751 commented 2 years ago

Or should it be a setting on a per-component basis; e.g. only pathname, hostname, etc?

As the previous discussion points out, the hostname, and protocol are already case-insensitive. Thus, I think on those components the case-sensitivity is lost. However, now I have a feeling that per-component case-sensitivity might be a bit too much and perhaps is not pragmatic as well (someone wanting only case-insensitivity only for pathname and not for hash or search might be a bit far-fetched use-case). I think a general caseSensitive flag can support the following components: pathname, search, and hash. With that, it becomes:

const pattern = new URLPattern({ 
  pathname: '/home', 
  caseSensitive: false /* applicable for search and hash as well */
});

normalize input to lower case and make the pattern operate on lower case.

Changing the user input is something, I would like to avoid as that might cause potential loss of data.

SanderElias commented 2 years ago

Hmm, I think this is highly dependent on the use case. I checked a couple of routers, and the does I found are case-sensitive. This makes a load of sense since they mimic server behavior, (for the pathname part), and most (if not all) servers are also case-sensitive. When an app wants to go jamstack, that means it must comply with what the server is supporting. Retrofitting case sensitivity into an app is not a small task. (the other way around is way easier).

I believe that catering to case sensitivity in URLPattern is giving devs a bigger problem in the future, compared to fixing their casing when building an app.

If they really want they will be able to use the regex helper, which can easily be put in a helper function:

const caseInsensitive = (searchString) => `/(?i:${ searchString }/`; // probably needs proper escaping!
const pattern = new URLPattern({
    pathname: caseInsensitive("foo"),
});

pattern.exec({ pathname: "/foo" }); // fine
pattern.exec({ pathname: "/fOO" }); // also fine

I know from experience that being case-insensitive can lead to very hard to debug problems, that will take hours to figure out.

Sayan751 commented 2 years ago

I believe that catering to case sensitivity in URLPattern is giving devs a bigger problem in the future, compared to fixing their casing when building an app.

Note that the case-sensitivity in the URLPattern is an opt-in feature. Only if you are dealing with case-insensitive URLs you can set this flag, or else you can forget about this. Therefore, it is not clear to me how this is going to cause a bigger problem.

The ?i: pattern modifier proposal is yet to be widely supported. Until then, we need a clean way to support case-insensitivity. However, in my opinion, the case-sensitivity should be baked into the URLPattern API so that it makes the API easy to use and reduce boilerplate code.

posva commented 2 years ago

Not only this would be an opt it but note that most client routers are case insensitive by default because it’s more convenient (eg react router and Vue router)

I checked a couple of routers, and the does I found are case-sensitive.

maybe you were checking on server routers, in which case this makes sense but since this api needs to cater to both, this is a necessity

Sayan751 commented 2 years ago

server routers

I worked mainly with ASP.NET services; never saw case-sensitive routing there as well.

Sayan751 commented 2 years ago

@wanderview Do you think we can reach an agreement on the API shape, as I have outlined in my previous comment?

wanderview commented 2 years ago

@wanderview Do you think we can reach an agreement on the API shape, as I have outlined in my https://github.com/WICG/urlpattern/issues/148#issuecomment-1163480007?

I'm leaning away from a shape like you have in https://github.com/WICG/urlpattern/issues/148#issuecomment-1163480007. That would require modifying the URLPatternInit dictionary type in a way that would not make sense for all its uses. It also wouldn't work for the constructor variant that takes a string instead of an object.

We could try to add an additional options argument at the end like this:

new URLPattern({ pathname: 'foo*' }, { caseInsensitive: true });

But it gets a bit awkward to define since we can have an optional second argument as the baseURL, but sometimes you might just want to have the options dictionary. For example, we would want to support all of these forms:

// Support both of these forms:
new URLPattern({ pathname: 'foo*' }, { caseInsensitive: true });
new URLPattern('/foo*', baseURL, { caseInsensitive: true });

// Support one form of the following
new URLPattern('https://example.com/foo*', { caseInsensitive: true });
new URLPattern('https://example.com/foo*', /*baseURL=*/ null, { caseInsensitive: true });

That could probably be sorted out, but could be a bit ugly in webidl.

Another option would be to have a getter/setter to flip the mode on the pattern:

const p = new URLPattern({ pathname: 'foo*' });
p.caseSensitive = false;

That avoids the complexity of the various constructor forms at the cost of making the URLPattern mutable.

@domenic do you have any thoughts on shape here?

Sayan751 commented 2 years ago

It also wouldn't work for the constructor variant that takes a string instead of an object.

@wanderview You are right on that. Missed that use case.

Another option would be to have a getter/setter to flip the mode on the pattern

Probably I would not go that way, as currently (in polyfill) the pattern regexes are created in the constructor. (Even ignoring the polyfill) With a getter in place, the regexes needs to be created lazily and in worst cases, needs to be rebuilt if the flag changes. I think that may cause unnecessary surprises.

IMO a readonly flag set via ctor is most sensible. Although I don't see any issue with your { caseInsensitive: true } proposal, I think the following would be a bit less hairy, considering the chance that there might not be any other 'control' options.

// synonymous
new URLPattern({ pathname: 'foo*' });
new URLPattern({ pathname: 'foo*' }, /* caseInsensitive */ true);

// case-insensitive
new URLPattern({ pathname: 'foo*' }, /* caseInsensitive */ false);

// extended
new URLPattern({ pathname: 'foo*' }, /* baseUrl */ 'https://example.com'); // case-sensitive
new URLPattern({ pathname: 'foo*' }, /* baseUrl */ 'https://example.com', /* caseInsensitive */ true); // also supported
new URLPattern({ pathname: 'foo*' }, /* baseUrl */ 'https://example.com', /* caseInsensitive */ false);
domenic commented 2 years ago

Definitely don't use bare-booleans; see https://w3ctag.github.io/design-principles/#prefer-dict-to-bool .

I think it'd be OK to define this with Web IDL overloads:

interface URLPattern {
  constructor(optional URLPatternInput input = {}, optional USVString baseURL, optional URLPatternOptions options = {});
  constructor(optional URLPatternInput input = {}, optional URLPatternOptions options = {});
};

I would suggest using the name ignoreCase for the option to match RegExp.prototype.ignoreCase.

Sayan751 commented 2 years ago

Definitely don't use bare-booleans; see https://w3ctag.github.io/design-principles/#prefer-dict-to-bool .

I think it'd be OK to define this with Web IDL overloads:

interface URLPattern {
  constructor(optional URLPatternInput input = {}, optional USVString baseURL, optional URLPatternOptions options = {});
  constructor(optional URLPatternInput input = {}, optional URLPatternOptions options = {});
};

I would suggest using the name ignoreCase for the option to match RegExp.prototype.ignoreCase.

Sounds good to me.

wanderview commented 2 years ago

FYI, I have opened a chrome status entry for this so I can start implementing the spec change from @Sayan751 in https://github.com/WICG/urlpattern/pull/168. See: https://chromestatus.com/feature/5206436850696192

wanderview commented 2 years ago

I need to do an additional follow-up commit to fix a small issue with the spec PR. I couldn't figure out how to modify the existing PR in-place correctly.

wanderview commented 2 years ago

Spec changes have landed. I sent the intent to ship in chrome 107 here:

https://groups.google.com/a/chromium.org/g/blink-dev/c/KgAdo3kB1wc/m/UN70zkeNAwAJ