zloirock / core-js

`URLPattern` #1011

Open zloirock opened 2 years ago

zloirock commented 2 years ago

Since core-js already contains URL and URLSearchParams, it would be good to implement URLPattern as well. I hope that we can reuse the URL parser from the web.url module.

https://developer.mozilla.org/en-US/docs/Web/API/URL_Pattern_API
https://wicg.github.io/urlpattern/
https://github.com/kenchris/urlpattern-polyfill (however, I think that core-js should take a different approach)

I haven't started working on it yet, so if anyone wants to contribute, feel free to do so.

precious-void commented 2 years ago

@zloirock Hey, unfortunately, it looks like they emphasize the differences between the basic URL parser and theirs:

The URLPattern constructor string algorithm is very similar to the basic URL parser algorithm, but some differences prevent us from using that algorithm directly.

First, the URLPattern constructor string parser operates on tokens generated using the "lenient" tokenize policy. In contrast, basic URL parser operates on code points. Operating on tokens allows the URLPattern constructor string parser to more easily distinguish between code points that are significant pattern syntax and code points that might be a URL component separator. For example, it makes it trivial to handle named groups like ":hmm" in "https://a.c:hmm.example.com:8080" without getting confused with the port number.

Second, the URLPattern constructor string parser needs to avoid applying URL canonicalization to all code points like basic URL parser does. Instead, we perform canonicalization on only parts of the pattern string we know are safe later when compiling each component pattern string.

Finally, the URLPattern constructor string parser does not handle some parts of the basic URL parser state machine. For example, it does not treat backslashes specially as they would all be treated as pattern characters and would require excessive escaping. In addition, this parser may not handle some more esoteric parts of the URL parsing algorithm like file URLs with a hostname. The goal with this parser was to handle the most common URLs while allowing any niche case to be handled instead via the URLPatternInit constructor.

https://wicg.github.io/urlpattern/#constructor-string-parsing
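To make the ":hmm" versus ":8080" distinction from the quoted note concrete, here is a heavily simplified sketch of token-based scanning. It is not the spec's tokenizer: `tokenizeNames` is a hypothetical helper, it only recognizes named groups, it uses an ASCII-only approximation of identifier characters, and it ignores `*`, `{}`, `(...)` and backslash escapes entirely.

```javascript
// Hypothetical, simplified tokenizer: emits a `name` token for `:identifier`
// and a `char` token for every other character.
// Assumption: identifiers are ASCII-only ([$_a-zA-Z] then [$_a-zA-Z0-9]).
function tokenizeNames(input) {
  var tokens = [];
  var i = 0;
  while (i < input.length) {
    if (input[i] === ':') {
      var end = i + 1;
      // A name must begin with an identifier-start character
      if (end < input.length && /[$_a-zA-Z]/.test(input[end])) {
        end++;
        while (end < input.length && /[$\w]/.test(input[end])) end++;
        tokens.push({ type: 'name', value: input.slice(i + 1, end) });
        i = end;
        continue;
      }
    }
    // Anything else, including a `:` not followed by a name, stays a plain char
    tokens.push({ type: 'char', value: input[i] });
    i++;
  }
  return tokens;
}

var names = tokenizeNames('https://a.c:hmm.example.com:8080')
  .filter(function (token) { return token.type === 'name'; })
  .map(function (token) { return token.value; });
// names → ['hmm']
```

The `:` after `https` is followed by `/` and the `:` before `8080` is followed by a digit, so neither starts a name token; only `:hmm` does. A parser working on these tokens can then treat the remaining `:` characters as candidate component separators.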

I have started a draft branch, but unfortunately I'm stuck in a couple of places and I'm not sure how to solve them:
1) The usage of RegExpCreate on line https://github.com/shtelzerartem/core-js/blob/feature/url-pattern/packages/core-js/modules/web.url-pattern.js#L580
2) How do I check whether a code point is contained in IdentifierStart / IdentifierPart? Honestly, I'm not really sure what these things are. See line https://github.com/shtelzerartem/core-js/blob/feature/url-pattern/packages/core-js/modules/web.url-pattern.js#L30

zloirock commented 2 years ago

Thanks, @shtelzerartem.

Yes, I saw this note. They are different, but in my view it should be possible to modify the parser to handle both cases.

RegExpCreate is not a problem: it can simply be the RegExp constructor. The problem is the usage of the u flag. core-js does not polyfill the u flag, since that requires a full Unicode implementation, which could be too heavy. I think it's possible to detect u flag support in the engine and, if it's supported, create the regex with this flag. If it's not supported and the pattern does not contain anything that requires this flag, create the regex without any flags; otherwise, throw an error.
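The strategy described above could be sketched roughly as follows. This is only an illustration, not core-js code: `supportsUnicodeFlag`, `createPatternRegExp` and `REQUIRES_UNICODE` are hypothetical names, and the check for "entries that require this flag" is a rough heuristic assumed here (code point escapes, property escapes and astral characters).

```javascript
// Hypothetical helper: feature-detect the `u` flag
function supportsUnicodeFlag() {
  try {
    return new RegExp('.', 'u').unicode === true;
  } catch (error) {
    return false;
  }
}

// Assumed heuristic for sources whose meaning depends on the `u` flag:
// \u{...} escapes, \p{...} / \P{...} property escapes, surrogate pairs
var REQUIRES_UNICODE = /\\u\{|\\[pP]\{|[\uD800-\uDBFF][\uDC00-\uDFFF]/;

// Hypothetical stand-in for RegExpCreate.
// `unicodeSupported` can be passed explicitly to simulate an old engine.
function createPatternRegExp(source, unicodeSupported) {
  if (unicodeSupported === undefined) unicodeSupported = supportsUnicodeFlag();
  // 1. The engine supports `u`: use it
  if (unicodeSupported) return new RegExp(source, 'u');
  // 2. No support and the source depends on `u`: fail loudly
  if (REQUIRES_UNICODE.test(source)) {
    throw new SyntaxError('This pattern requires RegExp `u` flag support');
  }
  // 3. No support, but the source does not need it: create without flags
  return new RegExp(source);
}
```

For example, `createPatternRegExp('^/books/([^/]+)$', false)` falls through to a flagless regex, while `createPatternRegExp('\\p{L}+', false)` throws, because without the `u` flag `\p{L}` would silently match something else.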

The same situation applies to IdentifierStart / IdentifierPart. For example, for the same reason, core-js uses an incomplete RegExpIdentifierName for named capture groups (NCG).
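As a rough illustration of what such an incomplete check could look like: IdentifierStart is essentially `$`, `_` or a character with the Unicode ID_Start property, and IdentifierPart is `$`, ZWNJ, ZWJ or a character with ID_Continue. The sketch below uses Unicode property escapes when the engine supports them and otherwise falls back to an ASCII-only approximation. The names `tryRegExp` and `isValidNameCodePoint` are hypothetical, and this is not the regex core-js actually uses.

```javascript
// Hypothetical helper: returns null instead of throwing when the engine
// rejects the flag or the property escape
function tryRegExp(source, flags) {
  try {
    return new RegExp(source, flags);
  } catch (error) {
    return null;
  }
}

// Full check where `u` and \p{...} are supported,
// ASCII-only approximation (an assumption of this sketch) otherwise
var IDENTIFIER_START = tryRegExp('^[$_\\p{ID_Start}]$', 'u') || /^[$_a-zA-Z]$/;
var IDENTIFIER_PART = tryRegExp('^[$\\u200C\\u200D\\p{ID_Continue}]$', 'u') || /^[$\w]$/;

// `char` is a string holding a single code point;
// `first` says whether it is the first character of the name
function isValidNameCodePoint(char, first) {
  return (first ? IDENTIFIER_START : IDENTIFIER_PART).test(char);
}
```

With the fallback, names containing non-ASCII letters would be rejected on old engines, which is the kind of incompleteness mentioned above.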

zloirock commented 2 years ago

https://github.com/nodejs/node/pull/42133