zloirock opened this issue 2 years ago
@zloirock Hey, unfortunately, it looks like the spec emphasizes the difference between the basic URL parser and its own:
The URLPattern constructor string algorithm is very similar to the basic URL parser algorithm, but some differences prevent us from using that algorithm directly.
First, the URLPattern constructor string parser operates on tokens generated using the "lenient" tokenize policy. In contrast, basic URL parser operates on code points. Operating on tokens allows the URLPattern constructor string parser to more easily distinguish between code points that are significant pattern syntax and code points that might be a URL component separator. For example, it makes it trivial to handle named groups like ":hmm" in "https://a.c:hmm.example.com:8080" without getting confused with the port number.
Second, the URLPattern constructor string parser needs to avoid applying URL canonicalization to all code points like basic URL parser does. Instead, we perform canonicalization on only parts of the pattern string we know are safe later when compiling each component pattern string.
Finally, the URLPattern constructor string parser does not handle some parts of the basic URL parser state machine. For example, it does not treat backslashes specially as they would all be treated as pattern characters and would require excessive escaping. In addition, this parser may not handle some more esoteric parts of the URL parsing algorithm like file URLs with a hostname. The goal with this parser was to handle the most common URLs while allowing any niche case to be handled instead via the URLPatternInit constructor.
https://wicg.github.io/urlpattern/#constructor-string-parsing
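To illustrate the token-based point in the first paragraph of the note, here is a heavily simplified sketch of a "lenient" tokenizer that only distinguishes `:name` groups from other characters. The function names (`tokenizeLenient`, `isAsciiNameChar`) and the token shape are hypothetical; the spec's real tokenizer also handles `{`, `}`, `(`, `*`, `\` escapes, regexp groups and full Unicode name code points.

```javascript
// Simplified sketch: only `:name` groups vs. everything else (hypothetical names).
// The spec's tokenizer also handles `{`, `}`, `(`, `*`, `\` escapes, regexp groups, etc.
function isAsciiNameChar(ch, first) {
  // ASCII-only stand-in for the IdentifierStart / IdentifierPart checks.
  return first ? /^[$_A-Za-z]$/.test(ch) : /^[$_A-Za-z0-9]$/.test(ch);
}

function tokenizeLenient(input, isNameChar) {
  var tokens = [];
  var i = 0;
  while (i < input.length) {
    var ch = input.charAt(i);
    if (ch === ':') {
      var j = i + 1;
      // Consume as many name characters as possible after the colon.
      while (j < input.length && isNameChar(input.charAt(j), j === i + 1)) j++;
      if (j > i + 1) {
        tokens.push({ type: 'name', index: i, value: input.slice(i + 1, j) });
        i = j;
        continue;
      }
      // "Lenient" policy: a colon without a name becomes an invalid-char token
      // instead of raising an error, so it can still act as a separator later.
      tokens.push({ type: 'invalid-char', index: i, value: ch });
    } else {
      tokens.push({ type: 'char', index: i, value: ch });
    }
    i++;
  }
  return tokens;
}

var tokens = tokenizeLenient('https://a.c:hmm.example.com:8080', isAsciiNameChar);
// One name token: value "hmm" at index 11.
var names = tokens.filter(function (t) { return t.type === 'name'; });
// Non-name colons (scheme and port separators): indices 5 and 27.
var colons = tokens.filter(function (t) { return t.type !== 'name' && t.value === ':'; });
```

On the example from the note, `:hmm` comes out as a single name token, while the `:` before `8080` stays a non-name token, which is what a port-prefix check would look for.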
I have started a "draft" branch, but unfortunately, I'm stuck in several places and I'm not sure how to solve them:
1) Usage of RegExpCreate, see line https://github.com/shtelzerartem/core-js/blob/feature/url-pattern/packages/core-js/modules/web.url-pattern.js#L580
2) How to check whether a code point is contained in IdentifierStart / IdentifierPart? And honestly, I'm not really sure what these things are. See line https://github.com/shtelzerartem/core-js/blob/feature/url-pattern/packages/core-js/modules/web.url-pattern.js#L30
Thanks, @shtelzerartem.
Yes, I saw this note. They are different, but in my view, it should be possible to adapt one parser to handle both cases.
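For comparison, this is what the standard WHATWG `URL` parser (the global `URL` in browsers and Node.js, not `core-js` internals) does with pattern-like input: a named group in the host is misread as the start of a port, and canonicalization resolves `..` and percent-encodes `{` / `}`. The helper `parseAsPlainURL` is hypothetical. Adapting one parser for both cases would mean switching these behaviors off in a pattern mode.

```javascript
// Uses the standard WHATWG URL class (global in browsers and Node.js); not core-js code.
// `parseAsPlainURL` is a hypothetical helper for this illustration.
function parseAsPlainURL(input) {
  try {
    var url = new URL(input);
    return { ok: true, host: url.host, pathname: url.pathname };
  } catch (error) {
    // The URL constructor throws a TypeError for input it cannot parse.
    return { ok: false };
  }
}

// The ":" before "hmm" is taken as the port delimiter, and "hmm.example.com:8080"
// is not a valid port, so parsing fails: { ok: false }
var hostWithGroup = parseAsPlainURL('https://a.c:hmm.example.com:8080');

// Canonicalization lowercases the host, resolves ".." (which drops the ":id" group)
// and percent-encodes "{" and "}": host "example.com:8080", pathname "/books/%7B:page%7D"
var pathWithGroups = parseAsPlainURL('https://Example.COM:8080/books/:id/../{:page}');
```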
`RegExpCreate` is not a problem, it can simply be the `RegExp` constructor; the problem is the usage of the `u` flag. `core-js` does not polyfill the `u` flag: that requires a full Unicode implementation, which could be too heavy. I think it's possible to detect `u` flag support in the engine: if it's supported, create a regex with this flag. If it's not supported and the pattern does not contain entries that require this flag, create a regex without any flags; otherwise, throw an error.
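A minimal sketch of that strategy, assuming hypothetical helper names (`detectUnicodeFlag`, `createPatternRegExp`) that are not part of `core-js`. The check for "entries that require this flag" is a conservative heuristic that only looks for `\u{...}`, `\p{...}` / `\P{...}` and surrogate code units; it can give false positives (for example an escaped backslash followed by `u{`), which only results in an unnecessary error. The code is written in ES5 style because the fallback path only matters on old engines.

```javascript
// Hypothetical helpers sketching the strategy described above; not core-js code.
// Conservative heuristic for syntax that only has meaning with the `u` flag:
// `\u{...}`, `\p{...}` / `\P{...}`, and astral characters (surrogate code units).
var UNICODE_ONLY_SYNTAX = /\\u\{|\\[pP]\{|[\uD800-\uDFFF]/;

function detectUnicodeFlag() {
  try {
    // With `u`, `.` matches a whole surrogate pair, so this returns true.
    return new RegExp('^.$', 'u').test('\uD83D\uDE00');
  } catch (error) {
    // Pre-ES2015 engines throw a SyntaxError for the unknown `u` flag.
    return false;
  }
}

var UNICODE_FLAG_SUPPORTED = detectUnicodeFlag();

// Stand-in for RegExpCreate(source, flags) as discussed above.
// `flags` holds any non-`u` flags, e.g. '' or 'i'.
function createPatternRegExp(source, flags, unicodeSupported) {
  if (unicodeSupported === undefined) unicodeSupported = UNICODE_FLAG_SUPPORTED;
  if (unicodeSupported) return new RegExp(source, flags + 'u');
  if (UNICODE_ONLY_SYNTAX.test(source)) {
    // The error type is an arbitrary choice for this sketch.
    throw new TypeError('This pattern needs RegExp `u` flag support: ' + source);
  }
  // No `u`-only syntax found: fall back to a regexp without the flag.
  return new RegExp(source, flags);
}
```

The third parameter exists only so both branches can be exercised on a modern engine; normally the detected value is used.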
The same goes for `IdentifierStart` / `IdentifierPart`: for the same reason, `core-js` uses an incomplete `RegExpIdentifierName` for named capture groups (NCG).
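For context on the question above: in ECMAScript, `IdentifierStart` covers code points with the Unicode `ID_Start` property plus `$` and `_`, and `IdentifierPart` covers `ID_Continue` plus `$`, U+200C (ZWNJ) and U+200D (ZWJ), ignoring the `\u` escape forms, which do not apply to single code points. The sketch below (hypothetical helper `isValidNameChar`) uses Unicode property escapes when the engine supports them and otherwise falls back to an ASCII-only subset, i.e. the same kind of incomplete approximation.

```javascript
// Hypothetical helper sketching the URLPattern "valid name code point" check.
// IdentifierStart ~ ID_Start plus `$` and `_`;
// IdentifierPart ~ ID_Continue plus `$`, U+200C (ZWNJ) and U+200D (ZWJ).
var NAME_TESTS = (function () {
  try {
    // Unicode property escapes need the `u` flag (ES2018+); older engines throw here.
    return {
      start: new RegExp('^[$_\\p{ID_Start}]$', 'u'),
      part: new RegExp('^[$\\u200C\\u200D\\p{ID_Continue}]$', 'u')
    };
  } catch (error) {
    // Incomplete ASCII-only fallback for engines without property escapes.
    return {
      start: /^[$_A-Za-z]$/,
      part: /^[$_A-Za-z0-9\u200C\u200D]$/
    };
  }
})();

// `char` is a string holding exactly one code point (one or two UTF-16 code units);
// `first` is true for the first code point of a name.
function isValidNameChar(char, first) {
  return (first ? NAME_TESTS.start : NAME_TESTS.part).test(char);
}
```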
Since `core-js` already contains `URL` and `URLSearchParams`, it would be good to implement `URLPattern`. I hope that we could reuse the `URL` parser from the `web.url` module.
https://developer.mozilla.org/en-US/docs/Web/API/URL_Pattern_API
https://wicg.github.io/urlpattern/
https://github.com/kenchris/urlpattern-polyfill (however, I think that `core-js` should take a different approach)
I haven't started working on it yet, so if someone wants to contribute, feel free to do it.