Closed rbuckton closed 11 months ago
To be clear, the ecosystem isn't interested in a tag function, it's just a few delegates that are.
With a tag function, I think it would be much clearer to mimic regex literal syntax as much as possible - which includes flags going last, like in the RegExp constructor.
I'm not sure I agree. If both new RegExp("/") and new RegExp(String.raw`/`) match the literal string "/", then RegExp.tag`/` should as well, or it would be a refactoring hazard. It also adds unnecessary complexity to essentially require double-delimiting a regular expression: RegExp.tag`/foo/`.
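A minimal sketch of the equivalence being argued for. RegExp.tag does not exist today, so `bodyTag` below is a hypothetical stand-in that treats the template text as the pattern body, with no delimiters and no escaping of interpolated values:

```javascript
// Both existing spellings treat "/" as an ordinary pattern character.
const fromString = new RegExp("/");
const fromRaw = new RegExp(String.raw`/`);

// Hypothetical body-only tag (an assumption, standing in for RegExp.tag):
// the raw template text is used directly as the pattern source.
function bodyTag(strings, ...values) {
  return new RegExp(String.raw(strings, ...values));
}
const fromTag = bodyTag`/`;

console.log(fromString.test("a/b"), fromRaw.test("a/b"), fromTag.test("a/b"));
// prints: true true true
```

Under a body-only design, moving between the three forms does not change what the pattern matches.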
I think RegExp.tag`(?u)foo` is not only far clearer, but also allows the reader to know what flags are in play at the start of a potentially multi-line RegExp, rather than needing to scan to the end to find the flags.
I agree with @rbuckton, and there are also other possibilities in the design space that avoid embedding what looks like a full and essentially double-delimited regular expression literal rather than just a body. For example, RegExp.tag("u")`^\p{ID_Start}\p{ID_Continue}*$` (although my preference is for the simpler body-only RegExp.tag`(?u)^\p{ID_Start}\p{ID_Continue}*$`).
I feel pretty strongly here that we need to optimize for readability, not writability - the more different it looks from a regex literal, the harder it will be to explain to newcomers how to understand code that uses it.
My previous comment specifically pointed out that I think a leading (?imsuvx) improves readability, since you don't need to scan to the end to find the flags. I also think double-delimiting via `/ ... /` does not improve readability, but instead is more likely to add confusion. @gibson042's suggestion of RegExp.tag(flags)`pattern` is reasonable, but has the downside of allocating a new function closure for each invocation.
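To illustrate the closure point, here is a rough sketch of the curried shape. `regexTag` is a hypothetical name standing in for RegExp.tag(flags), and interpolated values are spliced in unescaped to keep the example short:

```javascript
// Hypothetical curried tag (an assumption, not a proposed API surface).
function regexTag(flags) {
  // Every call to regexTag allocates a fresh tag function.
  return (strings, ...values) =>
    new RegExp(String.raw(strings, ...values), flags);
}

const first = regexTag("u");
const second = regexTag("u");
console.log(first === second); // prints: false

const identifier = regexTag("u")`^\p{ID_Start}\p{ID_Continue}*$`;
console.log(identifier.test("foo1"), identifier.test("1foo"));
// prints: true false
```

A real implementation could cache one function per flag string, at the cost of extra machinery that the prefix form does not need.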
I'd also be concerned that an example like
RegExp.tag`
/foo/
`
would lead people to mistakenly believe that removing the tag is safe, which would not be the case for
RegExp.tag`
/
foo
/x
`
I'd also note that several other languages, including Perl, allow you to use alternate delimiters for regular expressions, including <> and '', so I don't think using `` as the delimiter on its own is too wild of a notion.
/foo/ today means RegExp literal. A tagged template is not a RegExp literal; it is just a string that contains RegExp pattern characters, not unlike new RegExp(""). RegExp literals have the downside that you must escape / to use it in a pattern. A tagged template will already require that you escape ` to use it in a pattern. Requiring that you also escape / would not be ideal.
To clarify, I think readability necessitates maximal consistency with regex literals, as well as the RegExp constructor, both of which have flags last.
I find the composability argument very compelling; it would address real pain points I’ve encountered when trying to assemble grammars from common components. I also think flags-first is a readability improvement (and even a kind of hazard reduction), personally, but appreciate that the inconsistency would be unfortunate.
[EDIT: this comment was a long rant I wrote before getting enough context, sorry. See my next comment instead.]
Oops I should have read more before writing so much :flushed: I'll hide my previous comment.
Oh I see https://github.com/tc39/proposal-regexp-modifiers proposes a scoped syntax!
What's the benefit you see for RegExp.tag`(?x)...` over RegExp.tag`(?x:...)`?
For the example given IMHO this works just as well:
RegExp.tag`/(?x
# escaped, case sensitive
${string}
# nested RegExp is *not* escaped.
${caseSensitive ? /Z/ : /Z/i}
# And same style allows for locally scoped flags without need of nested RegExps!
# e.g. if that case-insensitivity wasn't dynamic, you can just write:
(?i:Z)
)/`;
IIUC the only use case for leading global syntax is expressing behavior flags (?gdy), where scoping doesn't make sense and which RegExp.tag needs some way to express. And that would allow dropping the slashes.
Does the current proposal allow RegExp.tag`/regexp/with/slashes/g` or do the inner slashes have to be escaped? If they must be escaped, I have to say the argument to drop slashes and pass global flags somehow else becomes compelling.
Especially since / vs. \/ is not meaningful at the layer of tagged literals, you need \\/ to get \/ in the template. So unlike /regexp\/with\/slashes/ you'd have to write RegExp.tag`/regexp\\/with\\/slashes/g`. So if that's necessary, "familiarity" is undermined.
RegExp.tag strips first and last slashes only, treating any other slashes as literal characters.

Hmm, that escaping issue is much wider. All distinctions regexps make between metacharacters like + and literals like \+ need double backslash for the backslash to get to the template function :cry:! Am I missing anything?
>> console.log`foo+bar`
Array [ "foo+bar" ]
>> console.log`foo\+bar`
Array [ "foo+bar" ]
>> console.log`foo\\+bar`
Array [ "foo\\+bar" ]
On one hand, this means the \\/ problem is not special.
OTOH, it means any attempt to explain it as "usual /regexp/ syntax wrapped with RegExp.tag`...`" promotes a wrong mental model! :warning: Because the central benefit of builtin /.../ syntax is that backslashes inside have the regexp meaning directly, with no double-escaping nonsense.
It's much better to think of these as similar to new RegExp("...") with explicit awareness that it's a JS string syntax, with JS delimiters & escape processing, then undergoing regexp parsing.
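One caveat on the console output above: console.log prints the cooked strings, but a tag function also receives the unprocessed text through strings.raw, which is what String.raw builds on. A small sketch for checking this (the `inspect` helper is hypothetical, and whether RegExp.tag would read raw or cooked strings is an assumption not settled in this thread):

```javascript
// Hypothetical helper: returns both views of the first template chunk.
function inspect(strings) {
  return { cooked: strings[0], raw: strings.raw[0] };
}

const plain = inspect`foo\+bar`;
// plain.cooked has no backslash; plain.raw keeps the single backslash.
console.log(plain.cooked.length, plain.raw.length); // prints: 7 8

// A tag built on the raw text sees \+ with a single backslash in source.
const viaRaw = new RegExp(String.raw`foo\+bar`);
console.log(viaRaw.test("foo+bar"), viaRaw.test("foooobar"));
// prints: true false
```

A tag that reads strings.raw would therefore not need doubled backslashes for \+ or \/, although an escaped backtick would still keep its backslash in the raw text.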
OK, I'm sold. Ignore my previous objections to (?i)...; I don't care strongly whether you do that or RegExp.tag('i')`...`, just don't mislead people with slashes.
Closing, since the non-tag function is what advanced to stage 2 today.
The RegExp modifiers proposal originally included an unbounded (?ims-ims) operator, but that was recently dropped from the proposal. Some RegExp engines support a version of (?ims-ims) that can only appear at the start of a regular expression and applies to the entire pattern.

I've been considering re-introducing this prefix-only form in a follow-on proposal, and it could be helpful here as well:

The prefix-flags form could theoretically allow all RegExp flags, not just the restricted subset in the Modifiers proposal. It would also remove the necessity for RegExp.tag to require leading and trailing / characters, and could even improve composability:

In this case, flags from nested RegExp objects such as i, m, and s could be preserved in the resulting RegExp using a modified group (i.e., (?-i:Z) or (?i:Z) based on the condition above).
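A rough sketch of how a prefix-flags, body-only tag might assemble its pattern. Everything here is an assumption for illustration: `buildPattern` and `SYNTAX_CHARS` are hypothetical names, only the i flag of a nested RegExp is preserved, and the function returns the source and flags rather than compiling them, because modified groups such as (?i:Z) need engine support for RegExp modifiers.

```javascript
// Hypothetical helper for illustration only; not the proposal's algorithm.
const SYNTAX_CHARS = /[\\^$.*+?()[\]{}|\/]/g;

function buildPattern(strings, ...values) {
  // Read an optional leading (?flags) prefix from the first raw chunk only,
  // so interpolated values cannot inject flags.
  const prefix = /^\(\?([a-z]+)\)/.exec(strings.raw[0]);
  const flags = prefix ? prefix[1] : "";
  let source = prefix ? strings.raw[0].slice(prefix[0].length) : strings.raw[0];

  values.forEach((value, i) => {
    if (value instanceof RegExp) {
      // Preserve the nested RegExp's case sensitivity with a modified group.
      source += value.ignoreCase
        ? `(?i:${value.source})`
        : `(?-i:${value.source})`;
    } else {
      // Plain values are escaped so they match literally.
      source += String(value).replace(SYNTAX_CHARS, "\\$&");
    }
    source += strings.raw[i + 1];
  });
  return { source, flags };
}

const caseSensitive = false;
const composed = buildPattern`(?u)^${"a.b"}${caseSensitive ? /Z/ : /Z/i}$`;
// composed.flags is "u"; composed.source is ^a\.b(?i:Z)$

// Without nested RegExps the result compiles on any current engine.
const simple = buildPattern`(?i)^${"a.b"}$`;
const re = new RegExp(simple.source, simple.flags);
console.log(re.test("A.B"), re.test("AxB")); // prints: true false
```

Reading the prefix from the first raw chunk keeps the flags visible at the start of the template, which is the readability argument made earlier in the thread.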