Closed benjamingr closed 9 years ago
To me, RegExp.escape
's only guarantee is that the string is safe to pass into RegExp
. The length of the returned string is basically irrelevant to me.
Would not any additions to the RegExp
grammar simply require corresponding changes to the RegExp.escape
implementation? (assuming it's worded correctly)
Would not any additions to the RegExp grammar simply require corresponding changes to the RegExp.escape implementation? (assuming it's worded correctly)
What concerns me is that flags like /x
could be added in the future which
would, for example, add comments (with a #
syntax).
Moreover from what I understand implementations may choose to extend the RegularExpression syntax. Maybe allowing them to escape additional characters and suggesting that an implementation must ensure that the return result is possible to pass to a RegExp constructor?
On Fri, Jun 19, 2015 at 10:02 PM, Jordan Harband notifications@github.com wrote:
To me, RegExp.escape's only guarantee is that the string is safe to pass into RegExp. The length of the returned string is basically irrelevant to me.
Would not any additions to the RegExp grammar simply require corresponding changes to the RegExp.escape implementation? (assuming it's worded correctly)
— Reply to this email directly or view it on GitHub https://github.com/benjamingr/RegExp.escape/issues/27#issuecomment-113607312 .
That's exactly the kind of wording I'm thinking of. With that requirement in place, a builtin RegExp.escape
would always (theoretically) be safe, and a polyfilled one could feature-detect these new forms easily.
So something like?
Note: a valid implementation must ensure that the return result does not result in an abrupt completion when passed as the first and only argument to the RegExp constructor. An implementation may choose to escape additional characters in order to maintain this guarantee.
That's good, but the concern isn't just an abrupt completion, it's also about semantics changing as a new flag is added, right?
@ljharb what do you mean?
NOTEs are non-normative, just remove that prefix.
Thanks, what about the actual content?
Since your text says "first and only" argument, and since I can't imagine anyone making any changes to regexes without requiring a new flag, I think that text is definitely sufficient. However, if someone were to make an existing flag do something different (which would also be a breaking change) then this wouldn't cover it - but that's probably not much of a real concern.
Yes, maybe it should say "any flag passed as a second argument". I want to
be safe against a future /x
mode for comments.
On Sat, Jun 20, 2015 at 8:49 PM, Jordan Harband notifications@github.com wrote:
Since your text says "first and only" argument, and since I can't imagine anyone making any changes to regexes without requiring a new flag, I think that text is definitely sufficient. However, if someone were to make an existing flag do something different (which would also be a breaking change) then this wouldn't cover it - but that's probably not much of a real concern.
— Reply to this email directly or view it on GitHub https://github.com/benjamingr/RegExp.escape/issues/27#issuecomment-113793095 .
I agree with @ljharb that it's not just about abrupt completions but also about semantic equivalence. IIRC Ruby had something like new RegExp(RegExp.escape(s)) ~= s
as the invariant, which seems more interesting.
Yes, I agree. What about:
A valid escape implementation must ensure that the return result holds the invariant that when passed as the first and only argument to the RegExp constructor the resulting regular expression will test true against the input string. That is
new RegExp(RegExp.escape(s)).test(s) === true
for any strings
. An implementation may choose to escape additional characters in order to maintain this guarantee.
Do you see any downside to allowing implementations to escape additional characters? Are we sure we're ok with not giving any length guarantee?
I guess I'm not clear when the normative spec text as-is would not already guarantee that.
I guess I'm not clear when the normative spec text as-is would not already guarantee that.
Regarding implementations and regular expressions:
An implementation may extend the ECMAScript Regular Expression grammar defined in 21.2.1, but it must not extend the RegularExpressionBody and RegularExpressionFlags productions defined below or the productions used by these productions.
Now I'm not 100% sure either, but I think it means that regular expression syntax can be extended to a point where other implementations would need to escape more characters.
I guess specifically tie it to that clause then. Or maybe we should just remove that clause; it seems bad for the web in theory, and unused in practice, as far as I know.
I've sent a mail to esdiscuss, to be fair I'm not sure why this is the case either.
I suspect the loose spec around RegExp flags is pre-ES6. Now that we can extend built-ins, I'd expect that any engine which wants to add new RegExp syntax would do so as a subclass (and then extend RegExpSubclass.escape appropriately).
@cscott ES3 specifies:
An implementation may extend the regular expression constructor's grammar, but it should not extend the RegularExpressionBody and RegularExpressionFlags productions or the productions used by these productions.
Earlier versions of the spec did not have regular expressions. I wonder if any engines still in use in 2015 use this capability.
I've sent an email about extensions to esdiscuss: https://esdiscuss.org/topic/forbid-implementations-from-extending-the-regexp-grammar
If we can forbid implementations from extending the implementation on their own and not through the standard then this addition will not be required and the proposal will be simplified.
We should include some note about this in the TC39 presentation. I think it should be part of the proposal to eliminate that language.
As such, I would update the spec document with an appropriate <del>
.
@domenic good idea, I want to make sure that I didn't miss anything first (like an implementation that utilizes it. I'm not sure how else I can verify that - I checked current and older engines and asked around.
If this is indeed safe to remove - I agree that it's a good idea to tie it to this proposal and add an appropriate <del>
clause (and a slide in the presentation).
Yep. And we can drop it if TC39 objects; it's a separable part of the proposal to delete that. But I hope nobody will.
I'd rather put in language limiting extensions to the de facto (?...)
extension mechanism (and flags). That also guarantees that RegExp.escape
remains safe (since all proposals escape both (
and ?
).
Is there any syntax in perlre which gets past our current proposals for
RegExp.escape? If so, it may be worthwhile to future-proof
escape`.
--scott
This isn't really about RegExp.escape
as an issue. We can always extend
RegExp.escape
when we make additions to the RegExp grammar as a standard
that would not be a problematic issue since we can keep both parts in sync
relatively easily.
On the other hand currently implementations are allowed to diverge in their regular expressions which means there is no guarantee that they'll behave the same in some cases. This is really risky for standard behavior and if an implementation decides to use the clause it could create a big pain point for developers while keeping it compliant.
Luckily, implementors are insightful people and implementations don't do that because of precisely those usability implications for end developers.
What I'm suggesting is that we forbid implementations from extending the regular expressions grammar beyond the language specification and outside the process. Basically, we take the ECMAScript regular expression object as specified (minus allowing extensions) as our base line :)
Cheers, Benji
On Mon, Jul 6, 2015 at 5:03 PM, C. Scott Ananian ecmascript@cscott.net wrote:
I think it would be more worthwhile of we tried to draw a compatibility boundary. Taking perlre as a baseline, for example, are there additional characters we should escape in
RegExp.escape
so that implementations (and the language itself) could add more perlre features without breaking compatibility? The(?...)
syntax (and flags) seems to be the de facto extension point, can we protect that more narrowly? --scott On Jul 6, 2015 1:56 AM, "Benjamin Gruenbaum" benjamingr@gmail.com wrote:So, following work on RegExp.escape [1] I found out that implementations may extend the regular expression grammar in JavaScript [2]. However, when asking esdiscuss and Stack Overflow about it [2][3] it doesn't look like any implementations currently do so (*).
Can we please forbid implementations from extending the regular expression syntax? It seems like this could cause compatibility issues between implementations anyway. We have subclassable RegExp with hooks and symbols in place and implementations that want to provide an extended RegExp can subclass RegExp or propose an extension to the language itself.
[1] https://github.com/benjamingr/RegExp.escape [2] https://esdiscuss.org/topic/why-are-implementations-allowed-to-extend-the-regular-expressions-syntax [3] http://stackoverflow.com/questions/30958288/what-ecmascript-implementations-extend-the-regexp-syntax
(*) some did it to implement ES2015 features before ES2015.
es-discuss mailing list es-discuss@mozilla.org https://mail.mozilla.org/listinfo/es-discuss
Benji -- but you've just specifically mentioned that implementations are already using the flexibility provided by the spec to experiment with and implement ES6 features. Why are you going to foreclose that possibility for ES2016+?
The perl community has managed to compatibly extend their regex engine
to an almost absurd degree. There are plenty of ways to add features
to your regexp that don't break "standard-compliant" regexps. I think
that's the spirit of the existing language in the spec, and it
should be honored. I'm fine with narrowing the scope, so we know that
new flags and characters after (?
are "reserved", for example, but I
don't see why we have to ban future change outright.
--scott
On Mon, Jul 6, 2015 at 10:13 AM, Benjamin Gruenbaum benjamingr@gmail.com wrote:
This isn't really about
RegExp.escape
as an issue. We can always extendRegExp.escape
when we make additions to the RegExp grammar as a standard that would not be a problematic issue since we can keep both parts in sync relatively easily.On the other hand currently implementations are allowed to diverge in their regular expressions which means there is no guarantee that they'll behave the same in some cases. This is really risky for standard behavior and if an implementation decides to use the clause it could create a big pain point for developers while keeping it compliant.
Luckily, implementors are insightful people and implementations don't do that because of precisely those usability implications for end developers.
What I'm suggesting is that we forbid implementations from extending the regular expressions grammar beyond the language specification and outside the process. Basically, we take the ECMAScript regular expression object as specified (minus allowing extensions) as our base line :)
Cheers, Benji
On Mon, Jul 6, 2015 at 5:03 PM, C. Scott Ananian ecmascript@cscott.net wrote:
I think it would be more worthwhile of we tried to draw a compatibility boundary. Taking perlre as a baseline, for example, are there additional characters we should escape in
RegExp.escape
so that implementations (and the language itself) could add more perlre features without breaking compatibility? The(?...)
syntax (and flags) seems to be the de facto extension point, can we protect that more narrowly? --scottOn Jul 6, 2015 1:56 AM, "Benjamin Gruenbaum" benjamingr@gmail.com wrote:
So, following work on RegExp.escape [1] I found out that implementations may extend the regular expression grammar in JavaScript [2]. However, when asking esdiscuss and Stack Overflow about it [2][3] it doesn't look like any implementations currently do so (*).
Can we please forbid implementations from extending the regular expression syntax? It seems like this could cause compatibility issues between implementations anyway. We have subclassable RegExp with hooks and symbols in place and implementations that want to provide an extended RegExp can subclass RegExp or propose an extension to the language itself.
[1] https://github.com/benjamingr/RegExp.escape [2] https://esdiscuss.org/topic/why-are-implementations-allowed-to-extend-the-regular-expressions-syntax [3] http://stackoverflow.com/questions/30958288/what-ecmascript-implementations-extend-the-regexp-syntax
(*) some did it to implement ES2015 features before ES2015.
es-discuss mailing list es-discuss@mozilla.org https://mail.mozilla.org/listinfo/es-discuss
This doesn't look like a big deal. It seemed like I wasn't able to get the point (forbid 'allowed extension' and ask implementations to go through the spec after they extend RegExp or do it in a subclass) across. A shame but not a big issue for this proposal - we probably shouldn't tie this proposal to it.
While researching Perl today we've found an interesting issue:
To me, this is a huge deal. I'm wondering - if implementations are allowed to extend the regular expressions grammar - should they be allowed to escape additional characters and we should make it clear that there is no length guarantee on the returned string?
(No other language has a length guarantee but some have a fixed character set which implies a length guarantee)