Open claudepache opened 8 years ago
Personally, I am for not escaping `/` inside RegularExpressionClass, because that has the property of preserving exactly the source text when it originated from a regexp literal.
@claudepache could you perhaps prepare a PR for this?
This is a followup of https://bugs.ecmascript.org/show_bug.cgi?id=1470
I’ve made a first rapid analysis of how major web browsers implement the not-exactly-specified Step 2 of EscapeRegExpPattern. Recall that, for a regexp `rx`, we have approximately `rx.source = EscapeRegExpPattern(rx.[[OriginalSource]])`. That transformation must not change the semantics of the pattern, but is required in order that the literal formed by `/`, `rx.source`, `/` and the flags produces a regexp functionally equivalent to `rx`.

Analysing the grammar that is used to determine the limits of a regexp literal, one can show that it suffices to:

- escape `/` outside RegularExpressionClass; and
- escape line terminators.

(Note that, although `/*` is parsed as the beginning of a multiline comment rather than of a regular expression, this is nonproblematic because a regexp cannot ever begin with `*`.)

The transformations used by the major browsers are detailed below, except that the line terminators are currently not escaped by Chrome (V8 Issue 1982).
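
| Source text | Escaped as |
| --- | --- |
| `<LF>` or `\<LF>` | `\n` |
| `<CR>` or `\<CR>` | `\r` |
| `<LS>` or `\<LS>` | `\u2028` |
| `<PS>` or `\<PS>` | `\u2029` |
| `/` (outside RegularExpressionClass) | `\/` |
| `/` (inside RegularExpressionClass) | `/` (Firefox, Safari), `\/` (Chrome, Edge) |
| empty pattern | `(?:)` |
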
It does not seem to me that implementations perform other transformations, but that needs confirmation.
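
To make the observed transformations concrete, here is a minimal sketch of them as a single pass over the pattern. It is not any engine's actual implementation: the function name `escapeRegExpPattern` and the `escapeSlashInClass` switch are made up for illustration, and the sketch assumes the input is already a syntactically valid pattern without nested classes.

```javascript
// Hypothetical helper, written only to illustrate the transformations above.
// escapeSlashInClass = false mimics the Firefox/Safari behaviour,
// escapeSlashInClass = true mimics the Chrome/Edge behaviour.
function escapeRegExpPattern(pattern, escapeSlashInClass = false) {
  // An empty pattern would yield "//", which starts a line comment.
  if (pattern === "") return "(?:)";

  const lineTerminators = new Map([
    ["\n", "\\n"],
    ["\r", "\\r"],
    ["\u2028", "\\u2028"],
    ["\u2029", "\\u2029"],
  ]);

  let out = "";
  let inClass = false; // true while between "[" and the closing "]"
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];

    if (ch === "\\" && i + 1 < pattern.length) {
      // Copy a backslash sequence as a unit; "\<LF>" becomes "\n", etc.
      const next = pattern[++i];
      out += lineTerminators.has(next) ? lineTerminators.get(next) : "\\" + next;
      continue;
    }

    if (ch === "[") inClass = true;
    else if (ch === "]") inClass = false;

    if (lineTerminators.has(ch)) out += lineTerminators.get(ch);
    else if (ch === "/" && (!inClass || escapeSlashInClass)) out += "\\/";
    else out += ch;
  }
  return out;
}

console.log(escapeRegExpPattern("a/b[/]"));       // a\/b[/]
console.log(escapeRegExpPattern("a/b[/]", true)); // a\/b[\/]
```

Tracking `inClass` is what lets the first variant leave `/` untouched inside a class, which is the only place where the two groups of browsers differ.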
In conclusion, the only major difference between implementations seems to be whether `/` is escaped everywhere or only outside RegularExpressionClass.
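
For anyone who wants to confirm the behaviour in a given engine, the following is a rough way to check that `source` round-trips through a regexp literal. The helpers `reparse` and `behavesTheSame` are hypothetical names, and comparing `test()` results on a few sample strings is only a spot check, not a proof of equivalence. It should not be used with the `g` or `y` flags, since `test()` is then stateful.

```javascript
// Hypothetical helper: rebuild a regexp by evaluating "/source/flags" as a literal.
function reparse(rx) {
  return new Function("return /" + rx.source + "/" + rx.flags + ";")();
}

// Spot check: the rebuilt regexp must agree with the original on every sample.
// Returns false if the literal does not even parse.
function behavesTheSame(rx, samples) {
  let copy;
  try {
    copy = reparse(rx);
  } catch (e) {
    return false;
  }
  return samples.every((s) => rx.test(s) === copy.test(s));
}

// The exact source strings printed here depend on the engine.
for (const pattern of ["a/b", "[/]", "\n", "\u2028", ""]) {
  const rx = new RegExp(pattern);
  console.log(JSON.stringify(rx.source), behavesTheSame(rx, ["a/b", "/", "\n", "\u2028", ""]));
}
```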