tc39 / proposal-regex-escaping

Proposal for investigating RegExp escaping for the ECMAScript standard
http://tc39.es/proposal-regex-escaping/
Creative Commons Zero v1.0 Universal
363 stars 32 forks source link

Present the proposal again? #43

Closed arcanis closed 3 years ago

arcanis commented 3 years ago

This proposal got rejected five years and a half ago based on concerns that, from the outside, seem fairly hard to understand. Perhaps it would lead to the same result, but I feel like it's at least worth discussing it again.

I know there's at least one question I'd like to ask of people who previously rejected it: are your concerns worth five years of incorrect manual escaping in users' code? Given that this feature borders on security, similar to SQL injections, it's not a rhetorical question: it seems important for this to be answered.

benjamingr commented 3 years ago

I am happy to work on this again but suspect the results will be the same if the same people block it in the committee.

jonathantneal commented 3 years ago

I would encourage folks to revisit this proposal, as five years has offered perspective into certain arguments blocking the proposal the first time around.

Summary: A safer and more powerful template literal tag is coming soon.

A short term fix that become a long term hazard is a common development strategy and often appears in libraries that become de facto standards. When no good long term solution is known at the time, they often become de jure standards as well. However, when a better long term solution is known at the time, and subsumes all the functionality and convenience of the short term fix, I think the committee would be failing the community to admit the short term fix merely because the good long term solution is more work. — https://github.com/benjamingr/RegExp.escape/issues/37#issuecomment-126722885

Has a better long term solution materialized? I’m not aware of RegExp.tag or anything like it shipping.

Summary: A less safe version is easy enough to invent as needed

The general feeling was that to be completely safe you need a context-dependent join operation. The feeling was then that if author code wants to do unsafe escaping, the function is easy to write, but if something is going to be standardized, it must be completely safe. — https://github.com/benjamingr/RegExp.escape/issues/37#issue-98309281

Excusing the manner of safety to the former point, has it been easy or is it now easier to write this escaping function? If easy implied simple or obvious, then I think it would be hard to prove that developers well versed in RegExp would naturally write out /[\\^$*+?.()|[\]{}]/g or something totally equivalent once they realized they needed to escape dynamic text. If easy implied familiar or consistent, then I think these differing implementations across popular developer mediums demonstrate otherwise:

From this proposal:

/[\\^$*+?.()|[\]{}]/g

From MDN:

/[.*+\-?^${}()|[\]\\]/g

From the accepted StackOverflow answer:

/[-\/\\^$*+?.()|[\]{}]/g
benjamingr commented 3 years ago

I can probably just try and PR this into Node as a static utility since TC39 rejected it - would that help you two at all or are you browser focused?

oliverfoster commented 3 years ago

Having a version in node might help the living ecosystem develop a standardised solution. It's a good punt.

benjamingr commented 3 years ago

@domenic do you think having a utility for this in Node.js is fine or would that be seen as bad faith / detrimental to standardisation efforts?

(Just asking for your opinion/advice)

@oliverfoster

Bikeshedding, something like util.escapeRegExp?

oliverfoster commented 3 years ago

Util seems about the only place sensible to my tiny brain. util.escapeRegExp or util.escapeStringRegexp as it is in the immensely well-used library escape-string-regexp? I'm sure you'll get feedback in any pr you make. Don't worry too much about the name.

benjamingr commented 3 years ago

OK, I want to do this with Domenic's blessing though - so let's give him a week to respond :]

jonathantneal commented 3 years ago

I can probably just try and PR this into Node as a static utility since TC39 rejected it - would that help you two at all or are you browser focused?

I’m unfamiliar of the relationship between Node and TC39, so my opinion will be based on whether TC39 can revisit rejected proposals. My uninformed opinion is that this must go through TC39.

If TC39 does not revisit rejected proposals, then I like your idea to PR this to Node, for the ecosystem rationale shared by @oliverfoster. I am also browser focused. I don’t like it, tho. (Yes, I have contradicting feelings. I’ll explain...)

If TC39 can revisit rejected proposals, then I strongly oppose this idea. Specifically, I wholly reject the idea of twisting the arm of our own browser and develop allies by knowingly forcing a compatibility issue upon them.

I hope my answer is sufficiently long and ~nuanced~ confusing. 😅


EDIT: I misunderstood the proposal when I wrote this. I’ve shared a correction, but want to leave my flawed response for honest accuracy.

ljharb commented 3 years ago

@benjamingr i think putting anything in an engine that belongs in the language is detrimental, even if it means it takes longer to get the proper solution.

@jonathantneal anything can be revisited at any time.

benjamingr commented 3 years ago

@ljharb

@benjamingr i think putting anything in an engine that belongs in the language is detrimental,

I agree, which is why we are only discussing this after this feature was rejected from the language and TC39 decided it shouldn't be part of the language and we've gotten continuous requests for it by users.

@jonathantneal

I’m unfamiliar of the relationship between Node and TC39

Mostly respectful, some of the same people in both orgs. Node.js doesn't do stuff that isn't valid JavaScript and when we do stuff that might be problematic from the language standpoint (like recently the cancellation APIs) we coordinate it with TC39.

Moreover, a lot of the people there are knowledgeable so it's less of a matter of "what Node must do" (note, I don't speak to Node I'm one person working on Node) but "what Node should do".

Specifically, I wholly reject the idea of twisting the arm of our own browser and develop allies by knowingly forcing a compatibility issue upon them.

Node.js can add to its own exported modules whatever it wants though? IIUC - Adding a utility to core does not in any way "twist" anyone's arm nor would Node.js ever do things that "twist" the arms of spec bodies. I consider that extremely counterproductive and and faith.

In my experience good faith collaboration is what has worked well in the past.

Though note in this proposal I am "Benjamin, proposal author for a TC39 proposal" and not "Benjamin, Node.js collaborator" or whatever. I wasn't even in Node.js when I worked on this I think ^^.

bergus commented 3 years ago

Node.js can add to its own exported modules whatever it wants though? IIUC - Adding a utility to core does not in any way "twist" anyone's arm nor would Node.js ever do things that "twist" the arms of spec bodies.

I think the concern here is that if node adds util.escapeRegExpString and later TC39 arrives at wanting RegExp.escapeString, we either a) get two very similar functions with slightly different functionality, a loss for the ecosystem, or b) TC39 specifies the exact same behaviour so that node can export const escapeRegExpString = RegExp.escapeString. Avoiding scenario A will put pressure on TC39 to accept the precedent. It's not a lot (esp. since node can introduce a change to util with any major release), but still.

benjamingr commented 3 years ago

@bergus as soon as a language level solution exists* I'm sure Node will happily soft deprecate the utility:

I don't think this in Node.js will pressure TC39 TBH? Libraries that expose this functionality (needed almost every time working with dynamic RegExp) are very popular, every other language I checked implements this feature in the language or in its platform. I don't consider TC39 "pressured" 😅

My unwillingness to work on this without the "blessing" of Domenic boils down to:

*(less likely - and again I have very little motivation to push this as a util in Node without Domenic's blessing).

benjamingr commented 3 years ago

A TC39 member has reached me in private following this thread and indicated willingness to pick this up. I'll wait until the next TC39 meeting (January?) to give this another chance in spec land.

domenic commented 3 years ago

@domenic do you think having a utility for this in Node.js is fine or would that be seen as bad faith / detrimental to standardisation efforts?

Totally fine, and makes sense to me. Especially if you keep it in require("util").

Monkeypatching it onto RegExp as a static would be a bit more controversial, but if you're interested in doing that, we could open a discussion on whatwg/html about making sure all web global environments do the same thing. But I take it your proposal is for require("util"), so that's probably not necessary.

jonathantneal commented 3 years ago

I hope you’ll forgive me. Sorry, I was errantly reacting to the idea that you would patch RegExp.escape into Node, and I see now that your suggestion was to add something behind a module. That’s a great idea, and I have no issues with it.

mdevils commented 3 years ago

@benjamingr do you know when the committee meeting is going to happen?

benjamingr commented 3 years ago

@mdevils 25.01.2021 - 28.01.2021

ljharb commented 3 years ago

I've added an agenda item to this meeting's agenda.

ljharb commented 3 years ago

Today's TC39 decided to reactivate this proposal at stage 1, to explicitly investigate the problem area - not yet endorsing any particular solution (RegExp.escape is one possible one, as is a template tag function).

I'll update the readme to reflect that shortly, and this issue can now be closed.

jaydenseric commented 2 years ago

Here are the relevant TC39 28 January, 2021 meeting notes for those interested in the details of the decision to reactivate this proposal at stage 1:

bathos commented 2 years ago

What I found surprising reading that was that there didn't seem to be much discussion of use cases or the breakdown of how people are using it most often today. Perhaps this is only because those things had been discussed a lot previously, but (skipping a few steps of speculation) it made me wonder if objections to escape as proposed in this repo would soften if it had a very explicit name like escapeLiteralSequence.

ljharb commented 2 years ago

I don’t believe there’s any need to further explain the use cases, not do i think the naming matters. Those objecting to the escape solution do so because they believe it’s a potentially dangerous footgun, even if that never comes up in practice.