tc39 / proposal-regexp-named-groups

Named capture groups for JavaScript RegExps
https://tc39.github.io/proposal-regexp-named-groups/
222 stars 21 forks source link

Why throw SyntaxErrors in 8.1 Runtime Semantics: GetSubstitution #29

Closed msaboff closed 7 years ago

msaboff commented 7 years ago

In Table 1: Replacement Text Symbol Substitutions, SyntaxErrors are thrown in two situations. The first in the case that the $<groupName is not terminated by a closing '>'. The second is when the groupName is not a named group in the regular expression. This introduces the runtime first exceptions for the non-function flavor of String.prototype.replace.

It seems that both cases can be handled by defining alternative semantics. For the missing closing '>', treat the $<groupName as the replacement.

In the second case, use empty string as the replacement.

This would eliminate the need for programmers to wrap calls with try / catch for proper coding.

littledan commented 7 years ago

These ideas are reasonable and consistent with other ways the replace string works, but I'm not sure if we should switch to them.

The current semantics were chosen based on a general design principle that, for new language features, we should be stricter and throw exceptions where it's reasonable. This is malformed syntax. Another example of something similar is how Unicode RegExps don't allow invalid escape sequences to just be ignored (e.g., RegExp("\k") is like RegExp("k"), but RegExp("\k", "u") is a SyntaxError). However, maybe this case isn't appropriate for such treatment, since the replace string has no way to be "compiled".

FWIW these aren't the only way exceptions can be thrown from this code path. WIth RegExp subclassing or monkey-patching, a groups object may be provided which doesn't behave appropriately, and leads exceptions to be thrown (either directly or from the ToString on the result). I don't see a good way to ignore this; it would be pretty extreme to actually catch and suppress these errors.

Any more thoughts? @schuay @hashseed @mathiasbynens

littledan commented 7 years ago

We discussed part of your suggestion this bug thread: https://github.com/tc39/proposal-regexp-named-groups/issues/14

msaboff commented 7 years ago

It is clear from @littledan's reply and from bug thread #14, that there isn't an overwhelming drive for throwing a SyntaxError in the two cases.

First off, I would contend that SyntaxError is wrong for String.prototype.replace() and a malformed named replacement reference. In the case of new RegExp("("), throwing a SyntaxError makes sense as, but for String.replace() the proposed throwing of SyntaxError is not thrown during a strict parsing phase but during runtime processing of replacements to create the result. By using a Proxy or Subclassing, this processing is observable.

Furthermore, a SyntaxError or possibly the more suitable TypeError does not convey sufficient information for reasonable code to determine what exactly is wrong with the replacement string. Having defined semantics for both the missing closing '>' and a properly formed group name that isn't present in the RegExp, would make be easier for interested programmers to validate proper operation of String.prototype.replace().

Lastly, I can accept that new features should throw errors when extending existing methods where appropriate. In this case though, the proposal describes throws a SyntaxError during the runtime processing of a String utility method. All other existing String methods throw either Range or Type errors. In each of those cases, the error is detected and thrown early in the processing of the method and with a clear reason for the error. This proposed SyntaxError is thrown during processing and only catches the first of what could be several errors and provides very little direct indication as to the cause.

As an implementor who just finished the implementation of named capture groups in JavaScriptCore, my feedback is that it is troubling that part way through constructing the result we'd throw an error. In existing code that calls String.prototype.replace() via helper methods where the RegExp and replacement string are passed as arguments, such code would require rework to catch the new error. Defining semantics for malformed named references allows such code to operate in the presence of malformed group names in replacement strings.

The code currently checked into JavaScriptCore implements the changes I propose. That code will be updated as appropriate based on the resolution of this issue.

littledan commented 7 years ago

It is clear from @littledan's reply and from bug thread #14, that there isn't an overwhelming drive for throwing a SyntaxError in the two cases.

I agree; this is a sort of matter of taste.

Furthermore, a SyntaxError or possibly the more suitable TypeError

FWIW JSON.parse also throws SyntaxError... I'm not sure why TypeError makes all that much sense, since nothing is of the wrong type.

Furthermore, a SyntaxError or possibly the more suitable TypeError does not convey sufficient information for reasonable code to determine what exactly is wrong with the replacement string.

No standardized errors give you enough information to debug. It's the job of engines to give usable error messages. Do you think it would be especially impractical to do so in this case?

As an implementor who just finished the implementation of named capture groups in JavaScriptCore, my feedback is that it is troubling that part way through constructing the result we'd throw an error.

I think this is the strongest argument, and a very important constraint. @jgruber, did your implementation here of named groups use your RegExp subclassing fastpath? Would the proposed changes make it easier to do so?

schuay commented 7 years ago

Furthermore, a SyntaxError or possibly the more suitable TypeError

FWIW JSON.parse also throws SyntaxError... I'm not sure why TypeError makes all that much sense, since nothing is of the wrong type.

SyntaxError also seems reasonable to me. Why do you think TypeError would be more suitable?

Furthermore, a SyntaxError or possibly the more suitable TypeError does not convey sufficient information for reasonable code to determine what exactly is wrong with the replacement string.

No standardized errors give you enough information to debug. It's the job of engines to give usable error messages. Do you think it would be especially impractical to do so in this case?

+1, and an exception gives more information than just failing silently.

As an implementor who just finished the implementation of named capture groups in JavaScriptCore, my feedback is that it is troubling that part way through constructing the result we'd throw an error.

I think this is the strongest argument, and a very important constraint. @jgruber, did your implementation here of named groups use your RegExp subclassing fastpath? Would the proposed changes make it easier to do so?

No, this makes no difference to our implementation in V8.

  1. We currently bail out to runtime to handle anything but trivial replacement strings (anything containing a '$' character).
  2. Even if we were to fully implement GetSubstitution on the fast path, it shouldn't make a difference whether it throws or not.

I still don't have a strong opinion either way. If there are good arguments for silent failure vs. throwing, then I'm happy to go with that.

littledan commented 7 years ago

@msaboff Would throwing an exception in the middle of processing pose more of a challenge to throw an error partway through for issues in the replacement string, than the Get for what the actual named capture is?

Some possible alternative semantics, if we want all the exceptions thrown upfront:

That preserves the current semantics of more actively giving users errors, while moving all exceptions to the beginning of the replacement.

Thoughts?

schuay commented 7 years ago

-1 from me for upfront scanning & named group collection, seems like that would complicate things unnecessarily.

I'm not sure I understand what the issues are with throwing during result construction, @msaboff could you perhaps elaborate on that?

msaboff commented 7 years ago

Furthermore, a SyntaxError or possibly the more suitable TypeError

FWIW JSON.parse also throws SyntaxError... I'm not sure why TypeError makes all that much sense, since nothing is of the wrong type.

SyntaxError also seems reasonable to me. Why do you think TypeError would be more suitable?

It seems to me that TypeError is more consistent with existing uses elsewhere.

Furthermore, a SyntaxError or possibly the more suitable TypeError does not convey sufficient information for reasonable code to determine what exactly is wrong with the replacement string.

No standardized errors give you enough information to debug. It's the job of engines to give usable error messages. Do you think it would be especially impractical to do so in this case?

+1, and an exception gives more information than just failing silently.

I said nothing about failing silently. My suggestion is that we define semantics for all forms of $ expressions in the replacement string. This would effectively eliminate "failing".

As an implementor who just finished the implementation of named capture groups in JavaScriptCore, my feedback is that it is troubling that part way through constructing the result we'd throw an error.

I think this is the strongest argument, and a very important constraint. @jgruber, did your implementation here of named groups use your RegExp subclassing fastpath? Would the proposed changes make it easier to do so?

No, this makes no difference to our implementation in V8.

The same is true about JavaScriptCore. I don't not advocate a 2 pass solution to the problem. That would impact performance.

mathiasbynens commented 7 years ago

It seems to me that TypeError is more consistent with existing uses elsewhere.

Could you give an example?

msaboff commented 7 years ago

-1 from me for upfront scanning & named group collection, seems like that would complicate things unnecessarily.

I concur. I don't want upfront scanning for error and then replacement processing, i.e. two passes over the replacement string.

I'm not sure I understand what the issues are with throwing during result construction, @msaboff could you perhaps elaborate on that?

My main argument is that we are proposing a change in the operation of String.prototype.replace. Currently there are defined semantics for any replacement string input. With this change, we are introducing an error path that hitherto did not exist. It is a fundamental operational change in the API. That operational change is observable via proxies or subclassing. To take advantage of the new capabilities a programmer needs to not only update their RegExps and replacement strings, but they need to account for the new error path. By defining semantics for "malformed" group references, we introduce the functionality with much less change to the API.

Consider existing code with calls to String.prototype.replace where replacement strings contain "$<". Before this change, you'd get "$<" in the result. Now you get a SyntaxError. Although the occurrence of such replacement strings is arguably low, this is a breaking change.

littledan commented 7 years ago

My main argument is that we are proposing a change in the operation of String.prototype.replace. Currently there are defined semantics for any replacement string input. With this change, we are introducing an error path that hitherto did not exist.

Is it at all a problem for language design or JSC's implementation that the Get from the groups object may throw, observably in the middle of the replacement? This is also a new error path that didn't exist previously. I get the point about how there were previously no syntax errors, and now there are, but Unicode RegExps made an analogous change.

Consider existing code with calls to String.prototype.replace where replacement strings contain "$<". Before this change, you'd get "$<" in the result. Now you get a SyntaxError. Although the occurrence of such replacement strings is arguably low, this is a breaking change.

The breakage would only happen if you use it with a new RegExp with named groups. So, this can break existing libraries, but it can't break existing deployed websites. Even with your proposed fix, if an existing replacement string contained $<foo> and you had a group named foo, it would be replaced, giving incorrect output.

msaboff commented 7 years ago

It seems to me that TypeError is more consistent with existing uses elsewhere.

Could you give an example?

In 19.5.5.4 SyntaxError, it states:

Indicates that a parsing error has occurred.

In 19.5.5.5 TypeError is states:

TypeError is used to indicate an unsuccessful operation when none of the other NativeError objects are an appropriate indication of the failure cause

In almost all cases, SyntaxError is thrown for malformed Java Script or JSON source, malformed RegExp expression, failed parameter sanitization, and strict mode errors. In 8.1 Runtime Semantics: GetSubstitution the steps describe copy with replacement of $ prefixed strings.

This isn't a hill for me to die on though as I don't think we should be throwing any error.

msaboff commented 7 years ago

My main argument is that we are proposing a change in the operation of String.prototype.replace. Currently there are defined semantics for any replacement string input. With this change, we are introducing an error path that hitherto did not exist.

Is it at all a problem for language design or JSC's implementation that the Get from the groups object may throw, observably in the middle of the replacement? This is also a new error path that didn't exist previously. I get the point about how there were previously no syntax errors, and now there are, but Unicode RegExps made an analogous change.

No, throwing from Get isn't an issue. The difference is that exceptions along Get and other property access paths are not new and well understood. My issue is the direct introduction of throwing an error in an algorithm that didn't have throws for any of the existing cases that also handle malformed $xxx replacement constructs.

schuay commented 7 years ago

+cc @ajklein

mathiasbynens commented 7 years ago

Now that implementations are starting to ship, let’s get this issue resolved.

I’m okay with making the proposed change.

ajklein commented 7 years ago

@msaboff's arguments sound good to me, I'd be happy with this spec change (and as @mathiasbynens says, I'd like to get this resolved so we can move forward with un-flagging V8's implementation).

schuay commented 7 years ago

+1 from me as well.

hashseed commented 7 years ago

Also +1. Let's implement this and ship!

schuay commented 7 years ago

Landed in V8: https://chromium-review.googlesource.com/c/v8/v8/+/708654

littledan commented 7 years ago

OK, glad we could resolve this. I'll fix up the spec and test262 tests soon to reflect the V8/JSC consensus of not throwing errors in these cases.

littledan commented 7 years ago

OK, committed a patch to attempt to match the implementations; will update tests soon. cc @ljharb