princjef / regex-rust

A regular expression library implemented natively in Rust using Pike VM
MIT License

Format of replace string in replace function #10

Closed sarjun closed 10 years ago

sarjun commented 10 years ago

Jeff and I have been going back and forth about how to format the replace string given to the replace library function, and we wanted to get your opinion on what we arrived at.

The replace string is the string that is substituted into the input string for each matching occurrence of the regular expression. The complication here is that the replace string can use groups that the regular expression matched. Jeff and I think that we could use backslashes to delimit these groups.

We initially wanted to use '$', but it becomes hard to tell whether the user is using '$' to delimit a group or wants the literal character '$'. Backslashes are normally escaped in strings anyway, so we could require that a literal backslash be specified in the replace string as '\\'. We could also say that a literal dollar sign has to be escaped, but that isn't immediately obvious since '$' isn't normally escaped in a string.
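To make the proposal concrete, here is a minimal sketch of how such a replace string could be expanded. It is not the regex-rust implementation: the function name `expand_replacement`, the `groups` slice (with index 0 standing for the whole match), the single-digit group limit, and the handling of unknown groups and stray backslashes are all assumptions made for illustration.

```rust
/// Hypothetical helper (not part of regex-rust): expands a replace template
/// in which `\N` refers to capture group N and `\\` is a literal backslash.
/// `groups[0]` is assumed to hold the whole match.
fn expand_replacement(template: &str, groups: &[&str]) -> String {
    let mut out = String::new();
    let mut chars = template.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '\\' {
            // Ordinary characters, including '$', are copied through.
            out.push(c);
            continue;
        }
        match chars.peek().copied() {
            // `\\` produces one literal backslash.
            Some('\\') => {
                chars.next();
                out.push('\\');
            }
            // `\N` (a single digit, by assumption) inserts group N.
            Some(d) if d.is_ascii_digit() => {
                chars.next();
                let idx = d.to_digit(10).unwrap() as usize;
                // Assumption: a group that does not exist expands to nothing.
                if let Some(g) = groups.get(idx) {
                    out.push_str(g);
                }
            }
            // Assumption: any other backslash is kept as a literal.
            _ => out.push('\\'),
        }
    }
    out
}
```

With groups `["2014-04", "2014", "04"]`, the template `\2/\1` would expand to `04/2014`, and a '$' in the template needs no escaping at all.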

As a side note, Python's re module and the C++ library RE2 both use backslashes to delimit groups in the replace string.

mjp2ff commented 10 years ago

If it's the same issue either way (having to escape a dollar sign or escape a backslash), I'd go with the backslash since, like you said, it's more standard/obvious.

ta3fh commented 10 years ago

I agree with Matt.


princjef commented 10 years ago

It would appear that we have decided to go with the backslash for specifying group names. In combination with Rust raw strings, this is a fairly clean syntax. Closing this issue.
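A small illustration of why raw strings help: the same replace template written as an ordinary string literal needs every backslash doubled, while the raw form reads exactly as the template does. The function name `templates` is made up for this sketch.

```rust
/// Returns one replace template written two ways (hypothetical example).
fn templates() -> (&'static str, &'static str) {
    // Ordinary literal: each backslash must be escaped for the Rust compiler.
    let escaped = "\\2-\\1";
    // Raw literal: backslashes are taken as written.
    let raw = r"\2-\1";
    (escaped, raw)
}
```

Both literals denote the same five characters, so the choice only affects how readable the source is.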

rbk2kb commented 10 years ago

Make sure you don't clash with octal escapes; they are currently parsed as a \ followed by some number, including single-digit ones.

sarjun commented 10 years ago

An actual backslash in the replace string will be escaped ('\\'), so '\\1' for an octal escape won't clash with '\1' for the group. We should be good there.
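The distinction can be sketched with a tiny tokenizer for replace strings. This is an illustration only, under the assumption that `\\` always means a literal backslash and `\N` a single-digit group reference; the `Piece` enum and `tokenize` function are hypothetical names, not regex-rust code.

```rust
/// One element of a parsed replace string (hypothetical type).
#[derive(Debug, PartialEq)]
enum Piece {
    Literal(char),
    Group(u32),
}

/// Splits a replace template into literals and group references.
fn tokenize(template: &str) -> Vec<Piece> {
    let mut pieces = Vec::new();
    let mut chars = template.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '\\' {
            pieces.push(Piece::Literal(c));
            continue;
        }
        match chars.peek().copied() {
            // An escaped backslash is consumed as a pair, so the digit
            // after it is never mistaken for a group number.
            Some('\\') => {
                chars.next();
                pieces.push(Piece::Literal('\\'));
            }
            // An unescaped backslash followed by a digit is a group.
            Some(d) if d.is_ascii_digit() => {
                chars.next();
                pieces.push(Piece::Group(d.to_digit(10).unwrap()));
            }
            // Assumption: a lone backslash is kept as a literal.
            _ => pieces.push(Piece::Literal('\\')),
        }
    }
    pieces
}
```

Under these rules `\1` yields a single group reference, while `\\1` yields a literal backslash followed by a literal `1`.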