domenic closed this 9 years ago
It makes the resulting string longer; other than that, it's harmless.
Some programming languages do this (Python escapes all non-alphanumeric characters), whereas others escape only a strict set (like C#).
Might be worth mentioning this as a design alternative in the readme, with the pro that it's more future-proof.
Good idea, I'll add that when I'm in front of a computer :) (you're welcome to if you'd like of course).
Updated the README. I'll leave this open for a week to see if anyone has any further input on it.
According to the research at https://github.com/benjamingr/RegExp.escape/blob/master/data/other_languages/discussions.md, it appears that other languages that used to escape every character have either made exceptions (like Python) or changed their behavior (like Perl). The discussion notes contain links to posts explaining why the changes were made.
Python's new regex engine (under development) gives you a choice: either escape all non-alphanumerics, or escape only metacharacters (and NUL). See https://bitbucket.org/mrabarnett/mrab-regex/src/6193ea4246da272cf18a190c46aa116737067780/regex_3/Python/regex.py?at=default#cl-342
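To make the two options concrete, here is a rough sketch of both strategies in Python. The function names `escape_all` and `escape_meta` and the `METACHARS` set are made up for illustration; they are not the API of `re`, of the new `regex` module, or of this proposal, and a real metacharacter set depends on the engine.

```python
import re

# Hypothetical metacharacter set, for illustration only; real engines differ.
METACHARS = set(r".^$*+?{}[]\|()")

def escape_all(text):
    """Backslash-escape every character that is not alphanumeric."""
    return "".join(ch if ch.isalnum() else "\\" + ch for ch in text)

def escape_meta(text):
    """Backslash-escape only the characters listed in METACHARS."""
    return "".join("\\" + ch if ch in METACHARS else ch for ch in text)

sample = "a.b-c d"
broad = escape_all(sample)    # escapes '.', '-' and ' '
narrow = escape_meta(sample)  # escapes only '.'

# Both patterns match the original text literally.
assert re.fullmatch(broad, sample)
assert re.fullmatch(narrow, sample)

# The broad strategy only costs extra length.
print(len(sample), len(narrow), len(broad))
```

Both results match the same input; the broad one is just longer and harder to read, which is the trade-off being discussed here.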
In your discussion you mentioned a problem with wide characters: you ran into the limitations of Python's re module with UCS-2 vs. UCS-4 builds (all Python versions up to 3.2 use one or the other, based on a compile-time switch). The regular expression engine handles code units rather than code points, which in a UCS-2 build means two per non-BMP character. The escaping is correct for the respective builds.
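For anyone who wants to see the code unit vs. code point difference, here is a small sketch. It assumes a modern Python 3 (3.3 or later), where strings are always indexed by code point, and counts UTF-16 code units by encoding; `utf16_code_units` is a helper name invented for this example.

```python
def utf16_code_units(text):
    """Count the 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2

bmp_char = "\u00e9"          # U+00E9, inside the BMP
non_bmp_char = "\U0001F600"  # U+1F600, outside the BMP

# One code point each on Python 3.3+.
assert len(bmp_char) == 1
assert len(non_bmp_char) == 1

# The non-BMP character needs a surrogate pair, i.e. two code units.
assert utf16_code_units(bmp_char) == 1
assert utf16_code_units(non_bmp_char) == 2
```

An engine that works on 16-bit code units sees the second character as two separate items, so an escape-everything function on such a build ends up escaping each half of the pair.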
I think we're good with not escaping every character. I want to focus on the discussion about big set vs readable set.
Is there any reason to escape only a specific subset? It's harmless to add backslashes, right?