mathiasbynens / todo

JavaScript library: create regular expression based on array of code points/symbols #6

Closed mathiasbynens closed 11 years ago

mathiasbynens commented 11 years ago

Like the internal createRange etc. methods used in http://git.io/unicode, but in JS.

mathiasbynens commented 11 years ago

I have a working JavaScript version of the code now, but before I push it to GitHub, I need:

  1. a good name for the project
  2. a decent API design for it
  3. feedback on the return format (should it return a string that can be used as part of a regular expression literal, or something else?)

I was thinking of regex-generator, but maybe there’s a more clever name for this project?

As for the API, I was thinking of exposing a global regexGenerator object with a fromCodePoints method on it. Example (using the Node.js syntax, but you get the idea):

> var generator = require('regex-generator');
> generator.fromCodePoints([0x0, 0x1, 0x2, 0x3, 0x1D306, 0x1D307, 0x1D308, 0x1D30A]);
'[\\x00-\\x03]|\\uD834[\\uDF06-\\uDF08\\uDF0A]'

As you can see, the result is a string that can be saved to a file (as part of a build process, perhaps) and used in a JavaScript regular expression literal. (Escape sequences are used to prevent data loss of non-printable characters or mojibake of non-ASCII symbols.) Would a different return format be more useful? Or should the return format be configurable?
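For readers who want to see how such a string could be produced, here is a minimal sketch. It is not the code referred to above: the helper names (`hex`, `escapeUnit`, `toRanges`, `charClass`) are made up for illustration, it assumes the input contains no lone surrogate code points, and it emits one alternative per high surrogate rather than merging ranges of high surrogates.

```javascript
// Hypothetical sketch; helper names are illustrative, not a published API.

// Format a number as zero-padded uppercase hexadecimal.
function hex(n, width) {
  return n.toString(16).toUpperCase().padStart(width, '0');
}

// Escape one UTF-16 code unit as \xHH (up to 0xFF) or \uHHHH.
function escapeUnit(unit) {
  return unit <= 0xFF ? '\\x' + hex(unit, 2) : '\\u' + hex(unit, 4);
}

// Collapse a sorted, de-duplicated list of numbers into [start, end] pairs.
function toRanges(sorted) {
  const ranges = [];
  for (const n of sorted) {
    const last = ranges[ranges.length - 1];
    if (last && n === last[1] + 1) {
      last[1] = n;
    } else {
      ranges.push([n, n]);
    }
  }
  return ranges;
}

// Render sorted code units as a character class such as [\x00-\x03].
function charClass(units) {
  const body = toRanges(units)
    .map(([start, end]) =>
      start === end
        ? escapeUnit(start)
        : escapeUnit(start) + '-' + escapeUnit(end))
    .join('');
  return '[' + body + ']';
}

// Assumption: no lone surrogates (0xD800-0xDFFF) in the input.
function fromCodePoints(codePoints) {
  const sorted = [...new Set(codePoints)].sort((a, b) => a - b);
  const bmp = sorted.filter((cp) => cp <= 0xFFFF);

  // Group astral code points by high surrogate, collecting low surrogates.
  const byHigh = new Map();
  for (const cp of sorted) {
    if (cp <= 0xFFFF) continue;
    const offset = cp - 0x10000;
    const high = 0xD800 + (offset >> 10);
    const low = 0xDC00 + (offset & 0x3FF);
    if (!byHigh.has(high)) byHigh.set(high, []);
    byHigh.get(high).push(low);
  }

  const parts = [];
  if (bmp.length > 0) parts.push(charClass(bmp));
  for (const [high, lows] of byHigh) {
    parts.push(escapeUnit(high) + charClass(lows));
  }
  return parts.join('|');
}

const source = fromCodePoints(
  [0x0, 0x1, 0x2, 0x3, 0x1D306, 0x1D307, 0x1D308, 0x1D30A]);
console.log(source); // [\x00-\x03]|\uD834[\uDF06-\uDF08\uDF0A]

// The string can be wrapped in a non-capturing group and compiled.
const re = new RegExp('^(?:' + source + ')$');
console.log(re.test('\uD834\uDF06')); // true
console.log(re.test('\uD834\uDF09')); // false
```

Returning a plain string keeps the output easy to embed in a larger pattern, at the cost of leaving grouping (the `(?:...)` wrapper above) to the caller.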

Later on, other methods like fromSymbols (which would take an array of single-symbol strings instead of code points) could be added. Perhaps methods to easily generate regular expressions based on the start and end value of a range of code points / symbols could be useful too (e.g. fromCodePointRange(0x1D306, 0x1D356), fromSymbolRange('\uD834\uDF06', '\uD834\uDF56')).
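One way these extra methods might work is as thin wrappers that normalize their input to an array of code points and then hand it to fromCodePoints. The sketch below only shows that normalization step. The function names are hypothetical, and it relies on `String.prototype.codePointAt` (ES2015), which assumes a newer engine than the ones common when this was posted.

```javascript
// Hypothetical helpers; names are illustrative only.

// Convert an array of single-symbol strings to their code points.
// codePointAt(0) reads a full surrogate pair as one astral code point.
function symbolsToCodePoints(symbols) {
  return symbols.map((symbol) => symbol.codePointAt(0));
}

// Expand an inclusive code point range into an array of code points.
function codePointRange(start, end) {
  const result = [];
  for (let cp = start; cp <= end; cp++) {
    result.push(cp);
  }
  return result;
}

// Expand an inclusive range given by its first and last symbol.
function symbolRangeToCodePoints(startSymbol, endSymbol) {
  return codePointRange(
    startSymbol.codePointAt(0), endSymbol.codePointAt(0));
}

console.log(symbolsToCodePoints(['\uD834\uDF06', 'a'])); // [ 119558, 97 ]
console.log(codePointRange(0x1D306, 0x1D356).length);    // 81
```

With helpers like these, fromSymbols, fromCodePointRange and fromSymbolRange would all reduce to a single call into the code point based generator.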

Any feedback on this is welcome! Would a different API / return format be more useful? /cc @slevithan

mathiasbynens commented 11 years ago

http://mths.be/regenerate