noprompt / frak

Transform collections of strings into regular expressions.
Dictionary Regex #1

Closed: jbrooksuk closed this issue 11 years ago

jbrooksuk commented 11 years ago

Firstly, an epic tool, very nice!

What engine are you using?

I just want to confirm whether the expression generated from /usr/share/dict/words works. I've tried it twice, but it has never matched anything.

mathiasbynens commented 11 years ago

When using the generated regex, don’t forget to prepend ^ and append $.

Trying out the regular expression in JavaScript:

re.test('book'); // true
re.test('xxxxxxx'); // false

Seems to work fine.

noprompt commented 11 years ago

@mathiasbynens At the moment, frak doesn't generate patterns that require an exact match. However, since you've brought this up, I think it's a good idea to have an option for creating patterns that do. It's been a while since I've messed with regular expressions in JavaScript, and I find it interesting that 'xxxxxxx' is considered a match. With Clojure this isn't the case.

user=> (re-matches word-re "xxxxxx") ;; Exact
nil
user=> (re-find word-re "xxxxxx") ;; Loose
"x"

My guess is re.test('xxxxxx') in JavaScript is analogous to re-find in Clojure and re-matches has no analogue in JavaScript.
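
To make the loose versus exact distinction concrete on the JavaScript side, here is a minimal sketch. The pattern source is a small hypothetical stand-in (not actual frak output), and `toExact` is an illustrative helper name, not part of frak. It assumes that wrapping the source in `^(?:...)$` is an acceptable way to approximate `re-matches`.

```javascript
// Hypothetical stand-in for a frak-generated pattern; the real
// /usr/share/dict/words pattern is far too large to show here.
const source = 'b(?:ook|ar)|x';

// Illustrative helper (not part of frak): wrap the source in a
// non-capturing group with anchors so the whole input must match.
// The group keeps ^ and $ from binding to only one alternative.
function toExact(patternSource) {
  return new RegExp('^(?:' + patternSource + ')$');
}

// Unanchored: test() succeeds if the pattern matches anywhere in the
// input, comparable to Clojure's re-find.
const loose = new RegExp(source);

// Anchored: the entire input must match, comparable to re-matches.
const exact = toExact(source);

console.log(loose.test('xxxxxx')); // true  (matches the first "x")
console.log(exact.test('xxxxxx')); // false (the whole string is not a word)
console.log(exact.test('book'));   // true
```

The non-capturing group matters because the generated pattern is an alternation: without it, `^` would apply only to the first alternative and `$` only to the last.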

At any rate there is a good chance this tool will be compiled down to JavaScript so I'll definitely fix this.

Nice catch!

noprompt commented 11 years ago

@jbrooksuk frak generates patterns that should be compatible with most regular expression engines available in mainstream languages like Ruby, Python, JavaScript, and the like. Note, however, I'm not making any guarantees about that since this is a Clojure project. The problem I specifically built this tool for does require some additional manipulation of the rendered pattern to work with Vim as you can see here. You might be in a situation where you'll need to do the same.

So, I should probably turn the question around and ask: how are you using the pattern?

jbrooksuk commented 11 years ago

I was trying it out in AutoIt v3 - it uses the PCRE library, so it should be compatible?

noprompt commented 11 years ago

@jbrooksuk Not exactly. The PCRE man page says in the LIMITATIONS section:

The maximum length of a compiled pattern is 65539 (sic) bytes if PCRE is compiled with the default internal linkage size of 2. If you want to process regular expressions that are truly enormous, you can compile PCRE with an internal linkage size of 3 or 4 (see the README file in the source distribution and the pcrebuild documentation for details). In these cases the limit is substantially larger. However, the speed of execution is slower.

TL;DR the pattern is probably too large.

If you're truly interested in experimenting with this pattern, try it from JavaScript, Ruby, or Python with @mathiasbynens's suggestion. Or, even better, try it with Clojure.

noprompt commented 11 years ago

@jbrooksuk I never answered your second question. Yes. It works.