pemistahl / grex

A command-line tool and Rust library with Python bindings for generating regular expressions from user-provided test cases
https://pemistahl.github.io/grex-js/
Apache License 2.0
7.27k stars 173 forks source link

Very slow for large number of lines #66

Closed bbugh closed 2 years ago

bbugh commented 2 years ago

Hi! 👋🏻 I thought this may be useful. I came across grex looking at Rust tools and thought I'd try it on an old problem. I'm not planning to use it for anything currently, but I thought this report might be useful.

A file with 3622 lines takes frak (clojure) ~100ms to produce a regex.

Grex takes around 50 seconds on the same list:

grex --file names.txt  49.84s user 14.48s system 89% cpu 1:12.25 total

Here's names.txt:

https://gist.github.com/bbugh/4bf9df86931a94f89e61b3b3896d4e62

pemistahl commented 2 years ago

Hello Brian, thank you for letting me know about this. I've been already aware of the fact that the DFA minimization algorithm is slow for a large amount of input strings. I will try to improve the algorithm's performance or at least implement a switch that allows to disable DFA minimization. grex won't produce the shortest regex then but it will perform much faster.