tani / regex

regular expression engine for CommonLisp
GNU Lesser General Public License v2.1

regex hangs on match start #1

Open Inc0n opened 5 years ago

Inc0n commented 5 years ago

Trying the regex string "^+" to match a '+' at the start of the input hangs:

CL-USER> (re:scan "^+" "abcd")
(#<CLOSURE (LAMBDA (REGEX.CORE::DATA REGEX.CORE::REG REGEX.CORE::CONT)
             :IN
             REGEX.CORE:RE/REPEAT) {100406F20B}>
 #<FUNCTION (LAMBDA (REGEX.CORE::DATA REGEX.CORE::REG REGEX.CORE::CONT)
              :IN
              REGEX.CORE:RE/OK) {52DE956B}>)

But this works:

CL-USER> (re:scan "^[+]" "abcd")
             ....
             :IN
             REGEX.CORE:RE/BRACKETS) {100250091B}>
 #<FUNCTION (LAMBDA (REGEX.CORE::DATA REGEX.CORE::REG REGEX.CORE::CONT)
              :IN
              REGEX.CORE:RE/OK) {52DE814B}>)
NIL

tani commented 5 years ago

Thank you for the issue. However, ^+ is not a valid POSIX Extended Regular Expression. You can check this with grep:

$ grep -E '^+' <(echo '+')
grep: repetition-operator operand invalid

We follow the syntax of grep with the -E option, that is, Extended Regular Expressions. Cheers.

Inc0n commented 5 years ago

I think the mistake is that the regex should be “^+”. Should some kind of check for this type of issue be added to the syntax check?

On 14 Mar 2019, at 09:09, TANIGUCHI Masaya notifications@github.com wrote:

Thank you for the issue. However, ^+ is not a valid POSIX Extended Regular Expression. You can check this with grep:

$ grep -E '^+' <(echo '+')
grep: repetition-operator operand invalid

We follow the syntax of grep with the -E option, that is, Extended Regular Expressions. Cheers.

tani commented 5 years ago

Yes, I agree that this should be handled in the syntax check. The better way would be to reimplement the parser with a full-featured parser generator such as esrap, because we can expect such parser generators to report errors and suggest corrections to the syntax.

Of course, we could add a validator for these errors using this engine itself, but regex syntax is not a regular grammar, so such a validator could not catch all errors.
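
For illustration only, here is a minimal sketch of what a hand-written check for this one class of error might look like. The function name `dangling-repetition-position` is hypothetical and not part of this library. It only looks for `*`, `+` or `?` with nothing to repeat (at the start of the pattern, or right after `^`, `(` or `|`). It treats the first `]` as closing a bracket expression and ignores `{m,n}` intervals and unbalanced parentheses, so it is deliberately incomplete, in line with the point above.

```lisp
;; Hypothetical helper, not part of tani/regex: a partial syntax check only.
(defun dangling-repetition-position (pattern)
  "Return the index of the first *, + or ? in PATTERN that has no operand, or NIL."
  (let ((operand-p nil)    ; true when the previous token can be repeated
        (in-bracket nil)   ; true while scanning inside [...]
        (i 0)
        (n (length pattern)))
    (loop while (< i n)
          do (let ((ch (char pattern i)))
               (cond
                 (in-bracket
                  ;; Simplification: the first ] closes the bracket expression.
                  (when (char= ch #\])
                    (setf in-bracket nil
                          operand-p t)))
                 ((char= ch #\\)
                  ;; An escaped character is a literal operand; skip over it.
                  (incf i)
                  (setf operand-p t))
                 ((char= ch #\[)
                  (setf in-bracket t))
                 ((find ch "^(|")
                  ;; Nothing to repeat directly after an anchor, "(" or "|".
                  (setf operand-p nil))
                 ((find ch "*+?")
                  (unless operand-p
                    (return-from dangling-repetition-position i)))
                 (t
                  (setf operand-p t))))
             (incf i))
    nil))
```

With the patterns from this thread:

```lisp
(dangling-repetition-position "^+")    ; => 1
(dangling-repetition-position "^[+]")  ; => NIL
(dangling-repetition-position "^\\+")  ; => NIL (the Lisp string holds ^\+)
```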