telekons / one-more-re-nightmare

A fast regular expression compiler in Common Lisp
https://applied-langua.ge/projects/one-more-re-nightmare/
BSD 2-Clause "Simplified" License

Matching problems with `[0-9][0-9]` #18

Closed: q3cpma closed this issue 2 years ago

q3cpma commented 2 years ago
CL-USER> (one-more-re-nightmare:all-matches "[0-1][0-9][0-9]" "192")
NIL
CL-USER> (one-more-re-nightmare:all-matches "[0-9][0-9]" "192")
NIL
CL-USER> (one-more-re-nightmare:all-matches "[0-9][0-9]" "1921")
(#(2 4))

Using latest Quicklisp (20220331) with SBCL 2.2.4.

no-defun-allowed commented 2 years ago

The issue is that the Quicklisp release of OMRN confuses half-open and closed ranges. The parser constructs a set assuming that (symbol-range A B) includes both A and B, whereas it actually includes A but not B. This is fixed in newer versions of OMRN, but I am waiting for the next Quicklisp release to pick up the fix.

In the meantime, this can be fixed by replacing lines 102-106 of Code/Interface/syntax.lisp with:

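(esrap:defrule character-range-range
    (and character "-" character)
  (:destructure (low dash high)
    (declare (ignore dash))
    ;; SYMBOL-RANGE excludes its upper bound, so pass (1+ HIGH) to make
    ;; a range such as 0-9 include the character 9 itself.
    (symbol-range (char-code low) (1+ (char-code high)))))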

which will produce the correct set.
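To see why the off-by-one produces exactly the outputs reported above, here is a small model of the two behaviours. This is a sketch, not OMRN's code: BUGGY-RANGE, FIXED-RANGE, CONTAINS-P and TWO-DIGIT-MATCHES are hypothetical helpers that represent a range as a (LOW . HIGH) cons of character codes with the same half-open [LOW, HIGH) convention described for symbol-range, and the matcher only handles the specific pattern of two characters from one range.

```lisp
;;; Hypothetical model of the bug; none of these functions exist in OMRN.
;;; A range is a cons (LOW . HIGH) of character codes, read as [LOW, HIGH).

(defun buggy-range (low-char high-char)
  "Old parser behaviour: HIGH-CHAR's code is used as-is, so it is excluded."
  (cons (char-code low-char) (char-code high-char)))

(defun fixed-range (low-char high-char)
  "Patched behaviour: add 1 to the upper bound so HIGH-CHAR is included."
  (cons (char-code low-char) (1+ (char-code high-char))))

(defun contains-p (range char)
  "True when LOW <= code(CHAR) < HIGH, i.e. CHAR lies in the half-open RANGE."
  (let ((code (char-code char)))
    (and (<= (car range) code) (< code (cdr range)))))

(defun two-digit-matches (range string)
  "Return #(START END) vectors for non-overlapping two-character runs whose
characters are both in RANGE, scanning left to right (models [x-y][x-y])."
  (loop with i = 0
        while (< (1+ i) (length string))
        if (and (contains-p range (char string i))
                (contains-p range (char string (1+ i))))
          collect (vector i (+ i 2)) and do (incf i 2)
        else do (incf i)))
```

With BUGGY-RANGE the character 9 falls outside [0-9], so neither "19" nor "92" matches in "192", while "21" in "1921" still does, giving the single match at positions 2 to 4. The same exclusion explains the first report: [0-1] becomes a set containing only 0, so the leading 1 in "192" can never match.

```lisp
(two-digit-matches (buggy-range #\0 #\9) "192")   ; => NIL
(two-digit-matches (buggy-range #\0 #\9) "1921")  ; => (#(2 4))
(two-digit-matches (fixed-range #\0 #\9) "192")   ; => (#(0 2))
```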