intel / hyperscan

High-performance regular expression matching library
https://www.hyperscan.io

Numbered repeat doesn't work if the lower number is omitted #419

Closed: dchenz closed this issue 6 months ago

dchenz commented 7 months ago

Steps to reproduce:

import re
import hyperscan

# Python's re treats {,3} as {0,3}, so this matches 'baaa' as expected.
re.search(r"ba{,3}", "baaa", flags=re.DOTALL | re.MULTILINE).group()

# Compile the same pattern with Hyperscan and it won't report any matches on the above search string.
db = hyperscan.Database(mode=hyperscan.HS_MODE_STREAM)
db.compile(
    expressions=[b"ba{,3}"],
    ids=[1234],
    flags=[hyperscan.HS_FLAG_MULTILINE | hyperscan.HS_FLAG_DOTALL],
    elements=1,
)

Hyperscan version: 5.6.1
Python hyperscan version: 0.4.0
Python version: 3.9.17

dchenz commented 6 months ago

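I found this in the docs: "Bounded repeat qualifiers such as {n}, {m,n}, {n,} are supported with limitations". The {,n} form (lower bound omitted) is not in that list, so I think this is expected behavior for an unsupported construct rather than a bug.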
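
For anyone who lands here with the same problem, one possible workaround is to rewrite {,n} as {0,n} before handing the pattern to Hyperscan, since {m,n} is among the documented forms. The sketch below is only an illustration: the helper name normalize_open_lower_bound is hypothetical (it is not part of hyperscan or python-hyperscan), and it assumes Python-style patterns in which {,n} means "zero to n repetitions". It skips backslash escapes and character classes, but it is not a full regex parser.

```python
import re

# Matches a quantifier with an omitted lower bound, e.g. "{,3}".
_OPEN_LOWER = re.compile(r"\{,(\d+)\}")


def normalize_open_lower_bound(pattern: str) -> str:
    """Rewrite `{,n}` as `{0,n}`, leaving escapes and character classes alone.

    Hypothetical helper for illustration; not part of any library.
    """
    out = []
    i = 0
    in_class = False
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Copy an escaped character (such as "\{") unchanged.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if in_class:
            # Inside [...] braces are literal; only look for the closing "]".
            if ch == "]":
                in_class = False
            out.append(ch)
            i += 1
            continue
        if ch == "[":
            in_class = True
            out.append(ch)
            i += 1
            # A "]" directly after "[" or "[^" is a literal class member.
            if i < len(pattern) and pattern[i] == "^":
                out.append("^")
                i += 1
            if i < len(pattern) and pattern[i] == "]":
                out.append("]")
                i += 1
            continue
        m = _OPEN_LOWER.match(pattern, i)
        if m:
            # "{,n}" becomes "{0,n}".
            out.append("{0," + m.group(1) + "}")
            i = m.end()
            continue
        out.append(ch)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    fixed = normalize_open_lower_bound(r"ba{,3}")
    print(fixed)
    print(re.search(fixed, "baaa").group())
```

The rewritten string can then be encoded, for example normalize_open_lower_bound(r"ba{,3}").encode(), and passed in the expressions list of db.compile as in the original report. I have not verified this against every Hyperscan version, so treat it as a starting point.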