kokke / tiny-regex-c

Small portable regex in C
The Unlicense

Incorrect results for repeat operators #56

Open kkos opened 3 years ago

kkos commented 3 years ago

(1) Matching a? or a* against an empty string fails. (2) Nested repeat operators fail, for example a?+ against the string "a".

kokke commented 3 years ago

Hi @kkos and thanks for the notice :+1:

Could you please provide some code so I can be certain which cases we are talking about?

Could you perhaps make a PR editing https://github.com/kokke/tiny-regex-c/blob/master/tests/test1.c to showcase exactly the cases you're talking about in (1) and (2)?

That would help me understand your issue :)

kkos commented 3 years ago

I don't even know whether the library supports nested repeat operators in the first place, so I'll refrain from writing a PR. But if you want to match Python's results, the test cases would look something like this:

  { OK,  "a?",                        "",                 (char*) 0      },
  { OK,  "a?",                        "a",                (char*) 0      },
  { OK,  "a*",                        "",                 (char*) 0      },
  { OK,  "a??",                       "a",                (char*) 0      },
  { OK,  "a?*",                       "a",                (char*) 0      },
  { OK,  "a?+",                       "a",                (char*) 0      },
  { OK,  "a*?",                       "a",                (char*) 0      },
  { OK,  "a**",                       "a",                (char*) 1      },
  { OK,  "a*+",                       "a",                (char*) 1      },
  { OK,  "a+?",                       "a",                (char*) 0      },
  { OK,  "a+*",                       "a",                (char*) 1      },
  { OK,  "a++",                       "a",                (char*) 1      },

The Python code I tested:

#!/usr/bin/env python3                                                          
# -*- coding: utf-8 -*-                                                         

import re

# (1)                                                                           
print(re.match(r'a??', ""))
#=>  <re.Match object; span=(0, 0), match=''>                                   

print(re.match(r'a??', "a"))
#=>  <re.Match object; span=(0, 0), match=''>                                   

print(re.match(r'a*', ""))
#=>  <re.Match object; span=(0, 0), match=''>                                   

# (2)                                                                           
print(re.match(r'(?:a??)??', "a"))
#=> <re.Match object; span=(0, 0), match=''>                                    

print(re.match(r'(?:a??)*', "a"))
#=> <re.Match object; span=(0, 0), match=''>                                    

print(re.match(r'(?:a??)+', "a"))
#=> <re.Match object; span=(0, 0), match=''>                                    

print(re.match(r'(?:a*)??', "a"))
#=> <re.Match object; span=(0, 0), match=''>                                    

print(re.match(r'(?:a*)*', "a"))
#=> <re.Match object; span=(0, 1), match='a'>                                   

print(re.match(r'(?:a*)+', "a"))
#=> <re.Match object; span=(0, 1), match='a'>                                   

print(re.match(r'(?:a+)??', "a"))
#=> <re.Match object; span=(0, 0), match=''>                                    

print(re.match(r'(?:a+)*', "a"))
#=> <re.Match object; span=(0, 1), match='a'>                                   

print(re.match(r'(?:a+)+', "a"))
#=> <re.Match object; span=(0, 1), match='a'>                                   
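
The checks above can also be written in a table-driven form, which makes the mapping between the test-table patterns and the Python patterns explicit. This is only a sketch: `to_python` and `match_length` are hypothetical helper names, and it assumes that `?` is non-greedy (which is what the `??` in the Python patterns above implies) and that each pattern is a single literal followed by stacked repeat operators. Each operator after the first is applied to a non-capturing group, so Python never reads `a*+` or `a++` as a possessive quantifier.

```python
import re


def to_python(pattern):
    """Rewrite e.g. "a?+" as "(?:a??)+" (hypothetical helper).

    Assumes the pattern is one literal character followed only by
    the repeat operators ?, * and +, and that ? is non-greedy.
    """
    atom, quantifiers = pattern[0], pattern[1:]
    result = atom
    for i, q in enumerate(quantifiers):
        if i > 0:
            # Group what we have so far before stacking another operator.
            result = "(?:" + result + ")"
        result += "??" if q == "?" else q
    return result


def match_length(pattern, text):
    """Return the length matched at the start of text, or None."""
    m = re.match(to_python(pattern), text)
    return None if m is None else m.end()


if __name__ == "__main__":
    cases = [
        ("a?", ""), ("a?", "a"), ("a*", ""),
        ("a??", "a"), ("a?*", "a"), ("a?+", "a"),
        ("a*?", "a"), ("a**", "a"), ("a*+", "a"),
        ("a+?", "a"), ("a+*", "a"), ("a++", "a"),
    ]
    for pattern, text in cases:
        # Print the source pattern, its Python equivalent, and the length.
        print(pattern, to_python(pattern), match_length(pattern, text))
```

The printed length corresponds to the last column of the test table; a result of None would mean no match at all.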

kokke commented 3 years ago

Hi @kkos and thanks for helping me understand the issue 👍

I don’t think nested operators will be easy to support, but I will have a look and get back to you.