laurikari / tre

The approximate regex matching library and agrep command line tool.

Regex (^|\b) is not handled correctly, but (\b|^) is #78

Open juliangilbey opened 3 years ago

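The following small example shows that the regex (^|\b)a is not handled correctly: it fails to match "this is a word", although \ba on its own and the reversed alternation (\b|^)a both match. A return value of 0 means a match; 1 is REG_NOMATCH.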

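#include <stdio.h>
#include <tre/tre.h>

/* Every pattern is run against the same subject string.
   tre_regexec returns 0 on a match and REG_NOMATCH (1) otherwise. */
int main(void) {
  regex_t preg;
  int ret;

  /* Word boundary before "a": matches the standalone "a". */
  tre_regcomp(&preg, "\\ba", REG_EXTENDED);
  ret = tre_regexec(&preg, "this is a word", 0, NULL, 0);
  printf("return value for \\ba is %d\n", ret);
  tre_regfree(&preg);

  /* Start-of-string anchor: the string starts with "t", so no match. */
  tre_regcomp(&preg, "^a", REG_EXTENDED);
  ret = tre_regexec(&preg, "this is a word", 0, NULL, 0);
  printf("return value for ^a is %d\n", ret);
  tre_regfree(&preg);

  /* Either anchor, "^" first: should match via \b, but does not. */
  tre_regcomp(&preg, "(^|\\b)a", REG_EXTENDED);
  ret = tre_regexec(&preg, "this is a word", 0, NULL, 0);
  printf("return value for (^|\\b)a is %d\n", ret);
  tre_regfree(&preg);

  /* Same alternation with the branches swapped: matches as expected. */
  tre_regcomp(&preg, "(\\b|^)a", REG_EXTENDED);
  ret = tre_regexec(&preg, "this is a word", 0, NULL, 0);
  printf("return value for (\\b|^)a is %d\n", ret);
  tre_regfree(&preg);

  return 0;
}
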
euler:/tmp $ gcc -o testtre testtre.c -ltre
euler:/tmp $ ./testtre 
return value for \ba is 0
return value for ^a is 1
return value for (^|\b)a is 1
return value for (\b|^)a is 0

I'm using libtre version 0.8.0-6 on Debian, which is based on the 0.8.0 release.
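
For comparison, here is a minimal sketch of the same check against the system's POSIX `<regex.h>` instead of TRE. It assumes glibc, which accepts `\b` as a GNU extension in extended regexes; other C libraries may reject or ignore it. The helper name `posix_match_status` is made up for illustration and is not part of any library.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile `pattern` as a POSIX extended regex and run it
   against `text`. Returns 0 on a match, REG_NOMATCH on no match, or the
   regcomp error code if the pattern does not compile.
   Assumption: glibc, where \b is accepted as a word-boundary extension. */
static int posix_match_status(const char *pattern, const char *text) {
  regex_t preg;
  int ret;

  assert(pattern != NULL && text != NULL);

  ret = regcomp(&preg, pattern, REG_EXTENDED | REG_NOSUB);
  if (ret != 0) {
    return ret; /* compilation failed; nothing to free */
  }
  ret = regexec(&preg, text, 0, NULL, 0);
  regfree(&preg);
  return ret;
}
```

Under that assumption, calling `posix_match_status` with `"\\ba"`, `"(^|\\b)a"` and `"(\\b|^)a"` on `"this is a word"` should return 0 in each case, while `"^a"` should return `REG_NOMATCH`. That is the behaviour I would expect from TRE as well, regardless of the order of the alternatives.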

Best wishes,

Julian