nitely / nim-regex

Pure Nim regex engine. Guarantees linear time matching
https://nitely.github.io/nim-regex/
MIT License

Error with regular expression to find words in quotes #85

Closed: not-lum closed this issue 3 years ago

not-lum commented 3 years ago

Code:

import regex

const regexp = "(?<=[\"\'][^\"\'])+"
echo "\"in quotes\"".replace(re(regexp), "$1") 

Expected output:

"in quotes"

Current output:

stack trace: (most recent call last)
C:\Users\Admin\.nimble\pkgs\regex-0.17.0\regex.nim(293, 11) re
C:\Users\Admin\.nimble\pkgs\regex-0.17.0\regex\compiler.nim(23, 9) reCt
C:\Users\Admin\.nimble\pkgs\regex-0.17.0\regex\compiler.nim(10, 5) reImpl
C:\Users\Admin\.nimble\pkgs\regex-0.17.0\regex\parser.nim(744, 18) parse
C:\Users\Admin\.nimble\pkgs\regex-0.17.0\regex\parser.nim(662, 7) subParse
C:\Users\Admin\.nimble\pkgs\regex-0.17.0\regex\parser.nim(641, 5) parseGroupTag
C:\Users\Admin\.nimble\pkgs\regex-0.17.0\regex\parser.nim(48, 5) check
C:\Users\Admin\Desktop\projects\Nim\stuff\regex_bug.nim(4, 32) template/generic instantiation of `re` from here
C:\Users\Admin\.nimble\pkgs\regex-0.17.0\regex\parser.nim(48, 5) Error: unhandled exception: Invalid lookaround, expected closing symbol. Beware lookaround is currently limited to match one single character
(?<=["'][^"'])+
^ [RegexError]

Additional information:

Nim Compiler Version 1.4.0 [Windows: amd64]
Compiled at 2020-10-16
Copyright (c) 2006-2020 by Andreas Rumpf

active boot switches: -d:release

nitely commented 3 years ago

I think the error message is clear: "Invalid lookaround, expected closing symbol. Beware lookaround is currently limited to match one single character". Maybe that limitation will be lifted someday. That said, the code snippet doesn't seem to work the way you expect in re (PCRE) either...

nitely commented 3 years ago

btw, here are some regexes that may work in your case:

# similar to yours
"[\"'][^\"']+[\"']"

# a probably better one, since it won't match "foo'
"\"[^\"]+\"|'[^']+'"

# using raw string, otherwise \w will need double escaping \\w
# triple quoting is needed because the string contains quotes
r"""["'][^"']+["']"""

# passing the regex as a raw string to re; the outer parentheses
# add the capture group that "$1" refers to
const reg = re"""(["'][^"']+["'])"""
echo "\"in quotes\"".replace(reg, "$1")

# or, inline:
echo "\"in quotes\"".replace(re"""(["'][^"']+["'])""", "$1")
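
A minimal sketch of how the second suggestion could be used to collect every quoted run from a longer string. It assumes the nim-regex `findAll` iterator and the `boundaries` field of its match object behave as described in the library's documentation. The sample text and the names `quoted`, `text` and `found` are hypothetical, and the outer parentheses are an addition here so that "$1" has a group to refer to.

```nim
import regex

# Double-quoted or single-quoted run; a mismatched pair like "foo' is rejected.
# The outer parentheses form capture group 1 (an addition for "$1" below).
const quoted = re"""("[^"]+"|'[^']+')"""

# Hypothetical sample input, not taken from the original report.
let text = "say \"in quotes\" or 'single' but not \"mixed' here"

# Collect the matched substrings, quotes included.
var found: seq[string]
for m in findAll(text, quoted):
  found.add text[m.boundaries]

echo found

# Replacing each match with its own group 1 leaves the input unchanged.
echo "\"in quotes\"".replace(quoted, "$1")
```

Listing the two quote styles as separate alternatives is what keeps a mismatched pair from matching, at the cost of a slightly longer pattern.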

nitely commented 3 years ago

I'm working on full lookaround support, see #94