Open mristin opened 10 months ago
This might be related to #100, though the error message is a bit confusing here (index 1 is "(", I suppose).
When I undo the special characters (to circumvent #100), I still get an exception:
greenery.parse(
'^([!#$%&\'*+\\-.^_`|~0-9a-zA-Z])+/([!#$%&\'*+\\-.^_`|~0-9a-zA-Z])+([ \t]*;[ \t]*([!#$%&\'*+\\-.^_`|~0-9a-zA-Z])+=(([!#$%&\'*+\\-.^_`|~0-9a-zA-Z])+|"(([\t !#-\\[\\]-~]|[\x80-ÿ])|\\\\([\t !-~]|[\x80-ÿ]))*"))*$'
)
The exception:
greenery.parse.NoMatch: Could not parse '^([!#$%&\'*+\\-.^_`|~0-9a-zA-Z])+/([!#$%&\'*+\\-.^_`|~0-9a-zA-Z])+([ \t]*;[ \t]*([!#$%&\'*+\\-.^_`|~0-9a-zA-Z])+=(([!#$%&\'*+\\-.^_`|~0-9a-zA-Z])+|"(([\t !#-\\[\\]-~]|[\x80-ÿ])|\\\\([\t !-~]|[\x80-ÿ]))*"))*$' beyond index 1
(Mind that characters such as \x80 are not escaped in the pattern.)
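To make the remark about unescaped characters concrete, here is a small sketch using only the standard library. It shows that in a non-raw Python string literal, \x80 is converted to a single character before any regex parser sees it, whereas a raw string passes the backslash sequence through. The variable names are illustrative, \xff stands in for ÿ, and the sketch does not check how greenery treats either form.

```python
import re

# Non-raw literal: Python itself turns \x80 and \xff into single characters.
plain = '[\x80-\xff]'
# Raw literal: the backslashes survive, so the regex parser sees "\x80".
raw = r'[\x80-\xff]'

plain_len = len(plain)  # "[", U+0080, "-", U+00FF, "]"
raw_len = len(raw)      # "[", four chars, "-", four chars, "]"

# The stdlib re module accepts both spellings and matches the same character.
both_match = all(re.fullmatch(p, '\x90') is not None for p in (plain, raw))
```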
The standard-library re module compiles it fine:
import re
re.compile(
'^([!#$%&\'*+\\-.^_`|~0-9a-zA-Z])+/([!#$%&\'*+\\-.^_`|~0-9a-zA-Z])+([ \t]*;[ \t]*([!#$%&\'*+\\-.^_`|~0-9a-zA-Z])+=(([!#$%&\'*+\\-.^_`|~0-9a-zA-Z])+|"(([\t !#-\\[\\]-~]|[\x80-ÿ])|\\\\([\t !-~]|[\x80-ÿ]))*"))*$'
)
I narrowed this down to:
greenery.parse("[!#$%&'*+\\-.^]")
which fails. Escaping ^ fixes the issue:
greenery.parse("[!#$%&'*+\\-.\\^]")
This is probably by design, if I understood the readme correctly?
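As a quick sanity check that the escaped form does not change the meaning of the pattern, the following sketch compares both spellings using the standard-library re module only. It says nothing about greenery's behaviour beyond what is reported above; the variable names are illustrative.

```python
import re

# The narrowed-down character class, with and without the escaped caret.
unescaped = re.compile("[!#$%&'*+\\-.^]")
escaped = re.compile("[!#$%&'*+\\-.\\^]")

# Compare the two classes on every code point below 256.
same = all(
    bool(unescaped.fullmatch(chr(cp))) == bool(escaped.fullmatch(chr(cp)))
    for cp in range(256)
)
```

If `same` is true, re accepts exactly the same single characters for both spellings, so the backslash is safe to add on the re side.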
Correct, the parser is intentionally very simple, and if you want a literal caret in a character class you need to backslash-escape it. There are a lot of sophisticated bits of syntax for character classes, like [^-] and [^^] and []], which are technically unambiguous but which in practice (1) I consider confusing to read and (2) are a total headache to implement when parsing. I will consider enhancing the parser to handle this, but for now the workaround is backslashes.
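For patterns that already exist in re syntax, the backslash workaround could be applied mechanically. The sketch below is a hypothetical helper, not part of greenery: `escape_class_carets` is a name invented here, and it assumes that a caret directly after "[" is a negation and that any other unescaped caret inside a class is meant literally. It deliberately does not handle the []] form, where a leading "]" is a literal.

```python
def escape_class_carets(pattern: str) -> str:
    """Backslash-escape literal carets inside character classes (hypothetical helper)."""
    out = []
    i = 0
    in_class = False
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Copy an existing escape sequence through untouched.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if not in_class and ch == "[":
            in_class = True
            out.append(ch)
            i += 1
            # A caret directly after "[" negates the class; keep it as-is.
            if i < len(pattern) and pattern[i] == "^":
                out.append("^")
                i += 1
            continue
        if in_class and ch == "]":
            in_class = False
        elif in_class and ch == "^":
            # Assumption: any other caret in a class is a literal caret.
            out.append("\\")
        out.append(ch)
        i += 1
    return "".join(out)


fixed = escape_class_carets("[!#$%&'*+\\-.^]")
```

Anchors outside a class, such as the leading ^ of the full pattern, are left alone, and already-escaped carets are not escaped twice.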
Note that my project interegular tries quite a bit harder to match stdlib's re syntax, and I am currently reworking it to use greenery.fsm in the background, so that might be a better fit for your use case.
(I am not quite sure what part of the regular expression is problematic for greenery, so please change the title accordingly.)
I can compile the following pattern with re: ... but greenery fails with the exception: