Closed by jmishra01 2 months ago
Lark is perfectly capable of parsing Arabic text. Your grammar just doesn't match the given text. Depending on what exactly you meant to do, you need to change your definition of string. Most notably, the second term you added, | /[u"\u0001-\uFFFF"]/ ), matches (almost) any single character, which probably isn't what you want.
Your definition of string doesn't include repetition... it can only match a single character.
There are online regexp IDEs that can help you. You can also test regexps directly using Python's re module.
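Both points above can be checked with Python's re module. The following is a minimal sketch, not the code from the original report: the sample text and the variable names are assumptions, and the character class is a simplified form of the one quoted above (without the stray u and quote characters).

```python
import re

# Sample Arabic text (an assumption for illustration; any non-empty string works).
text = "مرحبا بالعالم"

# Simplified form of the character class from the grammar: exactly one character.
single = re.compile(r"[\u0001-\uFFFF]")
# The same class with a repetition operator added.
repeated = re.compile(r"[\u0001-\uFFFF]+")

# fullmatch succeeds only if the pattern consumes the entire string.
print(single.fullmatch(text))                    # no match: the class covers one character only
print(single.match(text).group() == text[0])     # match stops after the first character
print(repeated.fullmatch(text).group() == text)  # with "+", the whole string matches
```

Without the repetition operator, the pattern can never cover a string longer than one character, which is consistent with the parse failure described in this thread.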
Thanks, @MegaIng and @erezsh, for the quick reply.
The problem is resolved using the grammar below.
grammar = r"""
start: string
string: /'([^'\\]*(?:\\.[^'\\]*)*)'/
%import common (CNAME, WS, SIGNED_NUMBER, INT)
%import common.NEWLINE -> _NL
%import common.WS_INLINE
%ignore WS_INLINE
%ignore WS
"""
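As a quick sanity check, the regular expression from the string rule can be tried on its own with Python's re module. This is a sketch under assumptions: the sample inputs are made up for illustration, and the pattern is taken as it would reach the regex engine if the grammar is written as a raw string (so that the backslashes are not consumed by Python first).

```python
import re

# The pattern from the `string` rule, without the surrounding /.../ delimiters.
STRING_RE = re.compile(r"'([^'\\]*(?:\\.[^'\\]*)*)'")

# Hypothetical sample inputs, chosen only for illustration.
samples = [
    "'مرحبا بالعالم'",      # plain Arabic text in single quotes
    r"'مرحبا \' بالعالم'",  # contains a backslash-escaped quote
    "'مرحبا",               # missing closing quote, should not match
]

for s in samples:
    m = STRING_RE.fullmatch(s)
    # Print whether the whole input matched and, if so, the text between the quotes.
    print(bool(m), m.group(1) if m else None)
```

The pattern accepts Arabic text because [^'\\] is a negated class: it allows every character except a single quote or a backslash, rather than listing the permitted characters.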
Lark fails to parse Arabic text. Kindly check the sample Python code below to reproduce the issue.